Server 2008 BSOD about once a week

Tags: bsod, dump, hyper-v, symantec-endpoint-protection, windows-server-2008

I'm in quite a bind here, hopefully someone can help.

Here's what I have:
Dell R710 with one 2.7 GHz Xeon processor, 18 GB RAM, Windows Server 2008 x64 SP2
I'm running Hyper-V with about five virtual servers.

Starting in January, I've had problems with crashes.

The first time, it was one of the VMs (a Server 2003 SBS). It crashed with no error entries in the Event Log and no crash dump. The server came back up by itself.

Then the host server (the 2008 server) crashed twice last week, and again today, about a week later. Again, there were no entries in the event log and no crash dump, and it came back up by itself.

I made changes to the server at the beginning of January. I updated the network drivers (Broadcom), installed the teaming software, and teamed two interfaces. I also upgraded Symantec Endpoint Protection on all of the servers to the latest version, 12. I also replaced the switch, but I'm not counting that as part of the problem.

I was thinking this is a memory problem, because one of the VMs crashed as well as the host. But it could also be Symantec.

I don't have all of the crash dumps, because the idiot who configured the server didn't leave enough room on the system drive to write the .dmp files.

Here is one of the DMP files:

Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Windows\Minidump\Mini012412-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: SRV*e:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows Server 2008/Windows Vista Kernel Version 6002 (Service Pack 2) MP (8 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Built by: 6002.18484.amd64fre.vistasp2_gdr.110617-0336
Machine Name:
Kernel base = 0xfffff800`01c1d000 PsLoadedModuleList = 0xfffff800`01de1dd0
Debug session time: Tue Jan 24 18:58:02.334 2012 (UTC - 5:00)
System Uptime: 9 days 13:32:35.727
Loading Kernel Symbols
...............................................................
................................................................
.............................
Loading User Symbols
Loading unloaded module list
..................................................
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 7F, {8, 80050033, 6f8, fffff80001c70da4}

Probably caused by : NETIO.SYS ( NETIO!MatchValues+14e )

Followup: MachineOwner
---------
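For bug check 0x7F, the first argument is the trap number, as the `!analyze -v` text further down explains ("8 = double fault"). The small sketch below, which is only an illustration and not part of WinDbg, parses a "BugCheck X, {...}" summary line and names the trap. The table is a subset of the Intel x86 exception vectors; the function and table names are hypothetical.

```python
import re

# Subset of Intel x86/x64 exception vector numbers (Intel SDM Vol. 3).
# For bug check 0x7F, Arg1 holds the trap (vector) number.
TRAP_NAMES = {
    0x0: "divide error (#DE)",
    0x1: "debug exception (#DB)",
    0x3: "breakpoint (#BP)",
    0x4: "overflow (#OF)",
    0x5: "BOUND range exceeded (#BR)",
    0x6: "invalid opcode (#UD)",
    0x8: "double fault (#DF)",
    0xD: "general protection (#GP)",
    0xE: "page fault (#PF)",
}

# Matches the WinDbg summary line, e.g. "BugCheck 7F, {8, 80050033, 6f8, fffff80001c70da4}"
BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def decode_bugcheck(line):
    """Parse a 'BugCheck X, {a1, a2, a3, a4}' line into the code and its hex arguments."""
    m = BUGCHECK_RE.search(line)
    if m is None:
        raise ValueError("no BugCheck summary found")
    code = int(m.group(1), 16)
    args = [int(a.strip(), 16) for a in m.group(2).split(",")]
    result = {"code": code, "args": args}
    if code == 0x7F:
        # Only for 0x7F is Arg1 a trap number; other bug checks use Arg1 differently.
        result["trap"] = TRAP_NAMES.get(args[0], "trap 0x%x (see Intel manual)" % args[0])
    return result


if __name__ == "__main__":
    info = decode_bugcheck("BugCheck 7F, {8, 80050033, 6f8, fffff80001c70da4}")
    print(hex(info["code"]), info["trap"])  # 0x7f double fault (#DF)
```

Both dumps in this question report Arg1 = 8, so both are double faults, even though the "Probably caused by" module differs between them.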

I have since disabled the teaming.

Here is another:

Windows Server 2008/Windows Vista Kernel Version 6002 (Service Pack 2) MP (8 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Built by: 6002.18484.amd64fre.vistasp2_gdr.110617-0336
Machine Name:
Kernel base = 0xfffff800`01c4b000 PsLoadedModuleList = 0xfffff800`01e0fdd0
Debug session time: Sat Jan 28 07:42:48.945 2012 (UTC - 5:00)
System Uptime: 0 days 21:36:52.143
Loading Kernel Symbols
...............................................................
................................................................
.............................
Loading User Symbols
Loading unloaded module list
...........
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 7F, {8, 80050033, 6f8, fffff80001ceeaa2}

Probably caused by : ntkrnlmp.exe ( nt!KiDoubleFaultAbort+b8 )

Followup: MachineOwner
---------

3: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
        use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
        use .trap on that value
Else
        .trap on the appropriate frame will show where the trap was taken
        (on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 0000000000000008, EXCEPTION_DOUBLE_FAULT
Arg2: 0000000080050033
Arg3: 00000000000006f8
Arg4: fffff80001ceeaa2

Debugging Details:
------------------


USER_LCID_STR:  ENU

OS_SKU:  7

BUGCHECK_STR:  0x7f_8

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  DRIVER_FAULT_SERVER_MINIDUMP

PROCESS_NAME:  System

CURRENT_IRQL:  d

LAST_CONTROL_TRANSFER:  from fffff80001ca522e to fffff80001ca5490

STACK_TEXT:  
fffffa60`019e9a68 fffff800`01ca522e : 00000000`0000007f 00000000`00000008 00000000`80050033 00000000`000006f8 : nt!KeBugCheckEx
fffffa60`019e9a70 fffff800`01ca3a78 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x6e
fffffa60`019e9bb0 fffff800`01ceeaa2 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiDoubleFaultAbort+0xb8
fffffa60`005a8000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HvlEndSystemInterrupt+0x2


STACK_COMMAND:  kb

FOLLOWUP_IP: 
nt!KiDoubleFaultAbort+b8
fffff800`01ca3a78 90              nop

SYMBOL_STACK_INDEX:  2

SYMBOL_NAME:  nt!KiDoubleFaultAbort+b8

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrnlmp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  4dfb5a33

FAILURE_BUCKET_ID:  X64_0x7f_8_nt!KiDoubleFaultAbort+b8

BUCKET_ID:  X64_0x7f_8_nt!KiDoubleFaultAbort+b8

Followup: MachineOwner
---------
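With several dumps, the "Probably caused by" line changes from crash to crash (NETIO.SYS in one, ntkrnlmp.exe in the other), so it can help to save each analysis to a text file, for example with `kd -z <dump> -c "!analyze -v;q" > <file>.txt`, and count the suspects and buckets across all of them. The sketch below assumes the logs are plain text files passed on the command line; `summarize` is a hypothetical helper, not a debugger feature.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# e.g. "Probably caused by : NETIO.SYS ( NETIO!MatchValues+14e )"
CAUSE_RE = re.compile(r"Probably caused by\s*:\s*(\S+)")
# e.g. "BUCKET_ID:  X64_0x7f_8_nt!KiDoubleFaultAbort+b8"
# (anchored at line start so FAILURE_BUCKET_ID / DEFAULT_BUCKET_ID lines are not counted)
BUCKET_RE = re.compile(r"^BUCKET_ID:\s*(\S+)", re.MULTILINE)


def summarize(logs):
    """Count suspect images and failure buckets across several debugger log texts."""
    causes = Counter()
    buckets = Counter()
    for text in logs:
        causes.update(m.group(1) for m in CAUSE_RE.finditer(text))
        buckets.update(m.group(1) for m in BUCKET_RE.finditer(text))
    return causes, buckets


if __name__ == "__main__":
    # Usage: python summarize_dumps.py log1.txt log2.txt ...
    texts = [Path(p).read_text(errors="ignore") for p in sys.argv[1:]]
    causes, buckets = summarize(texts)
    for name, count in causes.most_common():
        print("%3d  %s" % (count, name))
    for name, count in buckets.most_common():
        print("%3d  %s" % (count, name))
```

A single module that keeps reappearing is a stronger lead than any one "Probably caused by" line; scattered results across unrelated modules point more toward hardware or memory corruption.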

Hopefully I can get some much needed guidance here.

Thanks

Best Answer

When it comes to BSODs, 99% of the time it's a driver problem.

You can change the configuration to store only a kernel memory dump instead of a complete (full RAM) dump, so you can keep more of them.
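The dump type is controlled by the `CrashDumpEnabled` DWORD under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl` (0 = none, 1 = complete, 2 = kernel, 3 = small/minidump); it is the same setting as "Write debugging information" under Startup and Recovery. A complete dump is roughly the size of physical RAM (about 18 GB on this host), which is why it does not fit on a small system drive. The sketch below only builds a `reg.exe` command line for review rather than touching the registry itself; the helper name is hypothetical, and the change generally needs a restart to take effect.

```python
# CrashDumpEnabled values for the "Write debugging information" setting.
DUMP_TYPES = {
    "none": 0,
    "complete": 1,
    "kernel": 2,
    "small": 3,  # minidump, written to %SystemRoot%\Minidump by default
}

CRASH_CONTROL_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"


def crash_dump_command(dump_type):
    """Return a reg.exe command that sets the crash dump type (the command is not run here)."""
    try:
        value = DUMP_TYPES[dump_type]
    except KeyError:
        raise ValueError("unknown dump type: %r" % dump_type) from None
    return 'reg add "%s" /v CrashDumpEnabled /t REG_DWORD /d %d /f' % (CRASH_CONTROL_KEY, value)


if __name__ == "__main__":
    print(crash_dump_command("kernel"))
```

Run the printed command from an elevated prompt on the host, then restart when convenient.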

What I would do:

  1. Upgrade the Broadcom driver. I know you say you did, but check again, and get it from Broadcom, not Dell; Dell's packages are always about six months behind.
  2. Check the settings on the network card, such as the receive and send buffers. If in doubt, reset them to the factory defaults.
  3. Temporarily disable Symantec to test. Also make sure your Endpoint Protection drivers are up to date; I have already seen a Symantec update that left an older driver version in place.