Thursday, March 12, 2015

Understanding Exception 13 and Exception 14 purple diagnostic screen events in ESX 3.x/4.x and ESXi 3.x/4.x/5.x (1020181)


Symptoms

You experience purple diagnostic screens that contain information similar to:
  • VMware ESX [Releasebuild-164009 X86_64]
    #GP Exception(13) in world 4130:helper13-0 @ 0x41803399e303
  • VMware ESX Server [Releasebuild-123630]
    #PF Exception type 14 in world 1024:console @ 0x67f0ae

Resolution

Overview

Operating systems manage the physical memory on a system by employing several methods:
  • Virtual memory or paging is designed to abstract the physical memory into virtual memory. This abstraction allows the operating system to allocate memory specific to programs and allows for other forms of memory management, including Memory Swapping, Shared Memory, and Memory Protection.

  • Memory Swapping occurs when the operating system frees physical memory by moving data that is not in active use to a slower medium (such as disk), and moves it back into physical memory when it is needed again.

  • Shared Memory is a method commonly used if multiple programs need to communicate with each other. Shared memory allows multiple programs to access the same page of memory.

  • Memory Protection prevents a malicious or malfunctioning program from accessing memory pages from other programs.
When a critical application cannot access memory as expected, the failure generally surfaces as an error in one of these memory management operations.

Exception 13: General Protection Fault

A general protection fault (Exception 13) occurs under one of these circumstances:
  • The page being requested does not belong to the program requesting it (and is not mapped in program memory)
  • The program does not have rights to perform a read or write operation on the page
Operating systems maintain a page table that includes flags to mark pages as protected. If the requested operation conflicts with a flag, the operating system traps the illegal request.
Note: Segmentation faults are very similar to general protection faults.
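The flag check described above can be illustrated with a small Python model. This is only a sketch of the idea, not how the processor or the VMkernel implements it; the names PageEntry, check_access, and ProtectionFault, and the program names app_a and app_b, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Raised by this toy model when an access conflicts with the page flags."""


@dataclass
class PageEntry:
    owner: str       # hypothetical: the program that the page belongs to
    readable: bool   # flag: reads are permitted
    writable: bool   # flag: writes are permitted


def check_access(page_table, program, page, write):
    """Return True if the access is allowed, otherwise raise ProtectionFault."""
    entry = page_table.get(page)
    # Case 1: the page does not belong to (is not mapped for) the requester.
    if entry is None or entry.owner != program:
        raise ProtectionFault(f"page {page:#x} is not mapped for {program}")
    # Case 2: the requester lacks the right to perform this operation.
    if write and not entry.writable:
        raise ProtectionFault(f"{program} may not write page {page:#x}")
    if not write and not entry.readable:
        raise ProtectionFault(f"{program} may not read page {page:#x}")
    return True


# A tiny page table: one read-only page and one read-write page.
table = {
    0x1000: PageEntry(owner="app_a", readable=True, writable=False),
    0x2000: PageEntry(owner="app_b", readable=True, writable=True),
}
```

In this model, app_a reading page 0x1000 succeeds, while app_a writing to it, or touching app_b's page 0x2000, raises ProtectionFault. A real system traps such a request rather than raising a Python exception.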
This is a sample of a General Protection Fault generated by ESX:
VMware ESX [Releasebuild-164009 X86_64]
#GP Exception(13) in world 4130:helper13-0 @ 0x41803399e303
frame=0x4100c0117d78 ip=0x41803399e303 cr2=0x0 cr3=0xcff94000
err=0 rflags=0x10246 cr4=0x16c
rax=0x0 rbx=0x417ff492dbe0 rcx=0x417ff386cc80
rdx=0x4100c0117f00 rbp=0x4100c0117f40 rsi=0x4100c0117e30
rdi=0x410008c46220 r8=0x4100c0117e30 r9=0x4100c0117d50
r10=0x3713e1b91ddd3 r11=0x41803399e1fc r12=0x4100c004fde0
r13=0x410008c46220 r14=0x4100c0117e30 r15=0x417ff3614660
0:4096/console *1:4130/helper13- 2:4098/idle2 3:4099/idle3
@BlueScreen: #GP Exception(13) in world 4130:helper13-0 @ 0x41803399e303
Code starts at 0x418033600000
0x4100c0117f40:[0x41803399e303]GetDriverInfo+0x106 stack: 0x410002086ba8
0x4100c0117f80:[0x4180336d9ef3]UplinkProcessAsyncCallsHelperCB+0x126 stack: 0x0
0x4100c0117ff0:[0x418033663670]helpFunc+0x4f7 stack: 0x0
0x4100c0117ff8:[0x0]Unknown stack: 0x0
VMK uptime: 5:13:53:05.627 TSC: 968936502031893
VMK checksum BAD: 0x3ee854ad7f0856e5 0x7009aad95a9042d9
FSbase (0x0) GSbase (0x0) kernelGSbase (0x0)

The Exception 13 General Protection Fault may be caused by either a hardware or a software issue. Because the cause varies significantly from one occurrence to another, identifying it usually requires a core-dump review by VMware, which depends on access to protected source code and analysis tools or processes. Collect diagnostic information from the VMware ESX host and submit a support request. For more information, see Collecting diagnostic information for VMware products (1008524) and How to Submit a Support Request. You can also contact your hardware vendor if you or VMware Technical Support determine that a particular driver module or device caused the exception.
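When recording a purple diagnostic screen for a support request, it can help to pull the key fields out of the header line. The following Python sketch does this for the two header formats shown in this article. The regular expression is an assumption derived only from those samples, so other builds may format the line differently. The names HEADER_RE, parse_header, and code_offset are hypothetical.

```python
import re

# Assumption: this pattern is built only from the two sample header lines
# in this article; it is not an official description of the screen format.
HEADER_RE = re.compile(
    r"(?:#(?P<mnemonic>[A-Z]{2})\s+)?"                 # optional "#GP" / "#PF"
    r"Exception(?:\s+type\s+|\()(?P<number>\d+)\)?"    # "Exception(13)" or "Exception type 14"
    r"\s+in\s+world\s+(?P<world_id>\d+):(?P<world_name>\S+)"
    r"\s+@\s+(?P<address>0x[0-9a-fA-F]+)"
)


def parse_header(line):
    """Return the exception fields from a header line, or None if no match."""
    m = HEADER_RE.search(line)
    if m is None:
        return None
    return {
        "mnemonic": m.group("mnemonic"),        # None when the line has no "#xx"
        "number": int(m.group("number")),
        "world_id": int(m.group("world_id")),
        "world_name": m.group("world_name"),
        "address": int(m.group("address"), 16),
    }


def code_offset(address, code_start):
    """Offset of an address from the 'Code starts at' value on the screen."""
    return address - code_start


gp = parse_header("#GP Exception(13) in world 4130:helper13-0 @ 0x41803399e303")
# Offset of the faulting address from "Code starts at 0x418033600000".
offset = code_offset(gp["address"], 0x418033600000)  # 0x39e303
```

The offset is simple arithmetic on two values printed on the screen. How VMware Technical Support uses these values during a core-dump review is outside the scope of this article.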

Exception 14: Page Fault

A page fault (Exception 14) occurs when the page being requested has not been successfully loaded into memory. There are both healthy and unhealthy page faults:
  • A healthy page fault results in the page being loaded from swapped memory to physical memory. The program is then allowed to proceed after the data has been properly loaded into physical memory.
  • An unhealthy page fault occurs when the page is not loaded in memory, and the operating system is unable to load the page from swapped to physical memory.
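The difference between the two cases can be sketched with a toy Python model. It is a deliberately simplified illustration and does not reflect how ESX manages memory; ToyMemory and UnrecoverablePageFault are hypothetical names.

```python
class UnrecoverablePageFault(Exception):
    """Raised by this toy model for an 'unhealthy' page fault."""


class ToyMemory:
    def __init__(self, swapped_pages):
        self.resident = {}                 # page number -> data in physical memory
        self.swap = dict(swapped_pages)    # page number -> data in swap
        self.faults = 0                    # number of page faults seen so far

    def read(self, page):
        if page in self.resident:
            return self.resident[page]     # no fault: the page is already loaded
        self.faults += 1                   # the page is not in physical memory
        if page in self.swap:
            # Healthy fault: load the page from swap, then let the read proceed.
            self.resident[page] = self.swap.pop(page)
            return self.resident[page]
        # Unhealthy fault: the page cannot be loaded from anywhere.
        raise UnrecoverablePageFault(f"cannot load page {page}")


mem = ToyMemory({7: "payload"})
first = mem.read(7)    # healthy fault: page 7 is loaded from swap
second = mem.read(7)   # no fault: page 7 is now resident
```

Reading page 7 the first time counts as one healthy fault, and the second read is served from physical memory. Reading a page that is in neither place, such as mem.read(9), raises UnrecoverablePageFault, which corresponds to the unhealthy case.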
This is a sample of a Page Fault generated by ESX:
VMware ESX Server [Releasebuild-123630]
Exception type 14 in world 1024:console @ 0x67f0ae
frame=0x1402824 ip=0x67f0ae cr2=0x405f6000 cr3=0x13401000 cr4=0x6f0
es=0x4028 ds=0x40404028 fs=0xffff0000 gs=0x0
eax=0x409f6000 ebx=0x1000 ecx=0x400 edx=0x409f6000
ebp=0x14028b4 esi=0x407c8000 edi=0x409f6000 err=11 eflags=0x10206
*0:1024/console 1:1092/mks:ubunt 2:1089/vmware-vm 3:1027/idle3
4:1028/idle4 5:1029/idle5 6:1030/idle6 7:1091/vmware-vm
8:1032/idle8 9:1033/idle9 10:1034/idle10 11:1093/vcpu-0:ub
12:1036/idle12 13:1037/idle13 14:1038/idle14 15:1039/idle15
@BlueScreen: Exception type 14 in world 1024:console @ 0x67f0ae
0x14028b4:[0x67f0ae]genericCopy+0x155 stack: 0xc0bbc60, 0x40081800, 0x0
0x14028dc:[0x67f3d6]vmk_SgCopy+0x41 stack: 0xc0bbc60, 0x40081800, 0x0
0x140292c:[0x7cef13]SCSICompleteFragment+0x1ae stack: 0xc005d00, 0x0, 0xc3100
0x14029c4:[0x7d081c]SCSICompletePathCommand+0x453 stack: 0xc005d00, 0x125, 0x148a4f8
0x1402a60:[0x7cafff]SCSICompleteAdapterCommand+0x3da stack: 0xc005d00, 0x2, 0x1402de0
0x1402ac0:[0x88343f]vmk_scsi_dump_active+0x20e stack: 0x0, 0x10a, 0x6a525f0
0x1402b30:[0x61811e]BHCallHandlersInt+0xf5 stack: 0x2ad0, 0x0, 0x1402b88
0x1402b88:[0x618614]BH_Check+0x2bb stack: 0x1, 0x1402bac, 0x1752d49
0x1402bac:[0x61fb8e]IDT_HandleInterrupt+0x85 stack: 0x1402bf8, 0x0, 0xb638000
0x1402bc0:[0x61fcb5]IDT_IntrHandler+0x4c stack: 0x1402bf8, 0x4028, 0x1454028
0x1402c70:[0x692c6c]CommonIntr+0xb stack: 0x1489500, 0x0, 0x1402de0
0x1402e1c:[0x7615e4]CpuSchedDispatch+0x487 stack: 0x2390a60, 0x1489500, 0x0
0x1402e88:[0x763eaa]CpuSchedDoWaitDirectedYield+0x351 stack: 0x0, 0x1f55e60, 0x0
0x1402ea4:[0x763fda]CpuSched_WaitIRQ+0x31 stack: 0xfedcba90, 0x6, 0x1f55e60
0x1402ec4:[0x69197f]VMNIXVMKSyscall_Idle+0xe2 stack: 0x1402f6c, 0x6915cf, 0x0
0x1402ecc:[0x68669c]VMNIXVMKSyscallUnpackIdle+0x7 stack: 0x0, 0x0, 0x0
0x1402f6c:[0x6915cf]HostSyscall+0xf6 stack: 0x1402fbc, 0xc03d9f98, 0x1c
0x1402fe8:[0x6909e3]HostVMKEntry+0xce stack: 0x0, 0x0, 0x0
VMK uptime: 0:01:58:34.004 TSC: 15137542595232
Starting coredump to disk Dumping using slot 1 of 1... log
 
The Exception 14 Page Fault may be caused by either a hardware or a software issue. Because the cause varies significantly from one occurrence to another, identifying it usually requires a core-dump review by VMware, which depends on access to protected source code and analysis tools or processes. Collect diagnostic information from the VMware ESX host and submit a support request. For more information, see Collecting diagnostic information for VMware products (1008524) and How to Submit a Support Request. You can also contact your hardware vendor if you or VMware Technical Support determine that a particular driver module or device caused the exception. To find out more about page-fault exceptions, see the description of Interrupt 14, Page-Fault Exception (#PF), in the Interrupt and Exception Handling chapter of the Intel 64 and IA-32 Architectures Software Developer’s Manual.
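On x86 processors, a page fault carries an error code whose low bits describe the access that failed, and the sample screen above prints it as the err value. The Python sketch below decodes the five low bits as defined in the Intel manual. The names PF_BITS and decode_pf_error_code are hypothetical. This article does not state whether the screen prints err in decimal or hexadecimal, so the caller must decide how to convert the printed text to an integer.

```python
# (bit position, flag name, meaning when set, meaning when clear)
PF_BITS = [
    (0, "P", "page-level protection violation", "page not present"),
    (1, "W/R", "write access", "read access"),
    (2, "U/S", "user-mode access", "supervisor-mode access"),
    (3, "RSVD", "reserved bit set in a paging-structure entry",
     "no reserved-bit violation"),
    (4, "I/D", "instruction fetch", "not an instruction fetch"),
]


def decode_pf_error_code(code):
    """Map each low bit of an x86 page-fault error code to its meaning."""
    return {
        name: (when_set if code & (1 << bit) else when_clear)
        for bit, name, when_set, when_clear in PF_BITS
    }


# Example: error code 0b00010 is a supervisor-mode write to a page
# that was not present.
example = decode_pf_error_code(0b00010)
```

For the err=11 value in the sample, int("11", 10) and int("11", 16) give different bit patterns, so both readings should be treated as candidates until the number base is confirmed.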
