Blackboard Perspectives and thoughts from Zhenhua Yao

Layman's Quick Guide on Crashdump Debugging

In distributed computing, we rely on traces and metrics to understand the runtime behavior of programs. However, in some cases we still need assistance from debuggers for live-site issues. For instance, if the service crashes all of sudden and no trace offers any clue, we need to load crashdump into debugger. Or some exception is raised but traces are insufficient to understand the nature of the problem, we may need to capture a full state of the process.

In old days, at least starting in Windows 3.1, there was a Dr. Watson to collect the error information following a process crash, mainly the crash dump file. Every time I saw it, something bad happened. Nowadays it has been under the new name of Windows Error Reporting, or WER. Inside the platform, there is still a “watson” service to collect all the crashdumps created by the platform code, process it, assign to the right owner, and send alerts as configured. Some times during live-site investigation, we can also request a dump file collection using “Node Diagnostiics”, then the file will be taken over by Watson (assuming your hand isn’t fast enough to move the file somewhere else).

Like it or not, to look at the dump file you have to use windbg. You can choose cdb or windbgx but they are not really different. If you are too busy to learn how windbg works, particularly managed code debugging using SOS, then you may use this quick guide to save some time.

Debugger extensions

Download sosex from Steve’s TechSpot and save the DLL in the extension directory.

Download mex from Microsoft download and save the DLL in the extension directory.

To find the extension directory, find the directory at where windbg.exe is located using Task Manager, then go to winext directory.

Basic commands

Exit windbg: enter qd or simply Alt-F4.

Display process environment block

!peb

Wou will see where the execution image is, all the environment variables which contains the machine name, processor ID, count, etc.

CPU usage

To check which threads have consumed how much CPU time:

!runaway

To check CPU utilization, thread pool worker thread and completion port thread usage:

!threadpool

List of threads: check if how many threads there are, any threads are terminated or hitting some exception, etc.

!threads

If you click the blue underlined link you can switch to that thread, then use the following to see the native stack trace:

k

or see the managed stack trace

!clrstack

To check the object on the stack run the following:

!dso

To check the local variables of a specific frame (use the frame number in “k” output):

!mdv [FrameNumber]

Object count: to get the statistics of objects in the managed heap.

!dumpheap -stat

If you want to get the live objects (the objects that cannot be garbage collected), add -live parameter. If you want to get the dead object, add -dead parameter.

Find object by type name: firstly find the list of types with statistics by the type name (either full name of partial):

!dumpheap -stat -type MyClassName

Then click the method table link, which is essentially:

!dumpheap /d -mt [MethodTableAddress]

You can click the address link to dump the object, or

!do [ObjectAddress]

A better way to browse the object properties is to use sosex:

!sosex.mdt [ObjectAddress]

To know why it’s live, or the GC root:

!gcroot [ObjectAddress]

or use sosex

!sosex.mroot [ObjectAddress]

Symbols

Check the current symbol path, you use use menu or

.sympath

Add a directory where PDB files (symbols) are located, use menu or

.sympath+ \\mynetworkshare\directory\symbols

Find all the class names and properties with a particular string (use your own wildcard string):

!sosex.mx *NetworkManager

List of all modules loaded:

lm

To get the details about a module, click the link in above output or:

lmDvm MyNameSpace_MyModule

Here you can see the file version, product version string, timestamp, etc. For the files from many repos, you can see the branch name and commit hash. If you are interested in the module info:

!lmi MyNameSpace_MyModule

To show disassembled IL code, firstly switch to a managed frame, then run mu:

!sosex.mframe 70
!sosex.mu

Advanced

Find unique stack traces: this will go through the stack trace of all threads, group them by identical ones, and show you which stack has shown up how many times:

!mex.us

Often times you can see lock contentions or slow transaction isuse, etc.

Find all exceptions:

!mex.dae

Dump all async task objects:

!mex.tasks

If you have to debug memory related issue, refer to my previous post.

Further reading

Many debugging topics are not covered, for instance finalization, deadlock, locking, etc. If quick guidance is insufficient, please spend some time starting from Getting Started With Windows Debugging or the book Advanced .NET Debugging.