Slide1
The Case of the Unexplained, 2010: Troubleshooting with Mark Russinovich
Mark Russinovich
Technical Fellow
Microsoft Corporation
SESSION CODE: WCL315
Slide2
About Me
Technical Fellow, Microsoft
Co-founder and chief software architect of Winternals Software
Co-author of Windows Internals 4th and 5th edition and Inside Windows 2000 3rd edition with David Solomon
Author of TechNet Sysinternals
Home of blog and forums
Contributing Editor TechNet Magazine, Windows IT Pro Magazine
Ph.D. in Computer Engineering
Slide3
Outline
Introduction
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
Slide4
Case of the Unexplained…
This is the 2010 version of the “Case of the Unexplained” talk series
Previous versions covered different cases
You can view the webcasts at Sysinternals->Mark’s Webcasts
Based on real case studies
Some of these have been written up on my blog
Slide5
Troubleshooting
Most applications do a poor job of reporting unexpected errors
Locked, missing or corrupt files
Missing or corrupt registry data
Permissions problems
Errors manifest in several different ways
Misleading error messages
Crashes or hangs
Slide6
Slide7
Purpose of Talk
Show you how to solve these classes of problems by peering beneath the surface
Interpreting process, file and registry activity
Interpreting call stacks
You’ll learn tools and techniques to help you solve seemingly unsolvable problems
Slide8
Tools We’ll Use
Sysinternals: www.microsoft.com/technet/sysinternals
Process Explorer – process/thread viewer
Process Monitor – file/registry/process/thread tracing
Autoruns – displays all autostart locations
SigCheck – shows file version information
PsExec – execute processes remotely or in the system account
TcpView – shows TCP/IP endpoints
Strings – dumps printable strings in any file
ADInsight – real time LDAP (Active Directory) monitor
Zoomit – presentation tool I’m using
Microsoft downloads:
Kernrate – sample-based system profiler
Visual Studio: Spy++ - Window analysis utility
Debugging Tools for Windows: Windbg application and kernel debugger: www.microsoft.com/whdc/devtools/debugging/Windbg
Slide9
Outline
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
Slide10
Process Explorer
Process Explorer is a Task Manager replacement
You can literally replace Task Manager with Options->Replace Task Manager
Hide-when-minimized to always have it handy
Hover the mouse to see a tooltip showing the process consuming the most CPU
Open System Information graph to see CPU usage history
Graphs are time stamped with hover showing biggest consumer at point in time
Also includes other activity such as I/O, kernel memory limits
Slide11
The Case of the Wmiprvse.exe CPU Hog
Customer periodically saw Wmiprvse.exe consuming excessive amounts of CPU:
Wmiprvse is a hosting process for WMI providers, so we had to look deeper to find the cause
Slide12
Processes and Threads
A process represents an instance of a running program
Address space
Resources (e.g., open handles)
Security profile (token)
A thread is an execution context within a process
Unit of scheduling (threads run, processes don’t run)
All threads in a process share the same per-process address space
The System process is the default home for kernel mode system threads
Functions in OS and some drivers that need to run as real threads
E.g., need to run concurrently with other system activity, wait on timers, perform background “housekeeping” work
Other multi-host processes: svchost, iexplore, mmc, dllhost
Slide13
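As a small illustration of the Processes and Threads point that all threads in a process share the same per-process address space, the sketch below starts three threads that all append to one list. It is only an analogy in Python (whose `threading` module creates real OS threads in CPython); the worker names are made up for the example.

```python
import threading

# Shared state lives in the process's address space, so every thread sees it.
results = []
lock = threading.Lock()

def worker(name):
    # Each thread is its own execution context, but it appends to the same list object.
    with lock:
        results.append((name, threading.get_ident()))

threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All three entries ended up in the single list owned by the process.
print(sorted(name for name, _ in results))
```

The lock is there because shared memory cuts both ways: threads can see each other's data, so they also have to coordinate access to it.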
Viewing Threads
Task Manager doesn’t show thread details within a process
Process Explorer does on “Threads” tab
Displays thread details such as ID, CPU usage, start time, state, priority
Start address is where the thread began running (not where it is now)
Click Module to get details on module containing thread start address
Slide14
Thread Start Functions and Symbol Information
Process Explorer can map the addresses within a module to the names of functions
This can help identify which component within a process is responsible for CPU usage
Configure Process Explorer’s symbol engine:
Download the latest Debugging Tools for Windows from Microsoft (free)
Use dbghelp.dll from the Debugging Tools
Point at the Microsoft public symbol server (or internal symbol server if you have access)
Slide15
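For the symbol configuration step on the Thread Start Functions and Symbol Information slide, a symbol path that points at the Microsoft public symbol server is usually written in the `srv*<local cache>*<server>` form. The sketch below just builds that string; the cache folder `C:\Symbols` is an assumption, and the helper name is hypothetical. The resulting string is what you would enter as the symbols path in Process Explorer, and `_NT_SYMBOL_PATH` is the environment variable the Debugging Tools read.

```python
import os

MS_PUBLIC_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"

def symbol_path(cache_dir, server=MS_PUBLIC_SYMBOL_SERVER):
    """Build a symbol path of the form srv*<local cache>*<symbol server>."""
    return f"srv*{cache_dir}*{server}"

# C:\Symbols is an assumed local cache folder; any writable directory works.
path = symbol_path(r"C:\Symbols")

# Only affects this process and the child processes it launches.
os.environ["_NT_SYMBOL_PATH"] = path
print(path)
```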
The Case of the Wmiprvse.exe CPU Hog (Cont)
Thread list pointed at thread with generic start address:
Had to look deeper…
Slide16
Call Stacks
Sometimes a thread start address doesn’t tell you what a thread is doing
The stack might provide a hint:
The stack is a per-thread region of memory that records a history of function nesting
The bottom frame (Function 3) is where the thread will continue executing
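The idea that a stack records a history of function nesting can be seen in any language with stack introspection. The sketch below is a Python analogy, not a Windows thread stack: three made-up functions call each other, and the innermost one captures the stack it is running on.

```python
import traceback

def function_1():
    return function_2()

def function_2():
    return function_3()

def function_3():
    # Capture the current thread's stack: one entry per nested call, oldest first.
    stack = traceback.extract_stack()
    return [frame.name for frame in stack]

frames = function_1()
# The last three entries are the nesting history that led to function_3.
print(frames[-3:])
```

Python lists the oldest call first, so `function_3`, where execution continues, appears last. Debuggers and Process Explorer typically show the most recent call first.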
[Diagram: stack frames for Function 1, Function 2 and Function 3, with Function 3 at the bottom]
Slide17
Viewing Call Stacks
Click Stack on the Threads tab to view a thread’s call stack
Lists functions in reverse chronological order
Note that start address on Threads tab is different than first function shown in stack
This is because all threads created by Windows programs start in a library function in Kernel32.dll which calls the programmed start address
Slide18
The Case of the Wmiprvse.exe CPU Hog: Solved
Thread stack implicated AssetAdvisor.dll:
Web search led to this KB article:
Article had hotfix for SMS 2003: problem solved
Slide19
The Case of the Runaway CPU
User noticed that system was sluggish
Ran Process Explorer and saw that System process was consuming CPU:
Slide20
The Case of the Runaway CPU (Cont)
Looked at threads tab and saw thread from ALCXWDM driver causing the CPU usage:
Slide21
The Case of the Runaway CPU: Solved
Double-clicked to look at version and saw it was a Realtek driver
Went to Realtek site and downloaded newer version: problem solved
Slide22
Outline
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
Slide23
Process Monitor
Process Monitor is a real-time file, registry, process and thread monitor
It requires Windows 2000 SP4 w/Update Rollup 1, XP SP2 or higher, Server 2003 SP1 or higher, Vista and higher, or Server 2008 (including 64-bit versions of Windows) and higher
It replaces Filemon and Regmon, but you can use Filemon and Regmon on older operating systems
Enhancements over Filemon/Regmon include:
More advanced filtering
Operation call stacks
Boot-time logging
Data mining views
Process tree to see short-lived processes
When in doubt, run Process Monitor!
It will often show you the cause for error messages
It often tells you what is causing sluggish performance
Slide24
The Case of the Slow Signed Application Start
User had an application that started quickly until they digitally signed it
Launch time went from seconds to a minute and a half
Asked user to capture a Process Monitor trace
Saw multiple references to certificate revocation list (CRL) servers
Saw multiple references to proxy configuration
Slide25
The Case of the Slow Signed Application Start: Solved
Asked user if system was connected to network: no
Searched the web and learned that the delays are caused by .NET runtime signature verification
Could see .NET 2.0 framework loaded in log file
That triggered proxy server lookups
Solution: create a .config file that tells runtime to skip check
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <generatePublisherEvidence enabled="false"/>
  </runtime>
</configuration>
Slide26
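The .config file from the slow signed application case can also be produced by a script, which helps when several executables need it. The sketch below is one way to do that, assuming the usual .NET convention that the file sits next to the executable and is named after it with `.config` appended. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_publisher_evidence_config():
    """Return the XML text of a .config file that disables publisher evidence generation."""
    configuration = ET.Element("configuration")
    runtime = ET.SubElement(configuration, "runtime")
    ET.SubElement(runtime, "generatePublisherEvidence", {"enabled": "false"})
    body = ET.tostring(configuration, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + body

def write_config(exe_path):
    # Assumption: the config file lives beside the executable as <exe name>.config.
    config_path = exe_path + ".config"
    with open(config_path, "w", encoding="utf-8") as handle:
        handle.write(build_publisher_evidence_config())
    return config_path

print(build_publisher_evidence_config())
```

Calling `write_config` with the path to the signed executable would drop the file in place; the example only prints the XML so it has no side effects.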
Outline
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
Slide27
The Case of the Failed SQL Reporting Services Attachment
Customer contacted Microsoft Support because sending an email subscription from SQL Reporting Services (SRS) would not attach the image file
Support spent 34 hours investigating:
Had customer try on another identical SRS system: success
Tried to repro in house with same SRS DLL (Cdosys.dll) and on various OS’s, but unable
Finally decided to capture Process Monitor trace from working and failing system to compare
Slide28
The Case of the Failed SQL Reporting Services Attachment (Cont)
Searched through traces for references to CDO.Message.1 and started comparing
Working trace references a CodePage key:
Failing trace doesn’t:
Failing
Working
Slide29
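Comparing a working trace against a failing one, as in the SQL Reporting Services case, comes down to a set difference over the paths each system accessed successfully. The sketch below shows that idea on hand-written rows. The tuples, the value name `1252` and the helper names are all made up for illustration; real input would come from traces exported from Process Monitor.

```python
def successful_paths(events):
    """Collect the lower-cased paths of events whose result is SUCCESS."""
    return {path.lower() for _operation, path, result in events if result == "SUCCESS"}

def missing_in_failing(working_events, failing_events):
    """Paths the working system accessed successfully that the failing system did not."""
    return sorted(successful_paths(working_events) - successful_paths(failing_events))

# Hypothetical sample rows in (operation, path, result) form.
working = [
    ("RegOpenKey", r"HKCR\CDO.Message.1", "SUCCESS"),
    ("RegQueryValue", r"HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage\1252", "SUCCESS"),
]
failing = [
    ("RegOpenKey", r"HKCR\CDO.Message.1", "SUCCESS"),
    ("RegQueryValue", r"HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage\1252", "NAME NOT FOUND"),
]

print(missing_in_failing(working, failing))
```

Paths are lower-cased before comparison because registry and file paths on Windows are case-insensitive.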
The Case of the Failed SQL Reporting Services Attachment: Solved
Opened HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage with “Jump to” on failing system
Noticed lots of missing values:
Imported key from working system: problem solved
Failing
Working
Slide30
The Case of the Blocked HTTP Port
User complained that they were unable to browse the web
Got connection error from IE
Had just had system migrated between domains
Admin went about troubleshooting
Deleted IE cache: problem persisted
Checked DNS, gateway, IP settings: no problems
Tried other outbound ports: no problems
Slide31
The Case of the Blocked HTTP Port (Cont)
Suspected third-party plugin so captured a Process Monitor trace while launching IE
Set a file system filter and looked at stack of each event
Got to event that accessed Software.log hive file:
Slide32
The Case of the Blocked HTTP Port (Cont)
Web search revealed that driver was part of ZoneAlarm’s stateful firewall
Search also showed that Cisco VPN client uses it:
Had uninstalled VPN client before moving system across domains
Uninstall must have left something behind
Slide33
Viewing Autostarts
Use Autoruns to see what’s configured to start when the system boots and you log in
Windows MsConfig shows a subset of the defined autostart locations
MsConfig doesn’t show as much information
Slide34
The Case of the Blocked HTTP Port: Solved
Ran Autoruns and looked for driver:
Unchecked driver entry, rebooted and problem solved
Slide35
Outline
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
Slide36
Application Crashes
In most cases, there’s nothing you can do about application crashes
They are caused by a bug in the program
Only the developer can fix a bug
However, the crash may be caused by misconfiguration or an extension (a plugin)
Monitor the application’s crash with Process Monitor if it’s reproducible
Look for extensions in the crash file with Windbg
Slide37
Finding the Crash Dump
On pre-Vista systems, finding the dump file is easy:
Slide38
Attaching to the Dying Process
Vista and higher don’t save crash dumps for most crashes
Only if Microsoft requests a dump for study and you send it in
When a crash occurs, don’t dismiss the crash dialog:
Launch Windbg and attach to the process
You can save a dump with the .dump command
Slide39
Identifying the Crashed Process
On Vista and higher, the process name might not be enough to identify the instance that’s crashed:
To determine the PID of the crashed instance, look at WerFault’s command line:
Slide40
Enabling Dump Archiving on Vista and Higher
Or you can configure Vista and higher to always generate and save a dump file
Create a key named:
HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps
Dumps go to %LOCALAPPDATA%\CrashDumps
Override with a DumpFolder value (REG_EXPAND_SZ)
Limit dump history with a DumpCount value (DWORD)
Slide41
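The LocalDumps settings from the Enabling Dump Archiving slide can be scripted with the built-in `reg add` command. The sketch below only builds the argument lists, so it can be reviewed before anything is changed. The folder `%SystemDrive%\Dumps` and the count of 10 are example values, not recommendations from the talk, and the helper name is hypothetical.

```python
LOCAL_DUMPS_KEY = r"HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps"

def local_dumps_commands(dump_folder=None, dump_count=None):
    """Build reg.exe argument lists that create the LocalDumps key and optional values."""
    # Creating the key alone is enough to turn on dump archiving with the defaults.
    commands = [["reg", "add", LOCAL_DUMPS_KEY, "/f"]]
    if dump_folder is not None:
        commands.append(["reg", "add", LOCAL_DUMPS_KEY, "/v", "DumpFolder",
                         "/t", "REG_EXPAND_SZ", "/d", dump_folder, "/f"])
    if dump_count is not None:
        commands.append(["reg", "add", LOCAL_DUMPS_KEY, "/v", "DumpCount",
                         "/t", "REG_DWORD", "/d", str(dump_count), "/f"])
    return commands

# Example values only; pick a folder and history size that suit the machine.
for command in local_dumps_commands(dump_folder=r"%SystemDrive%\Dumps", dump_count=10):
    print(command)
    # On Windows, from an elevated session, each list could be passed to
    # subprocess.run(command, check=True) to apply it.
```

Passing the arguments as a list rather than one shell string keeps `%SystemDrive%` unexpanded, which is what a REG_EXPAND_SZ value needs.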
Analyzing a Crash
Basic crash dump analysis is easy and it might tell you the cause
Requires Windbg and symbol configuration
Once the dump is loaded, find the faulting thread
The debugger might identify it
If the debugger doesn’t, examine each thread stack looking for “fault”, “exception”, or “error” names
Examine the stack of the faulting thread to look for third-party plugins
If you suspect an extension:
Check for a new version
Uninstall it if the problem persists
Slide42
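The manual procedure from the Analyzing a Crash slide (scan each thread stack for “fault”, “exception” or “error” names, then see which modules sit on the faulting stack) can be sketched in a few lines. The stacks below are hand-written in the `module!function` form debuggers print, and `plugin` and `app` are made-up module names. This illustrates the search only; it does not replace Windbg.

```python
KEYWORDS = ("fault", "exception", "error")

def find_faulting_threads(thread_stacks):
    """Return ids of threads whose stack contains a frame name matching a keyword."""
    hits = []
    for thread_id, frames in thread_stacks.items():
        if any(keyword in frame.lower() for frame in frames for keyword in KEYWORDS):
            hits.append(thread_id)
    return hits

def modules_on_stack(frames):
    """List the distinct modules on a stack, in order, from 'module!function' names."""
    seen = []
    for frame in frames:
        module = frame.split("!", 1)[0]
        if module not in seen:
            seen.append(module)
    return seen

# Hypothetical stacks keyed by thread id, most recent frame first.
stacks = {
    0: ["ntdll!NtWaitForSingleObject", "kernel32!WaitForSingleObjectEx"],
    1: ["kernel32!UnhandledExceptionFilter", "plugin!Render", "app!Main"],
}

faulting = find_faulting_threads(stacks)
print(faulting, modules_on_stack(stacks[faulting[0]]))
```

Any module on the faulting stack that does not belong to the application or to Windows is a candidate extension to update or uninstall.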
The Case of the Media Foundation Crash
User tried to open a WMV file with Windows Media Player, but would get a crash:
Slide43
The Case of the Media Foundation Crash (Cont)
Attached to process and did a !analyze -v:
Slide44
The Case of the Media Foundation Crash: Solved
Did a Web search for “evr monitor crash” and found a hotfix:
User was using 5 monitors
Applied hotfix and problem solved
Slide45
Outline
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
Slide46
Blue Screen Crashes
Windows has various components that run in Kernel Mode, the highest privilege mode of the OS
OS components: Ntoskrnl.exe, Hal.dll
Drivers: Ntfs.sys, Tcpip.sys, device drivers
Kernel-mode components are privileged extensions to the OS that have to adhere to various rules
Not accessing invalid memory
Accessing memory at the right “Interrupt Request Level”
Not causing resource deadlocks
When a kernel-mode component performs an illegal operation, Windows crashes (blue screens)
Crashing helps preserve the integrity of user data
A resource deadlock can hang the system
Slide47
Online Crash Analysis
When you reboot after a crash, Windows offers to upload it to Microsoft Online Crash Analysis (OCA)
Automated server generates a thumbprint of the crash and uses it as a key in a database
If the database has an entry, the user is told the cause and directed to a fix
Slide48
Basic Crash Dump Analysis
Many times OCA doesn’t know the cause:
Basic crash dump analysis is easy and it might tell you the cause
Requires Windbg and symbol configuration
Dump files are in either:
\Windows\Memory.dmp: Vista+ and servers
\Windows\Minidump: Windows 2000 Pro and Windows XP
Slide49
The Case of the Spontaneous Reboots
Admin reported that server was sporadically rebooting
Other admin saw ‘case of’ talk and looked in event log:
Slide50
The Case of the Spontaneous Reboots: Solved
Crash dump showed that cpqteam.sys was likely responsible:
File properties showed it was an HP ProLiant network driver and an old version:
Went to HP’s site and got new version: problem solved
Slide51
Summary and More Information
A few basic tools and techniques can solve seemingly impossible problems
I learn by always trying to determine the root cause
Resources:
Webcasts of two previous “Case of the Unexplained” talks
Sysinternals->Mark’s Webcasts
Sysinternals Video Library: in-depth dive on tools and troubleshooting
My blog
Windows Internals: understand the way the OS works
If you’ve solved one, send me a description, screenshots and log files!
Slide52
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.