Slide1
The Case of the Unexplained…
Mark Russinovich
Technical Fellow
Windows Azure
WCL301
Slide2
Outline
Introduction
Sluggish Performance
Error Messages
Blue Screens
Slide3
Case of the Unexplained…
This is the 2012 version of the “case of the unexplained” talk series
Previous versions covered different cases
Can view webcast on Sysinternals->Mark’s webcasts
Based on real case studies
Some of these have been written up on my blog
Slide4
Troubleshooting
Most applications do a poor job of reporting unexpected errors
Locked, missing or corrupt files
Missing or corrupt registry data
Permissions problems
Errors manifest in several different ways
Misleading error messages
Crashes or hangs
Slide5
Purpose of Talk
Show you how to solve these classes of problems by peering beneath the surface
Interpreting process, file and registry activity
Interpreting call stacks
You’ll learn tools and techniques to help you solve seemingly unsolvable problems
Slide6
Tools We’ll Use
Sysinternals: www.microsoft.com/technet/sysinternals
(\\redmond\files\SYSINTERNALS\LBI\Latest)
Process Explorer – process/thread viewer
Process Monitor – file/registry/process/thread tracing
Procdump – process memory dumper
Autoruns – displays all autostart locations
SigCheck – shows file version information
PsExec – execute processes remotely or in the system account
TcpView – shows TCP/IP endpoints
Strings – dumps printable strings in any file
Zoomit – presentation tool I’m using
Microsoft downloads:
Debugging Tools for Windows:
Windbg application and kernel debugger: www.microsoft.com/whdc/devtools/debugging (//dbg)
Slide7
The Sysinternals Administrator’s Reference
The official guide to the Sysinternals tools
Covers every tool, every feature, with tips
Written by markruss and aaronmar
Available in June
Full chapters on the major tools:
Process Explorer
Process Monitor
Autoruns
Other chapters by tool group
Security, process, AD, desktop, …
Slide8
Outline
Application Hangs
Sluggish Performance
Error Messages
Blue Screens
Slide9
Process Monitor
Process Monitor is a real-time file, registry, process and thread monitor
Works on Windows XP and higher, including 64-bit Windows
It replaces Filemon and Regmon, but you can use Filemon and Regmon on older operating systems
Enhancements over Filemon/Regmon include:
More advanced filtering
Operation call stacks
Boot-time logging
Data mining views
Process tree to see short-lived processes
When in doubt, run Process Monitor!
It will often show you the cause of error messages
It often tells you what is causing sluggish performance
Slide10
Process Monitor Enhancements: Bookmarks
Bookmarking enables you to save markers in the trace:
Use F6 to find the next one, Shift+F6 to search up
Slide11
Process Monitor Enhancements: Environment Variables and Current Directory
Process start event now captures new process environment variables and current directory:
Slide12
The Case of the Slow IE Download Bar
User experienced a 40-second delay for IE’s download bar to appear after clicking on a download link
Ran IE with no add-ons: no change in behavior
Captured a Process Monitor trace of the hang
Slide13
The Case of the Slow IE Download Bar (Cont)
Used Count Occurrences dialog to look for errors
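The Count Occurrences step can also be approximated outside the tool. A minimal Python sketch, assuming the trace was saved as CSV with a Result column; the sample rows, process name and \\mediacenter paths below are made up for illustration and are not taken from the actual case.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a Process Monitor trace saved as CSV (illustrative rows only).
SAMPLE_TRACE = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"9:14:01.10","iexplore.exe","4120","CreateFile","C:\Users\me\Downloads","SUCCESS",""
"9:14:01.25","iexplore.exe","4120","CreateFile","\\mediacenter\videos\clip.wmv","BAD NETWORK PATH",""
"9:14:20.31","iexplore.exe","4120","RegOpenKey","HKCU\Software\Example","SUCCESS",""
"9:14:20.40","iexplore.exe","4120","CreateFile","\\mediacenter\music\song.mp3","BAD NETWORK PATH",""
"9:14:41.02","iexplore.exe","4120","CreateFile","C:\Users\me\Downloads\file.zip","SUCCESS",""
'''


def count_occurrences(csv_text, column="Result"):
    """Tally the distinct values found in one column of the exported trace."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


counts = count_occurrences(SAMPLE_TRACE)
for value, n in counts.most_common():
    print(f"{value}: {n}")
```

On a real export, pass the file's text instead of SAMPLE_TRACE; any result other than SUCCESS is a candidate for a closer look.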
Saw BAD NETWORK PATH:
Errors were references to an offline media center system:
Slide14
The Case of the Slow IE Download Bar: Solved
Saw references to media center in download manager because of previous downloads:
Deleted references in download manager: problem solved
Slide15
The Case of the Hanging Paypal Emails
User started getting Outlook hangs of up to a minute when clicking on Paypal payment notification emails
Captured a Process Monitor trace
Added Duration column:
One event stood out with 3 second duration
Query of file share via IP address
Slide16
The Case of the Hanging Paypal Emails (Cont)
Web search revealed IP address belongs to Web statistics company Omniture
But no IP address visible in email and image download disabled:
Slide17
The Case of the Hanging Paypal Emails: Solved
Looked at email source code and found domain name:
Outlook interprets reference as file server
Contacted Microsoft: not a security issue
Contacted Paypal: fixed email formats
In the meantime, added hosts file entry: problem solved
Slide18
Outline
Application Hangs
Sluggish Performance
Error Messages
Blue Screens
Slide19
Process Explorer
Process Explorer is a Task Manager replacement
You can literally replace Task Manager with Options->Replace Task Manager
Hide-when-minimized to always have it handy
Hover the mouse to see a tooltip showing the process consuming the most CPU
Open System Information graph to see CPU usage history
Graphs are time stamped with hover showing biggest consumer at point in time
Also includes other activity such as I/O, kernel memory limits
Slide20
Process Explorer v15: GPU Monitoring and Windows 8
Captures GPU utilization and memory usage
System-wide
Per-Process
Slide21
Process Explorer v15.2
Process timelines
Autostart locations
Slide22
The Case of the Runaway Website
For years, Jrun.exe process on web server would sporadically max out a core:
Administrator saw Case of the Unexplained and decided to investigate
Slide23
Processes and Threads
A process represents an instance of a running program
Address space
Resources (e.g., open handles)
Security profile (token)
A thread is an execution context within a process
Unit of scheduling (threads run, processes don’t run)
All threads in a process share the same per-process address space
The System process is the default home for kernel mode system threads
Functions in OS and some drivers that need to run as real threads
E.g., need to run concurrently with other system activity, wait on timers, perform background “housekeeping” work
Other host processes: svchost, Iexplore, mmc, dllhost
Slide24
Viewing Threads
Task Manager doesn’t show thread details within a process
Process Explorer does on “Threads” tab
Displays thread details such as ID, CPU usage, start time, state, priority
Start address is where the thread began running (not where it is now)
Click Module to get details on module containing thread start address
Slide25
Thread Start Functions and Symbol Information
Process Explorer can map the addresses within a module to the names of functions
This can help identify which component within a process is responsible for CPU usage
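Conceptually, mapping an address to a function name is a lookup of the nearest symbol at or below that address. A minimal Python sketch of the idea, assuming a made-up symbol table; the module name, addresses and function names are hypothetical, and a real symbol engine reads this information from symbol files rather than a hard-coded list.

```python
import bisect

# Hypothetical symbols for one module: (start address, function name), sorted by address.
SYMBOLS = [
    (0x1000, "Initialize"),
    (0x1400, "WorkerLoop"),
    (0x2200, "ParseRequest"),
]
START_ADDRESSES = [start for start, _ in SYMBOLS]


def symbolize(address, module="example.dll"):
    """Return module!function+offset for the symbol at or before the address."""
    index = bisect.bisect_right(START_ADDRESSES, address) - 1
    if index < 0:
        # Address lies below the first known symbol: only module+offset is available.
        return f"{module}+{address:#x}"
    start, name = SYMBOLS[index]
    return f"{module}!{name}+{address - start:#x}"


print(symbolize(0x1450))
```

Without symbols, only the module-relative fallback form is possible, which is why configuring symbols makes thread start addresses and stacks far easier to read.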
Configure Process Explorer’s symbol engine:
Download the latest Debugging Tools for Windows from Microsoft (free)
Use dbghelp.dll from the Debugging Tools
Point at the Microsoft public symbol server (or internal symbol server if you have access)
Slide26
The Case of the Runaway Website (Cont)
Thread start address didn’t reveal anything:
Slide27
Viewing Call Stacks
Click Stack on the Threads tab to view a thread’s call stack
Note that the start address on the Threads tab is different from the first function shown in the stack
This is because all threads created by Windows programs start in a library function in Kernel32.dll, which calls the programmed start address
Slide28
The Case of the Runaway Website (Cont)
Looked at stack and saw ColdFusion DLL, Cfxneo.dll
No obvious reason for CPU usage
Web search didn’t turn up anything
Slide29
The Case of the Runaway Website (Cont)
Ran Process Monitor and saw lots of enumeration of a particular key:
Opened key in Regedit: Regedit hung
Slide30
The Case of the Runaway Website: Solved
Expansion finished after 10 minutes: tens of thousands of subkeys
Searched in ColdFusion documentation and found that key stores browser client state
Option to use cookies instead
Made configuration change: problem solved
Slide31
Outline
Application Hangs
Sluggish Performance
Error Messages
Blue Screens
Slide32
The Case of the Locked Folder
Company’s users complained of locked folders on their common network share:
Retrying would usually work
Slide33
The Case of the Locked Folder: Solved
Admin used Process Explorer search and saw that thumbs.db file was in use by Explorer:
Did research and learned that if thumbs.db is present, Explorer would open it
Not clear why it was not closing it in a timely manner
Found group policy that disabled this behavior:
Applied policy: problem solved
Slide34
The Case of the Missing .PPSX Details
Office 2010 user complained that the Details tab of Explorer properties was missing for .PPSX documents
Present for .PPS, though
[Screenshots: .PPS and .PPSX]
Slide35
The Case of the Missing .PPSX Details (Cont)
Captured Process Monitor trace opening Explorer properties of both file types
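Comparing two such traces can be partly automated. A minimal Python sketch, assuming each trace has been reduced to a list of registry paths; the paths below are hypothetical placeholders, not the real trace contents. The idea is to replace the file-type extension with a placeholder so equivalent paths line up, then take a set difference.

```python
def normalize(path, extension):
    """Swap the file-type extension for a placeholder so the two traces line up."""
    return path.replace(extension, ".<ext>")


def only_in_first(first_paths, first_ext, second_paths, second_ext):
    """Return normalized paths referenced in the first trace but not in the second."""
    first = {normalize(p, first_ext) for p in first_paths}
    second = {normalize(p, second_ext) for p in second_paths}
    return sorted(first - second)


# Hypothetical registry paths extracted from the two traces.
PPSX_TRACE = [r"HKCR\.ppsx", r"HKCR\SystemFileAssociations\.ppsx", r"HKCR\*"]
PPS_TRACE = [r"HKCR\.pps", r"HKCR\*"]

for path in only_in_first(PPSX_TRACE, ".ppsx", PPS_TRACE, ".pps"):
    print("only in .PPSX trace:", path)
```

Running the comparison in both directions gives the differences on each side; real traces would need extra filtering for noise such as unrelated background activity.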
Compared side by side
.PPSX trace had references to SystemFileAssociations\.ppsx
Corresponding key in .PPS trace missing
Slide36
The Case of the Missing .PPSX Details: Solved
Created key and imported settings for .PPS key
Now could see details, but not edit them
Compared further and found reference to HKCR\.pps missing
Repeated export/import
Problem solved
[Screenshots: No Edit, Fixed]
Slide37
Outline
Application Hangs
Sluggish Performance
Error Messages
Blue Screens
Slide38
Blue Screen Crashes
Windows has various components that run in Kernel Mode, the highest privilege mode of the OS
OS components: Ntoskrnl.exe, Hal.dll
Drivers: Ntfs.sys, Tcpip.sys, device drivers
Kernel-mode components are privileged extensions to the OS and have to adhere to various rules
Not accessing invalid memory
Accessing memory at the right “Interrupt Request Level”
Not causing resource deadlocks
When a kernel-mode component performs an illegal operation, Windows crashes (blue screens)
Crashing helps preserve the integrity of user data
A resource deadlock can hang the system
Slide39
Online Crash Analysis
When you reboot after a crash, Windows offers to upload it to Microsoft Online Crash Analysis (OCA)
Automated server generates a thumbprint of the crash and uses it as a key in a database
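The slide does not say how the thumbprint is computed. Purely as an illustration of using a thumbprint as a database key, the Python sketch below assumes it is a hash over a few crash attributes; the choice of attributes (stop code, faulting module, offset), the driver name and the database entry are all hypothetical.

```python
import hashlib


def thumbprint(stop_code, module, offset):
    """Hash a few crash attributes into a short, stable lookup key (illustrative only)."""
    text = f"{stop_code:#x}|{module.lower()}|{offset:#x}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


# Hypothetical database of known crashes, keyed by thumbprint.
KNOWN_CRASHES = {
    thumbprint(0xD1, "example.sys", 0x1A2B): "Update example.sys to the latest version",
}


def look_up(stop_code, module, offset):
    """Return the recorded fix for a crash, or None when the crash is unknown."""
    return KNOWN_CRASHES.get(thumbprint(stop_code, module, offset))


print(look_up(0xD1, "Example.sys", 0x1A2B))
print(look_up(0x7E, "other.sys", 0x10))
```

The point of the design is that identical crashes from different machines collapse to the same key, so a single database entry can answer all of them.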
If the database has an entry, the user is told the cause and directed at a fix
Slide40
Basic Crash Dump Analysis
Many times OCA doesn’t know the cause:
Basic crash dump analysis is easy and it might tell you the cause
Requires Windbg and symbol configuration
Dump files are in either:
\Windows\Memory.dmp: Vista+ and servers
\Windows\Minidump: Windows 2000 Pro, Windows XP, Vista+
Slide41
The Case of the Crashing Hyper-V Systems
Hyper-V cluster started having random crashes
Saw Case of the Unexplained so opened minidump
Executed !analyze -v:
Slide42
The Case of the Crashing Hyper-V Systems: Solved
Searched for stop code:
Found KB article with fix that matched symptoms: problem solved
Slide43
The Case of the GFI Backup Crash
Admin updated GFI backup from 2009 to 2011 version
Reproducible system crash (BSOD) when backing up to a network location after ~1 minute
Analyzed crash dump with Windbg:
Slide44
The Case of the GFI Backup Crash (Cont)
Checked online information on ncfsd.sys
First Google hits linked to bogus websites trying to convince you to run their code in order to “fix” your “infection”:
Slide45
The Case of the GFI Backup Crash (Cont)
Second hit reported crash could be result of low disk space:
Freed up 20 GB of disk space
Time to crash took longer (~3 minutes)
Slide46
The Case of the GFI Backup Crash: Solved
Another hit pointed at Novell Client driver as the problem:
Disabled Novell service
Problem persisted
Uninstalled Novell client because it was not really needed: problem solved
Slide47
Summary and More Information
A few basic tools and techniques can solve seemingly impossible problems
I learn by always trying to determine the root cause
Resources:
Sysinternals Administrator’s Reference
Webcasts of two previous “Case of the Unexplained” talks
Sysinternals->Mark’s Webcasts
My blog
Windows Internals: understand the way the OS works
If you’ve solved one, send me a description, screenshots and log files!
Slide48
Windows 8 Bluescreens