1. Introduction
In my opinion, one of the greatest things about Java is the ability to get meaningful thread dumps on a running production environment without having to enable DEBUG mode. The thread dump is a snapshot of exactly what's executing at a moment in time. While the thread dump format and content may vary between the different Java vendors, at the bare minimum it provides you a list of the stack traces for all Java threads in the Java Virtual Machine. Using this information, you can either analyze the problem yourself, or work with those who wrote the running code to analyze the problem.
2. What is a stack trace?
I mentioned earlier that the thread dump is just a list of all threads and the full stack trace of code running on each thread. If you are a J2EE Application Server administrator and you've never done development before, the concept of a stack trace may be foreign to you. A stack trace is a dump of the current execution stack that shows the method calls running on that thread from the bottom up. If you're unfamiliar with what a method is, please see: http://en.wikipedia.org/wiki/Subroutine.
Here is an example stack trace for a thread running in WebLogic:
"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=1 tid=0x0938ac90 nid=0x2f53 waiting for monitor entry [0x80c77000..0x80c78040]
at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:95)
- waiting to lock <0x8d3f6df0> (a weblogic.socket.PosixSocketMuxer$1)
at weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)
You don't need to understand what it's doing at the moment, but the key point to remember is that it is bottom up. This means that it started with weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117), that method called the "execute" method above it which called the one above it and so on.
The key here is knowing that what is currently running (or waiting) is always the top method. This will give you insight as to what the threads are stuck on. I usually glance through the whole stack trace because developers usually provide clues in method names as to what is actually going on. In the above thread, you'll notice that the method currently running (or in this case, waiting on a java lock) is weblogic.socket.PosixSocketMuxer.processSockets(). Socket Muxer is a term used by BEA and others to refer to managing data on network sockets (data sent across the network interface)
3. Thread pools
Most application servers use thread pools to manage execution tasks of a certain type. A thread pool is merely a collection of threads set aside for a specific task. In the example thread above, I've shown you a thread from the WebLogic thread pool (or queue) named "weblogic.socket.Muxer". A pool of these "Muxer threads" is set aside by WebLogic to manage reading and writing data for network connections coming into WebLogic.
In most cases, you won't be looking at the Muxer threads. When someone reports that the application server isn't responding, it usually means that the application code deployed to the application server isn't working right - and you'll need to figure out why. So you'll need to identify the thread pool that your application code runs on and find those threads in the Java thread dump to see what's going on.
Unless you've customized the execute queue/thread pool that your application gets deployed too, you'll want to look for the given application servers "Default" execute queue. In WebLogic, you'll look for the threads marked as 'weblogic.kernel.Default' to see whats running.
For WebLogic 10, here is an example of just such a thread:
"[ACTIVE] ExecuteThread: '12' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=1 tid=0x091962f8 nid=0x2f95 in Object.wait() [0x7cd75000..0x7cd75ec0]
at java.lang.Object.wait(Native Method)
- waiting on <0x8ed19d28> (a weblogic.work.ExecuteThread)
at java.lang.Object.wait(Object.java:474)
at weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:156)
- locked <0x8ed19d28> (a weblogic.work.ExecuteThread)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:177)
In WebLogic, when you see the keyword "waitForRequest" in a particular thread's stack trace, it means that the thread is idle and could potentially do work if a new request came in. Having idle threads is usually a good thing, but it doesn't always guarantee that your application is healthy or responsive. It just means that you have the potential for more throughput.
Once you know which execute queue is used for the unresponsive application code, you know exactly where to look to figure out what's going on.
4. Getting a thread dump
There are quite a few ways to obtain a thread dump from a running virtual machine. I've found that the easiest way to get a thread dump is to send the HUP or BREAK signal to the java process. In Unix, this is done by figuring out the PID (process ID) of the Java virtual machine and doing a "kill -3 ".
In Windows, you can use the SendSignal.exe program to send a "break" signal to the process and obtain a thread dump.
Where you go to find the thread dump usually depends on Java implementation. For most vendors, you'll go to the "standard out" log file. In WebLogic, it's often referred to as "weblogic.out", "nohup.out" or something you've created yourself by redirecting standard output to a file. This is why it's critical to redirect standard output & standard error to a log file so it is captured when you need a thread dump. See http://www.javasanity.org/basicsetup for information on how to do this.
For the IBM Java, the thread dump gets printed to a file with a prefix of "javacore" in it's name. The file gets written to the Java process "working directory". You may have to do some digging to find it.
5. Full Thread dump
Take a look at this example thread dump taken from WebLogic 10 while running some sample requests. You can look at this to get familiar with the format of a WebLogic thread dump running in the Sun JVM. Take a moment to search for the default execute queue and figure out what the code was doing when this thread dump was taken Don't read the next section until you're sure you want the answer.
6. Summary
Being able to obtain and analyze Java thread dumps is critical for understanding what's really going in inside your J2EE application server. Without thread dumps, you're blind when it comes to trying to get to root cause for an application server "hang" condition. I'd suggest always getting a thread dump before restarting a hung or misbehaving application server. It might not always be useful, but it doesn't hurt to get one in case it's needed later.
If you took the time to analyze the full thread dump in section 5, I hope you took the time to find the threads marked "weblogic.kernel.Default" to see what's going on. In this thread dump, there are only two types of requests running on the Default queue. There are idle threads (which are not the issue) and there are threads running my "insert_test.jsp" java server page.
If you found that they all were doing something within "Oracle" routines, kudos to you. If you understand network programming, you might even have realized that when the top function is doing a "socketRead", it's waiting on data to come across the network. Seeing most of your executing threads waiting on data from the network is usually a pointer to the next place you need to look to identify the cause of your problem.
What I provided here was a simulation of database issue. In this case, all running "Default" threads were waiting on a response from the database. This response was never going to come, because I locked the table in exclusive mode and the jsp was trying to do an insert. I solved this by killing the lock - this allowed the threads to complete. A poorly tuned database (missing or corrupt indexes, SGA, disk I/O, high CPU on the DB, etc) can cause a similar "socketRead" bottleneck within the jdbc drivers for your application. I've also found that poorly designed SQL code or tables that were too large and never purged can cause this as well.
Creating Thread Dumps
To create a thread dump from a process, do either of the following:
■Press Ctrl-Break while the process is running (or by sending SIGQUIT to the process on Linux).
■Enter the following at the command line at startup:
bin\jrcmd.exe
print_threads
The thread dump appears at the command line.
Note: For more information about jrcmd and Ctrl-Break handlers, see Running Diagnostic Commands.
--------------------------------------------------------------------------------
Reading Thread Dumps
This section describes the typical contents of a thread dump by going through an example thread dump from the beginning to end. First, an example thread dump, broken up into its components is presented (see Listing 20-1, Listing 20-2, Listing 20-3, Listing 20-4 and Listing 20-5). First, information about the main thread is printed, then all the JVM internal threads, followed by all other Java application threads (if there are any). Finally, information about lock chains are printed.
The example thread dump is taken from a program that creates three threads that are quickly forced into a deadlock. The application threads Thread-0, Thread-1, and Thread-2 correspond to three different classes in the Java code.
The Beginning of The Thread Dump
The thread dump starts with the date and time of the dump, and the version number of the JRockit JVM used (see Listing 20-1).
Listing 20-1 The initial information of a thread dump
===== FULL THREAD DUMP ===============Wed Feb 21 13:46:45 2007BEA JRockit(R) R27.1.0-109-73164-1.5.0_08-20061129-1428-windows-ia32Stack Trace for Main Application Thread
Listing 20-2 shows the stack trace of the main application thread. There is a thread information line, followed by information about locks and a trace of the thread’s stack at the moment of the thread dump.
Listing 20-2 The main thread in the thread dump
"Main Thread" id=1 idx=0x2 tid=48652 prio=5 alive, in native, waiting-- Waiting for notification on: util/repro/Thread1@0x01226528[fat lock]at jrockit/vm/Threads.waitForSignal(J)Z(Native Method)at java/lang/Object.wait(J)V(Native Method)at java/lang/Thread.join(Thread.java:1095)^-- Lock released while waiting: util/repro/Thread1@0x01226528[fat lock]at java/lang/Thread.join(Thread.java:1148)at util/repro/DeadLockExample.main(DeadLockExample.java:23)at jrockit/vm/RNI.c2java(IIII)V(Native Method)-- end of traceAfter the name and other identification information, the different status messages of the main thread are printed. The main thread in Listing 20-2 is a running thread (alive), it is either executing JVM internal code or user-defined JNI code (in native), and it is currently waiting for an object to be released (waiting). If a thread is waiting on a notification on a lock (by calling Object.wait()), this is indicated at the top of the stack trace as Waiting for notification on.
Locks and Lock Chains
For each thread, the JRockit JVM prints the following information:
■If the thread is trying to take a lock (to enter a synchronized block), but the lock is already held by another thread, this is indicated at the top of the stack trace, as “Blocked trying to get lock”.
■If the thread is waiting on a notification on a lock (by calling Object.wait()), this is indicated at the top of the stack trace as “Waiting for notification”.
■If the thread has taken any locks, this is shown in the stack trace. After a line in the stack trace describing a function call is a list of the locks taken by the thread in that function. This is described as ^-- Holding lock (where the ^-- serves as a reminder that the lock is taken in the function written above the line with the lock).
The semantics for waiting (for notification) on an object in Java is somewhat complex. First, to enter a synchronized block, you must take the lock for the object, and then you call wait() on that object. In the wait method, the lock is released before the thread actually goes to sleep waiting for a notification. When it receives a notification, wait re-takes the lock before returning. So, if a thread has taken a lock, and is waiting (for notification) on that lock, the line in the stack trace that describes when the lock was taken is not shown as “Holding lock,” but as “Lock released while waiting.”
All locks are described as Classname@0xLockID[LockType]; for example:
java/lang/Object@0x105BDCC0[thin lock]Classname@0xLockID describe the object to which the lock belongs. The classname is an exact description, the fully qualified classname of the object. LockID, on the other hand, is a temporary ID which is only valid for a single thread dump. That is, you can trust that if a thread A holds a lock java/lang/Object@0x105BDCC0, and a thread B is waiting for a lock java/lang/Object@0x105BDCC0, in a single thread dump, then it is the same lock. If you do any subsequent thread dumps however, LockID is not comparable and, even if a thread holds the same lock, it might have a different LockID and, conversely, the same LockID does not guarantee that it holds the same lock. LockType describes the JVM internal type of the lock (fat, thin, recursive, or lazy). The status of active locks (monitors) is also shown in stack traces.
Presentation of Locks Out of Order
The lines with the lock information might not always be correct, due to compiler optimizations. This means two things:
■If a thread, in the same function, takes lock A first and then lock B, the order in which they are printed is unspecified.
■If a thread, in method foo() calls method bar(), and takes a lock A in bar(), the lock might be printed as being taken in foo().
Normally, this should not be a problem. The order of the lock lines should never move much from their correct position. Also, lock lines will never be missing—you can be assured that all locks taken by a thread are shown in the thread dump.
JVM Internal Threads
Listing 20-3 shows the traces of JVM internal threads. The threads have been marked as daemon threads, as can be seen by their daemon state indicators. Daemon threads are either JVM internal threads (as in this case) or threads marked as daemon threads by java.lang.Thread.setDaemon().
Listing 20-3 The first and last thread in a list of JVM internal Threads
"(Signal Handler)" id=2 idx=0x4 tid=48668 prio=5 alive, in native, daemon[...]"(Sensor Event Thread)" id=10 idx=0x1c tid=48404 prio=5 alive, in native, daemonAs you can see, lock information and stack traces are not printed for the JVM internal threads in Listing 20-3. This is the default setting.
If you want to see stack traces for the JVM internal threads, then use the parameter nativestack=true when you send the print_threads handler. At the command line, write the following:
bin\jrcmd.exe print_threads nativestack=true
Other Java Application Threads
Normally, you will primarily be interested in the threads of the Java application you are running (including the main thread). All Java application threads except the main thread are presented near the end of the thread dump. Listing 20-4 shows the stack traces of three different application threads.
Listing 20-4 Additional application threads
"Thread-0" id=11 idx=0x1e tid=48408 prio=5 alive, in native, blocked-- Blocked trying to get lock: java/lang/Object@0x01226300[fat lock]at jrockit/vm/Threads.waitForSignal(J)Z(Native Method)at jrockit/vm/Locks.fatLockBlockOrSpin(ILjrockit/vm/ObjectMonitor;II)V(Unknown Source)at jrockit/vm/Locks.lockFat(Ljava/lang/Object;ILjrockit/vm/ObjectMonitor;Z)Ljava/lang/Object;(Unknown Source)atjrockit/vm/Locks.monitorEnterSecondStage(Ljava/lang/Object;I)Ljava/lang/Object;(Unknown Source)at jrockit/vm/Locks.monitorEnter(Ljava/lang/Object;)Ljava/lang/Object;(Unknown Source)at util/repro/Thread1.run(DeadLockExample.java:34)^-- Holding lock: java/lang/Object@0x012262F0[thin lock]^-- Holding lock: java/lang/Object@0x012262F8[thin lock]at jrockit/vm/RNI.c2java(IIII)V(Native Method)-- end of trace"Thread-1" id=12 idx=0x20 tid=48412 prio=5 alive, in native, blocked-- Blocked trying to get lock: java/lang/Object@0x012262F8[thin lock]at jrockit/vm/Threads.sleep(I)V(Native Method)at jrockit/vm/Locks.waitForThinRelease(Ljava/lang/Object;I)I(Unknown Source)at jrockit/vm/Locks.monitorEnterSecondStage(Ljava/lang/Object;I)Ljava/lang/Object;(Unknown Source)at jrockit/vm/Locks.monitorEnter(Ljava/lang/Object;)Ljava/lang/Object;(Unknown Source)at util/repro/Thread2.run(DeadLockExample.java:48)at jrockit/vm/RNI.c2java(IIII)V(Native Method)-- end of trace"Thread-2" id=13 idx=0x22 tid=48416 prio=5 alive, in native, blocked
-- Blocked trying to get lock: java/lang/Object@0x012262F8[thin lock]
at jrockit/vm/Threads.sleep(I)V(Native Method)
at jrockit/vm/Locks.waitForThinRelease(Ljava/lang/Object;I)I(Unknown Source)
at jrockit/vm/Locks.monitorEnterSecondStage(Ljava/lang/Object;I)Ljava/lang/Object;(Unknown Source)
at jrockit/vm/Locks.monitorEnter(Ljava/lang/Object;)Ljava/lang/Object;(Unknown Source)
at util/repro/Thread3.run(DeadLockExample.java:65)
^-- Holding lock: java/lang/Object@0x01226300[fat lock]
at jrockit/vm/RNI.c2java(IIII)V(Native Method)
-- end of trace
All three threads are in a blocked state (indicated by blocked), which means that they are all trying to enter synchronized blocks. Thread-0 is trying to take Object@0x01226300[fat lock] but this is held by Thread-2. Both Thread-2 and Thread-1 are trying to take Object@0x012262F8[thin lock] but this lock is held by Thread-0. This means that Thread-0 and Thread-2 form a deadlock, while Thread-1 is blocked.
Lock Chains
One prominent feature of the JRockit JVM is that it automatically detects deadlocked, blocked and open lock chains among the running threads. The analysis in Listing 20-5 presents the all the lock chains created by the threads T1, T2, T3, T4 and T5. This information can be used to tune and troubleshoot your Java code.
Listing 20-5 Deadlocked and blocked lock chains
Circular (deadlocked) lock chains=================================Chain 6:"Dead T1" id=16 idx=0x48 tid=3648 waiting for java/lang/Object@0x01225018 held by:"Dead T3" id=18 idx=0x50 tid=900 waiting for java/lang/Object@0x01225010 held by:"Dead T2" id=17 idx=0x4c tid=3272 waiting for java/lang/Object@0x01225008 held by:"Dead T1" id=16 idx=0x48 tid=3648Blocked lock chains===================Chain 7:"Blocked T2" id=20 idx=0x58 tid=3612 waiting for java/lang/Object@0x01225310 held by:"Blocked T1" id=19 idx=0x54 tid=2500 waiting for java/lang/Object@0x01224B60 held by: "Open T3" id=13 idx=0x3c tid=1124 in chain 1Open lock chains================Chain 1:"Open T5" id=15 idx=0x44 tid=4048 waiting for java/lang/Object@0x01224B68 held by:"Open T4" id=14 idx=0x40 tid=3380 waiting for java/lang/Object@0x01224B60 held by:"Open T3" id=13 idx=0x3c tid=1124 waiting for java/lang/Object@0x01224B58 held by:"Open T2" id=12 idx=0x38 tid=3564 waiting for java/lang/Object@0x01224B50 held by:"Open T1" id=11 idx=0x34 tid=2876 (active)
--------------------------------------------------------------------------------
Thread Status in Thread Dumps
This section describes the different statuses or states a thread can show in a thread dump. There are three types of states:
■Life States
■Run States
■Special States
Life States
Table 20-1 describes the life states a thread can show in a thread dump.
Table 20-1 Thread Life States State Description
alive This is a normal, running thread. Virtually all threads in the thread dump will be alive.
not started The thread has been requested to start running by java.lang.Thread.start(), but the actual OS process has not yet started, or executed far enough to pass control to the JRockit JVM. It is extremely unlikely to see this value. A java.lang.Thread object that is created, but not has had start() executed, will not show up in the thread dump.
terminated This thread has finished its run() method and has also notified any threads joining on it, but it is still kept in the JVM internal thread structure for running threads. It is extremely unlikely to see this value. A thread that has been terminated for a time longer than a few milliseconds will not show up in the thread dump.
Run States
Table 20-2 describes the run states a thread can show in a thread dump.
Table 20-2 Thread Run States State Description
blocked This thread has tried to enter a synchronized block, but the lock was taken by another thread. This thread is blocked until the lock gets released.
blocked (on thin lock) This is the same state as blocked, but with the additional information that the lock in question is a thin lock.
waiting This thread has called Object.wait() on an object. The thread will remain there until some other thread sends a notification on that object.
sleeping This thread has called java.lang.Thread.sleep().
parked This thread has called java.util.concurrent.locks.LockSupport.park().
suspended The thread’s execution has been suspended by java.lang.Thread.suspend() or a JVMTI/JVMPI agent call
Special States
Table 20-3 describes the special states a thread can show in a thread dump. Note that all these states are not mutually exclusive.
--------------------------------------------------------------------------------
Troubleshooting with Thread Dumps
This section contains information on about how to use thread dumps for troubleshooting and diagnostics.
To use thread dumps for troubleshooting, beyond detecting deadlocks, you need to take several thread dumps from the same process. However, if you want to do long time analysis of behavior you will likely be more helped by combining occasional thread dumps with other diagnostics tools, such as the JRockit Runtime Analyzer, which is part of Oracle JRockit Mission Control (see Using Oracle JRockit Mission Control Tools for more information).
Detecting Deadlocks
The Oracle JRockit JVM automatically analyzes the thread dump information and detects whether there exists any circular (deadlocked) or blocked lock chains in it.
Detecting Processing Bottlenecks
For detecting more than deadlocks in your threads, you have to make several consecutive thread dumps. This lets you detect the occurrence of contention, where multiple threads are trying to get the same lock. Contention might create long open lock chains that, while not deadlocked, will degrade performance.
If you discover (in a set of consecutive thread dumps) that one or more threads in your application is temporarily stuck waiting for a lock to be released, then you might have reason to look over the code of your Java application to see if the synchronization (serialization) is necessary or if the threads can be organized differently.
Viewing The Runtime Profile of an Application
By making several consecutive thread dumps, you might quickly get an overview of which parts of your Java application that are most heavily used. However, you should consult the Threads tab in JRockit Management Console for more detailed information about the workload on the different parts of your application.