1. Overview

Have you ever had an application hang for no apparent reason, or run very slowly, while finding the cause was anything but simple? A thread dump provides a snapshot of the state of the currently running Java process. However, the generated data spans several long files, and it is hard to mine the valuable information out of a large pile of unrelated data.

In this tutorial, we'll learn how to filter that data to diagnose performance issues more efficiently, as well as how to detect bottlenecks or even simple bugs.

2. Threads in the JVM

The JVM uses threads to execute every internal and external operation. As we know, the garbage collection process has its own threads, but the tasks inside a Java application also create their own.

During its lifetime, a thread goes through a variety of states. Each thread has an execution stack tracking its current operation. In addition, the JVM stores all the methods that were previously called successfully. Therefore, when things go wrong, it is possible to analyze the complete stack to study what happened in the application.
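
To make those states concrete, here is a minimal, self-contained sketch (not part of the sample application discussed below) that inspects a thread's state and its execution stack through the standard java.lang.Thread API:

    import java.util.concurrent.TimeUnit;

    public class ThreadStateDemo {

        public static void main(String[] args) throws InterruptedException {
            Thread worker = new Thread(() -> {
                try {
                    TimeUnit.SECONDS.sleep(5); // keep the thread parked so we can observe it
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "demo-worker");

            System.out.println(worker.getState()); // NEW
            worker.start();
            TimeUnit.MILLISECONDS.sleep(100);
            System.out.println(worker.getState()); // TIMED_WAITING while sleeping

            // every live thread exposes its current execution stack
            for (StackTraceElement frame : worker.getStackTrace()) {
                System.out.println("  at " + frame);
            }
        }
    }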

To showcase the topic of this tutorial, we'll use as an example a simple Sender-Receiver application (NetworkDriver). The Java program sends and receives data packets so we'll be able to analyze what is happening behind the scenes.
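
The NetworkDriver source itself isn't reproduced here, but a receiver thread roughly like the following hypothetical sketch would produce the kind of stack we'll see later in the dump, namely a thread sitting in BufferedReader.readLine on a socket:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.Socket;

    // hypothetical receiver, not the actual NetworkDriver code
    public class Receiver implements Runnable {

        private final Socket socket;

        Receiver(Socket socket) {
            this.socket = socket;
        }

        @Override
        public void run() {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(socket.getInputStream()))) {
                String line;
                // readLine() blocks until a packet (line) arrives
                while ((line = reader.readLine()) != null) {
                    System.out.println("received: " + line);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }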

2.1. Capturing a Java Thread Dump

Once the application is running, there are several ways to generate a thread dump for diagnosis. In this tutorial, we'll use the two utilities included in JDK 7+. First, we'll run the jps command to find the PID of the target Java process:

$ jps 
80661 NetworkDriver
33751 Launcher
80665 Jps
80664 Launcher
57113 Application

Once we have the PID (80661 in this example, the NetworkDriver process), we'll run the jstack command to capture the thread dump and write the result to a text file:

$ jstack -l 80661 > sender-receiver-thread-dump.txt
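
As an alternative to attaching jstack from the outside, a process can also dump its own threads programmatically through the ThreadMXBean API. The following is a minimal sketch; note that, unlike jstack, it only works from inside the running JVM:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ProgrammaticDump {

        public static void main(String[] args) {
            ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
            // true, true -> also report locked monitors and ownable synchronizers,
            // roughly what the -l flag adds to the jstack output
            for (ThreadInfo info : threadMxBean.dumpAllThreads(true, true)) {
                System.out.print(info);
            }
        }
    }

Keep in mind that ThreadInfo's text output only prints a limited number of stack frames, so jstack remains the better option for full dumps.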

2.2. Structure of the Thread Dump

Let's take a look at the generated thread dump. The first line displays the timestamp, while the second line shows information about the JVM:

2021-01-04 12:59:29
Full thread dump OpenJDK 64-Bit Server VM (15.0.1+9-18 mixed mode, sharing):

The next section shows the Safe Memory Reclamation (SMR) information and the non-JVM internal threads:

Threads class SMR info:
_java_thread_list=0x00007fd7a7a12cd0, length=13, elements={
0x00007fd7aa808200, 0x00007fd7a7012c00, 0x00007fd7aa809800, 0x00007fd7a6009200,
0x00007fd7ac008200, 0x00007fd7a6830c00, 0x00007fd7ab00a400, 0x00007fd7aa847800,
0x00007fd7a6896200, 0x00007fd7a60c6800, 0x00007fd7a8858c00, 0x00007fd7ad054c00,
0x00007fd7a7018800
}

The dump then lists the threads. Each thread entry contains the following information:

  • Thread name: a meaningful name makes it much easier to identify the thread (see the naming sketch right after this list)
  • Priority (prio): the priority of the thread
  • Java ID (tid): the unique ID assigned by the JVM
  • Native ID (nid): the unique ID assigned by the OS, useful for correlating with CPU or memory processing
  • State: the actual state of the thread
  • Stack trace: the most important source of information, telling us what the thread is doing at that moment
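
Regarding the thread name, a small sketch of how we might name threads explicitly so they stand out in a dump (the names used here are arbitrary examples):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.atomic.AtomicInteger;

    public class NamedThreads {

        public static void main(String[] args) {
            // a plain thread with an explicit name instead of the default "Thread-0"
            Thread sender = new Thread(() -> System.out.println("sending..."), "sender-main");
            sender.start();

            // a thread pool whose workers carry a recognizable prefix
            AtomicInteger counter = new AtomicInteger();
            ExecutorService pool = Executors.newFixedThreadPool(4, runnable -> {
                Thread t = new Thread(runnable);
                t.setName("packet-worker-" + counter.incrementAndGet());
                return t;
            });
            pool.submit(() -> System.out.println("handled by " + Thread.currentThread().getName()));
            pool.shutdown();
        }
    }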

We can scroll from top to bottom to see what each thread was doing at the time of the snapshot. Let's focus only on the part we're interested in: a thread waiting to consume a message:

    "Monitor Ctrl-Break" #12 daemon prio=5 os_prio=31 cpu=17.42ms elapsed=11.42s tid=0x00007fd7a6896200 nid=0x6603 runnable  [0x000070000dcc5000]
       java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.SocketDispatcher.read0(java.base@15.0.1/Native Method)
        at sun.nio.ch.SocketDispatcher.read(java.base@15.0.1/SocketDispatcher.java:47)
        at sun.nio.ch.NioSocketImpl.tryRead(java.base@15.0.1/NioSocketImpl.java:261)
        at sun.nio.ch.NioSocketImpl.implRead(java.base@15.0.1/NioSocketImpl.java:312)
        at sun.nio.ch.NioSocketImpl.read(java.base@15.0.1/NioSocketImpl.java:350)
        at sun.nio.ch.NioSocketImpl$1.read(java.base@15.0.1/NioSocketImpl.java:803)
        at java.net.Socket$SocketInputStream.read(java.base@15.0.1/Socket.java:981)
        at sun.nio.cs.StreamDecoder.readBytes(java.base@15.0.1/StreamDecoder.java:297)
        at sun.nio.cs.StreamDecoder.implRead(java.base@15.0.1/StreamDecoder.java:339)
        at sun.nio.cs.StreamDecoder.read(java.base@15.0.1/StreamDecoder.java:188)
        - locked <0x000000070fc949b0> (a java.io.InputStreamReader)
        at java.io.InputStreamReader.read(java.base@15.0.1/InputStreamReader.java:181)
        at java.io.BufferedReader.fill(java.base@15.0.1/BufferedReader.java:161)
        at java.io.BufferedReader.readLine(java.base@15.0.1/BufferedReader.java:326)
        - locked <0x000000070fc949b0> (a java.io.InputStreamReader)
        at java.io.BufferedReader.readLine(java.base@15.0.1/BufferedReader.java:392)
        at com.intellij.rt.execution.application.AppMainV2$1.run(AppMainV2.java:61)
    
       Locked ownable synchronizers:
        - <0x000000070fc8a668> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

At first glance, we can see that the stack is executing java.io.BufferedReader.readLine, which is exactly what we expect. If we keep going down, we'll see all the JVM methods executed by our application behind the scenes. Therefore, we can locate the root of the problem by examining the source code or other internal JVM processing.

Towards the end of the dump, we'll notice that there are several additional threads performing background operations such as garbage collection (GC) or object termination:

    "VM Thread" os_prio=31 cpu=1.85ms elapsed=11.50s tid=0x00007fd7a7a0c170 nid=0x3603 runnable  
    "GC Thread#0" os_prio=31 cpu=0.21ms elapsed=11.51s tid=0x00007fd7a5d12990 nid=0x4d03 runnable  
    "G1 Main Marker" os_prio=31 cpu=0.06ms elapsed=11.51s tid=0x00007fd7a7a04a90 nid=0x3103 runnable  
    "G1 Conc#0" os_prio=31 cpu=0.05ms elapsed=11.51s tid=0x00007fd7a5c10040 nid=0x3303 runnable  
    "G1 Refine#0" os_prio=31 cpu=0.06ms elapsed=11.50s tid=0x00007fd7a5c2d080 nid=0x3403 runnable  
    "G1 Young RemSet Sampling" os_prio=31 cpu=1.23ms elapsed=11.50s tid=0x00007fd7a9804220 nid=0x4603 runnable  
    "VM Periodic Task Thread" os_prio=31 cpu=5.82ms elapsed=11.42s tid=0x00007fd7a5c35fd0 nid=0x9903 waiting on condition

Finally, the dump displays the Java Native Interface (JNI) references. We should pay special attention to these when the application suffers from memory leaks, because they aren't automatically garbage collected:

JNI global refs: 15, weak refs: 0

Thread dumps are quite similar in structure. The key is how we filter out the data that doesn't matter and, especially when the stack traces produce a huge amount of log output, how we keep and group the important information. Let's see how!

3. Tips for Analyzing Thread Dumps

In order to understand what is happening with our application, we'll need to analyze the generated snapshot efficiently. We'll have a lot of information with precise data about all the threads at the time of the dump. However, we'll need to curate the log files, applying some filtering and grouping to extract useful hints from the stack traces. Once we've prepared the dump, we'll be able to analyze the problem using different tools. Let's see how to decipher the content of a sample dump.

3.1. Synchronization Issues

One interesting tip to filter out the stack trace is the state of the thread. We'll mainly focus on RUNNABLE or BLOCKED threads and eventually TIMED_WAITING ones. Those states will point us in the direction of a conflict between two or more threads:

  • In a deadlock situation, in which several threads each hold a synchronized block on a shared object (see the sketch right after this list)
  • In thread contention, when a thread is blocked waiting for others to finish; for example, the dump generated in the previous section
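
As a concrete illustration of the first case, here is a minimal, self-contained sketch (unrelated to the sample application) that deadlocks two threads; a dump taken while it runs would show both of them as BLOCKED, each waiting for the monitor the other one holds:

    public class DeadlockExample {

        private static final Object LOCK_A = new Object();
        private static final Object LOCK_B = new Object();

        public static void main(String[] args) {
            new Thread(() -> acquire(LOCK_A, LOCK_B), "worker-1").start();
            new Thread(() -> acquire(LOCK_B, LOCK_A), "worker-2").start();
        }

        private static void acquire(Object first, Object second) {
            synchronized (first) {
                sleepQuietly(100);      // give the other thread time to grab its first lock
                synchronized (second) { // both threads end up BLOCKED here
                    System.out.println(Thread.currentThread().getName() + " got both locks");
                }
            }
        }

        private static void sleepQuietly(long millis) {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }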

3.2. Execution Issues

As a rule of thumb, for abnormally high CPU usage we only need to look at RUNNABLE threads. We'll use thread dumps together with other commands to acquire extra information. One of these commands is top -H -p PID, which displays what threads are consuming the OS resources within that particular process. We also need to look at the internal JVM threads such as GC just in case. On the other hand, when the processing performance is abnormally low, we'll look at BLOCKED threads.
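
Besides top -H -p PID, the JVM itself can report per-thread CPU time through ThreadMXBean. A rough sketch of listing the RUNNABLE threads together with their CPU consumption (CPU time measurement may be unsupported or disabled on some JVMs, in which case -1 is returned):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class CpuHogFinder {

        public static void main(String[] args) {
            ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
            for (long id : mxBean.getAllThreadIds()) {
                ThreadInfo info = mxBean.getThreadInfo(id);
                if (info == null || info.getThreadState() != Thread.State.RUNNABLE) {
                    continue; // thread already died or is not currently runnable
                }
                long cpuNanos = mxBean.getThreadCpuTime(id); // -1 if unsupported or disabled
                String cpu = cpuNanos < 0 ? "n/a" : (cpuNanos / 1_000_000) + " ms";
                System.out.printf("%-40s cpu=%s%n", info.getThreadName(), cpu);
            }
        }
    }

When correlating with top -H, remember that top reports the native thread id in decimal, while the dump's nid field contains the same id in hexadecimal.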

In those cases, a single dump will most surely not be enough to understand what is happening. We'll need a number of dumps at close intervals in order to compare the stacks of the same threads at different times. On the one hand, one snapshot is not always enough to find out the root of the problem. On the other hand, we need to avoid noise between snapshots (too much information).

To understand the threads' evolution over time, a recommended best practice is to take at least 3 dumps, one every 10 seconds. Another useful tip is to split the dumps into small chunks to avoid crashes when loading the files.
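
A minimal sketch of automating that practice by invoking jstack for a given PID three times, 10 seconds apart (the file names and the fixed count are arbitrary choices):

    import java.io.File;
    import java.util.concurrent.TimeUnit;

    public class DumpCollector {

        public static void main(String[] args) throws Exception {
            String pid = args[0]; // the PID previously obtained with jps
            for (int i = 1; i <= 3; i++) {
                new ProcessBuilder("jstack", "-l", pid)
                        .redirectOutput(new File("thread-dump-" + i + ".txt"))
                        .start()
                        .waitFor();
                TimeUnit.SECONDS.sleep(10); // recommended interval between snapshots
            }
        }
    }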

3.3. Recommendations

In order to efficiently decipher the root of the problem, we'll need to organize the huge amount of information in the stack trace. Therefore, we'll take into consideration the following recommendations:

  • For execution issues, capturing several snapshots at 10-second intervals will help us focus on the actual problem. It is also recommended to split the files if needed to avoid crashes while loading them
  • Use naming when creating new threads to better identify your source code
  • Depending on the issue, ignore internal JVM processing (for instance GC)
  • Focus on long-running or blocked threads when diagnosing abnormal CPU or memory usage
  • Correlate the thread's stack with CPU processing by using top -H -p PID
  • And most importantly, use Analyzer tools
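
Before reaching for an external analyzer, the JVM can already confirm one specific problem for us: ThreadMXBean's findDeadlockedThreads reports threads that are deadlocked on object monitors or ownable synchronizers. A minimal sketch:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class DeadlockDetector {

        public static void main(String[] args) {
            ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
            long[] deadlocked = mxBean.findDeadlockedThreads(); // null when no deadlock is detected
            if (deadlocked == null) {
                System.out.println("no deadlock detected");
                return;
            }
            for (ThreadInfo info : mxBean.getThreadInfo(deadlocked)) {
                System.out.println(info.getThreadName() + " is waiting for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
    }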

Analyzing Java thread dumps manually can be a tedious activity. For simple applications, it is possible to identify the threads generating the problem. For more complex situations, however, we'll need tools to ease this task. We'll showcase how to use the tools in the next sections, using the dump generated for the sample thread contention.

4. Online Tools

There are several online tools available. When using this kind of software we need to take into account security issues. Remember that we could be sharing the logs with third-party entities.

4.1. FastThread

FastThread is probably the best online tool to analyze thread dumps for production environments. It provides a very nice graphical user interface. It also includes multiple functionalities such as CPU usage by threads, stack length, and most used and complex methods.

FastThread incorporates a REST API feature to automate the analysis of the thread dumps. With a simple cURL command, it's possible to instantly send the results. The main drawback is security because it stores the stack trace in the cloud.

4.2. JStack Review

JStack Review is an online tool that analyzes the dumps within the browser. It is client-side only, so no data is stored outside your computer. From a security perspective, this is a major advantage. It provides a graphical overview of all the threads, displaying the running methods and also grouping them by status. JStack Review separates the threads that produce a stack trace from the rest, which makes it easy to ignore, for instance, internal processes. Finally, it also includes the synchronizers and the ignored lines.

4.3. Spotify Online Java Thread Dump Analyzer

Spotify Online Java Thread Dump Analyzer is an online open-source tool written in JavaScript. It shows the results in plain text, separating the threads with and without a stack trace. It also displays the top methods from the running threads.

5. Offline Tools

There are also several standalone applications that we can use locally.

5.1. JProfiler

JProfiler is one of the most powerful tools on the market and is well known among the Java developer community. It is possible to test its functionality with a 10-day trial license. JProfiler allows the creation of profiles and attaching running applications to them. It includes multiple functionalities to identify problems on the spot, such as CPU and memory usage and database analysis. It also supports integration with IDEs.

5.2. IBM Thread Monitor and Dump Analyzer for Java (TMDA)

IBM TMDA can be used to identify thread contention, deadlocks, and bottlenecks. It is freely distributed and maintained, but it does not come with any guarantee or support from IBM.

5.3. Irockel Thread Dump Analyser (TDA)

Irockel TDA is a standalone open-source tool licensed under LGPL v2.1. The last version (v2.4) was released in August 2020, so it is well maintained. It displays the thread dump as a tree and also provides some statistics to ease navigation.

Finally, IDEs support basic analysis of thread dumps so it is possible to debug the application during development time.

6. Conclusion

In this article, we demonstrated how Java thread dump analysis can help us pinpoint synchronization or execution issues.

Most importantly, we reviewed how to analyze them properly including recommendations to organize the enormous amount of information embedded in the snapshot.