1. Overview

Java sampling profilers are usually built on the JVM Tool Interface (JVMTI) and collect stack traces only at safepoints. As a result, they can suffer from the safepoint bias problem: code that runs between safepoints is under-represented in the samples.

For a holistic view of the application, we need a sampling profiler that doesn’t require threads to be at safepoints and can collect the stack traces at any time to avoid the safepoint bias problem.

In this tutorial, we’ll explore async-profiler along with various profiling techniques it offers.

2. async-profiler

async-profiler is a sampling profiler for any JDK based on the HotSpot JVM. It has low overhead and doesn’t rely on JVMTI.

It avoids the safepoint bias problem by using the AsyncGetCallTrace API provided by HotSpot JVM to profile the Java code paths, and Linux’s perf_events to profile the native code paths.

In other words, the profiler matches call stacks of both Java code and native code paths to produce accurate results.

3. Setup

3.1. Installation

First, we’ll download the latest release of async-profiler based on our platform. Currently, it supports Linux and macOS platforms only.

Once downloaded, we can check if it’s working on our platform:

$ ./profiler.sh --version
Async-profiler 1.7.1 built on May 14 2020
Copyright 2016-2020 Andrei Pangin

It’s always a good idea to check all the options available with async-profiler beforehand:

$ ./profiler.sh
Usage: ./profiler.sh [action] [options] <pid>
Actions:
  start             start profiling and return immediately
  resume            resume profiling without resetting collected data
  stop              stop profiling
  check             check if the specified profiling event is available
  status            print profiling status
  list              list profiling events supported by the target JVM
  collect           collect profile for the specified period of time
                    and then stop (default action)
Options:
  -e event          profiling event: cpu|alloc|lock|cache-misses etc.
  -d duration       run profiling for <duration> seconds
  -f filename       dump output to <filename>
  -i interval       sampling interval in nanoseconds
  -j jstackdepth    maximum Java stack depth
  -b bufsize        frame buffer size
  -t                profile different threads separately
  -s                simple class names instead of FQN
  -g                print method signatures
  -a                annotate Java method names
  -o fmt            output format: summary|traces|flat|collapsed|svg|tree|jfr
  -I include        output only stack traces containing the specified pattern
  -X exclude        exclude stack traces with the specified pattern
  -v, --version     display version string

  --title string    SVG title
  --width px        SVG width
  --height px       SVG frame height
  --minwidth px     skip frames smaller than px
  --reverse         generate stack-reversed FlameGraph / Call tree

  --all-kernel      only include kernel-mode events
  --all-user        only include user-mode events
  --cstack mode     how to traverse C stack: fp|lbr|no

<pid> is a numeric process ID of the target JVM
      or 'jps' keyword to find running JVM automatically

Many of these options will come in handy in the later sections.

3.2. Kernel Configuration

When using async-profiler on Linux, we should configure the kernel so that non-root users can capture call stacks with perf_events.

First, we’ll set perf_event_paranoid to 1, which allows the profiler to collect performance information:

$ sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'

Then, we’ll set kptr_restrict to 0 to remove the restrictions on exposing kernel addresses:

$ sudo sh -c 'echo 0 >/proc/sys/kernel/kptr_restrict'

On macOS, however, async-profiler works without any such configuration.
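Before attaching the profiler, it can be convenient to verify the two kernel settings from code. The following is a minimal sketch under that assumption; the class `KernelSettingsCheck` and its methods are hypothetical helpers, not part of async-profiler, and the thresholds simply mirror the values used in this section:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of async-profiler.
final class KernelSettingsCheck {

    // Values taken from this section: perf_event_paranoid <= 1, kptr_restrict == 0.
    static final int MAX_PARANOID_LEVEL = 1;
    static final int REQUIRED_KPTR_RESTRICT = 0;

    // Parses the content of /proc/sys/kernel/perf_event_paranoid.
    static boolean paranoidLevelAllowsProfiling(String fileContent) {
        return Integer.parseInt(fileContent.trim()) <= MAX_PARANOID_LEVEL;
    }

    // Parses the content of /proc/sys/kernel/kptr_restrict.
    static boolean kernelAddressesExposed(String fileContent) {
        return Integer.parseInt(fileContent.trim()) == REQUIRED_KPTR_RESTRICT;
    }

    public static void main(String[] args) throws IOException {
        Path paranoid = Path.of("/proc/sys/kernel/perf_event_paranoid");
        Path kptr = Path.of("/proc/sys/kernel/kptr_restrict");
        if (!Files.isReadable(paranoid) || !Files.isReadable(kptr)) {
            // Happens on macOS, where no kernel configuration is needed.
            System.out.println("Kernel settings are not readable on this platform");
            return;
        }
        System.out.println("perf_event_paranoid ok: "
                + paranoidLevelAllowsProfiling(Files.readString(paranoid)));
        System.out.println("kptr_restrict ok: "
                + kernelAddressesExposed(Files.readString(kptr)));
    }
}
```

The parsing logic is kept separate from the file access so that it can be exercised on any platform.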

Now that our platform is ready, we can build the application we want to profile and run it using the java command:

$ java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -jar path-to-jar-file

Here, we’ve started our application with the -XX:+UnlockDiagnosticVMOptions and -XX:+DebugNonSafepoints JVM flags, which are highly recommended for accurate results.
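Since async-profiler attaches to a running JVM by its process ID, it can help to have the application log its own PID at startup. A minimal sketch, assuming Java 9 or later for the ProcessHandle API; the class name `PidPrinter` is hypothetical:

```java
// Hypothetical helper for illustration; assumes Java 9+ (ProcessHandle).
final class PidPrinter {

    // Returns the process ID of the JVM running this code.
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        System.out.println("Application PID: " + currentPid());
    }
}
```

The printed value is the number that gets passed as the last argument to profiler.sh.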

Now that we’re ready to profile our application, let’s explore various types of profiling supported by the async-profiler.

4. CPU Profiling

When profiling CPU, async-profiler collects sampled stack traces that include Java methods, JVM code, native code, and kernel functions.

Let’s profile our application using its PID:

$ ./profiler.sh -e cpu -d 30 -o summary 66959
Started [cpu] profiling
--- Execution profile --- 
Total samples       : 28

Frame buffer usage  : 0.069%

Here, we’ve selected the cpu profiling event with the -e option. Then, we used the -d option to collect samples for 30 seconds.

Last, the -o option defines the output format, such as summary, traces, flat, collapsed, svg, or tree. When we pass a file name with -f instead, the format follows from the file extension, such as .html or .svg.
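When profiling is triggered from a test harness or other tooling, the same options can be assembled programmatically. The sketch below only builds the argument list from the -e, -d, and -f options described above; the class `ProfilerCommand` and its `build` method are hypothetical, and the script path, file name, and PID are placeholders:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it only assembles the argument list.
final class ProfilerCommand {

    // Builds: <script> -e <event> -d <seconds> [-f <outputFile>] <pid>
    static List<String> build(String script, String event, int seconds,
                              String outputFile, long pid) {
        List<String> command = new ArrayList<>();
        command.add(script);
        command.add("-e");
        command.add(event);
        command.add("-d");
        command.add(Integer.toString(seconds));
        if (outputFile != null) {
            command.add("-f");
            command.add(outputFile);
        }
        command.add(Long.toString(pid));
        return command;
    }

    public static void main(String[] args) {
        List<String> command = build("./profiler.sh", "cpu", 30, "cpu_profile.html", 66959);
        System.out.println(String.join(" ", command));
        // To actually launch it, the list could be handed to ProcessBuilder:
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Keeping each option and its value as separate list elements avoids shell quoting issues when the list is passed to ProcessBuilder.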

Let’s create the HTML output while CPU profiling our application:

$ ./profiler.sh -e cpu -d 30 -f cpu_profile.html 66959

[Screenshot: HTML output of the CPU profile]

Here, we can see the HTML output allows us to expand, collapse, and search the samples.

Additionally, async-profiler supports flame graphs out-of-the-box.

Let’s generate a flame graph by using the .svg file extension for the CPU profile of our application:

$ ./profiler.sh -e cpu -d 30 -f cpu_profile.svg 66959

[Screenshot: flame graph of the CPU profile]

Here, the resulting flame graph shows Java code paths in green, C++ in yellow, and system code paths in red.
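A flame graph is drawn from collapsed stacks, the format selected by the collapsed value of the -o option: each line is a semicolon-separated list of frames, root first, followed by a sample count. As a rough sketch of how such output can be post-processed, the code below sums the samples per leaf frame, meaning the method that was actually executing. The class `CollapsedStacks` and the stack lines in `main` are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; parses "frame1;frame2;...;frameN count" lines.
final class CollapsedStacks {

    // Sums sample counts per leaf frame (the last frame of each stack).
    static Map<String, Long> selfSamplesByFrame(List<String> lines) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            int lastSpace = trimmed.lastIndexOf(' ');
            if (lastSpace < 0) {
                continue; // skip blank or malformed lines
            }
            String stack = trimmed.substring(0, lastSpace);
            long count = Long.parseLong(trimmed.substring(lastSpace + 1));
            String leaf = stack.substring(stack.lastIndexOf(';') + 1);
            totals.merge(leaf, count, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample data, not output from a real profiling run.
        List<String> lines = List.of(
                "Main.main;Main.work;Main.hash 7",
                "Main.main;Main.work;Main.hash 3",
                "Main.main;Main.idle 2");
        selfSamplesByFrame(lines)
                .forEach((frame, samples) -> System.out.println(frame + " " + samples));
    }
}
```

Splitting at the last space rather than the first keeps frame names that contain spaces intact.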

5. Allocation Profiling

Similarly, we can collect samples of memory allocation without using an intrusive technique like bytecode instrumentation.

async-profiler uses a sampling technique based on TLABs (Thread Local Allocation Buffers). Instead of recording every allocation, it samples the allocations that start a new TLAB or bypass the TLAB altogether, which works out to roughly one sample for every average-TLAB-sized amount of allocated memory.

By using the alloc event, we can enable the profiler to collect the heap allocations of our application:

$ ./profiler.sh -e alloc -d 30 -f alloc_profile.svg 66255

[Screenshot: flame graph of the allocation profile]

Here, we can see the object cloning has allocated a large part of memory, which is otherwise hard to perceive when looking at the code.
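To make this concrete, here is a minimal sketch of a workload in which cloning dominates allocation while looking harmless in the source. The class `CloneAllocationDemo` is a hypothetical stand-in, not the application profiled above:

```java
// Hypothetical workload for illustration; not the application profiled in this article.
final class CloneAllocationDemo {

    // Each call allocates a fresh array of the same size as the source.
    static int[] snapshot(int[] source) {
        return source.clone();
    }

    // Clones a 256 KB array on every iteration and returns a checksum
    // so that the copies are actually used.
    static long run(int iterations) {
        int[] data = new int[64 * 1024];
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            data[i % data.length] = i;
            int[] copy = snapshot(data);
            checksum += copy[i % copy.length];
        }
        return checksum;
    }

    public static void main(String[] args) {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 1_000;
        System.out.println("checksum: " + run(iterations));
    }
}
```

With a large enough iteration count passed as the first argument, an alloc profile of this program should attribute most sampled allocations to snapshot, even though the method is a single line.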

6. Wall-Clock Profiling

async-profiler can also sample all threads regardless of their state, whether running, sleeping, or blocked, by using the wall-clock profile.

This is handy when troubleshooting slow application start-up.

By defining the wall event, we can configure the profiler to collect samples of all threads:

$ ./profiler.sh -e wall -t -d 30 -f wall_clock_profile.svg 66959

[Screenshot: flame graph of the wall-clock profile in per-thread mode]

Here, we’ve used the wall-clock profiler in per-thread mode by using the -t option, which is highly recommended when profiling all threads.
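The difference between CPU time and wall-clock time explains why a sleeping or blocked thread barely shows up in a cpu profile but is clearly visible in a wall profile. The sketch below measures both for a single sleep using the standard ThreadMXBean API; the class `WallVsCpuDemo` is hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical demo for illustration; compares wall-clock and CPU time of a sleep.
final class WallVsCpuDemo {

    // Returns {wallNanos, cpuNanos} for the current thread while it sleeps.
    // cpuNanos is -1 when the JVM cannot measure per-thread CPU time.
    static long[] measureSleep(long millis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();
        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : 0;
        long wallStart = System.nanoTime();
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        long wall = System.nanoTime() - wallStart;
        long cpu = cpuSupported ? threads.getCurrentThreadCpuTime() - cpuStart : -1;
        return new long[] {wall, cpu};
    }

    public static void main(String[] args) {
        long[] result = measureSleep(200);
        System.out.println("wall-clock ns: " + result[0]);
        System.out.println("cpu ns:        " + result[1]);
    }
}
```

The wall-clock figure covers the whole sleep, while the CPU figure stays close to zero, which is the gap that the wall event is designed to expose.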

Additionally, we can check all profiling events supported by our JVM by using the list option:

$ ./profiler.sh list 66959
Basic events:
  cpu
  alloc
  lock
  wall
  itimer
Java method calls:
  ClassName.methodName

7. async-profiler With IntelliJ IDEA

IntelliJ IDEA features integration with async-profiler as a profiling tool for Java.

7.1. Profiler Configurations

We can configure async-profiler in IntelliJ IDEA by selecting the Java Profiler menu option at Settings/Preferences > Build, Execution, Deployment:

[Screenshot: Java Profiler settings in IntelliJ IDEA]

Also, for quick usage, we can choose any predefined configuration, like the CPU Profiler and the Allocation Profiler that IntelliJ IDEA offers.

Similarly, we can copy a profiler template and edit the Agent options for specific use cases.

7.2. Profile Application Using IntelliJ IDEA

There are a few ways to analyze our application with a profiler.

For instance, we can select the application and choose the Run with option:

[Screenshot: Run with option for the selected application]

Or, we can click on the toolbar and choose the Run with option:

[Screenshot: Run with option on the toolbar]

Or, we can choose the Run with Profiler option under the Run menu and then select the <profiler configuration name>:

[Screenshot: Run with Profiler option under the Run menu]

Additionally, we can see the option to Attach Profiler to Process under the Run menu. It opens a dialog that lets us choose the process to attach:

[Screenshot: Attach Profiler to Process dialog]

Once our application is profiled, we can analyze the profiling result using the Profiler tool window bar at the bottom of the IDE.

The profiling result of our application will look like:

[Screenshot: profiling result in the Profiler tool window]

It shows the per-thread results in different output formats, such as flame graphs, call trees, and method lists.

Alternatively, we can choose the Profiler option under the View > Tool Windows menu to see the results:

[Screenshot: Profiler option under the View > Tool Windows menu]

8. Conclusion

In this article, we explored the async-profiler, along with a few profiling techniques.

First, we saw how to configure the kernel on Linux, along with a few recommended JVM flags that help produce accurate results.

Then, we examined various types of profiling techniques like CPU, allocation, and wall-clock.

Last, we profiled an application with async-profiler using IntelliJ IDEA.