1. Introduction

Micrometer provides a simple facade over the instrumentation clients for a number of popular monitoring systems, currently including Atlas, Datadog, Graphite, Ganglia, Influx, JMX, and Prometheus, among others.

In this tutorial, we’ll introduce the basic usage of Micrometer and its integration with Spring.

For the sake of simplicity, we’ll take Micrometer Atlas as an example to demonstrate most of our use cases.

2. Maven Dependency

To start with, let’s add the following dependency to the pom.xml:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-atlas</artifactId>
    <version>1.11.0</version>
</dependency>

The latest version of the dependency can be found on Maven Central.

3. MeterRegistry

In Micrometer, a MeterRegistry is the core component used for registering meters. We can iterate over the registry, and then over each meter's measurements, to generate a time series in the backend from combinations of metric names and their dimension (tag) values.

The simplest form of the registry is SimpleMeterRegistry. But, in most cases, we should use a MeterRegistry explicitly designed for our monitoring system; for Atlas, it’s AtlasMeterRegistry.
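
For instance, here's a minimal sketch of creating a SimpleMeterRegistry and registering a counter with it (the meter name is chosen just for illustration):

MeterRegistry registry = new SimpleMeterRegistry();

// register a counter against the registry and increment it
Counter counter = registry.counter("demo.counter");
counter.increment();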

CompositeMeterRegistry allows multiple registries to be added. It provides a solution to publish application metrics to various supported monitoring systems simultaneously.

We can add any MeterRegistry needed to upload the data to multiple platforms:

CompositeMeterRegistry compositeRegistry = new CompositeMeterRegistry();
SimpleMeterRegistry oneSimpleMeter = new SimpleMeterRegistry();
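// atlasConfig is an AtlasConfig instance provided elsewhere (not shown in this snippet)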
AtlasMeterRegistry atlasMeterRegistry 
  = new AtlasMeterRegistry(atlasConfig, Clock.SYSTEM);

compositeRegistry.add(oneSimpleMeter);
compositeRegistry.add(atlasMeterRegistry);
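
Meters registered against the composite are also registered with every member registry, and updates are forwarded to each of them. As a quick sanity check (the meter name below is just an example):

compositeRegistry.counter("page.visits").increment();

assertEquals(1.0, oneSimpleMeter.counter("page.visits").count(), 0.01);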

Micrometer also provides a static global registry, Metrics.globalRegistry, along with a set of static builders on the Metrics class for generating meters registered with it:

@Test
public void givenGlobalRegistry_whenIncrementAnywhere_thenCounted() {
    class CountedObject {
        private CountedObject() {
            Metrics.counter("objects.instance").increment(1.0);
        }
    }
    Metrics.addRegistry(new SimpleMeterRegistry());

    Metrics.counter("objects.instance").increment();
    new CountedObject();

    Optional<Counter> counterOptional = Optional.ofNullable(Metrics.globalRegistry
      .find("objects.instance").counter());
    assertTrue(counterOptional.isPresent());
    assertTrue(counterOptional.get().count() == 2.0);
}

4. Tags and Meters

4.1. Tags

An identifier of a Meter consists of a name and tags. We should follow a naming convention that separates words with a dot, to help guarantee the portability of metric names across multiple monitoring systems.

Counter counter = registry.counter("page.visitors", "age", "20s");

Tags can be used to slice the metric when reasoning about the values. In the code above, page.visitors is the name of the meter, with age=20s as its tag. In this case, the counter counts the visitors to the page with ages between 20 and 30.

For a large system, we can append common tags to a registry. For instance, say the metrics are from a specific region:

registry.config().commonTags("region", "ua-east");
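
As a quick sanity check (assuming registry here is a plain SimpleMeterRegistry), meters registered after this call automatically carry the common tag:

// the common tag is appended to meters registered afterwards
registry.counter("page.visitors", "age", "20s").increment();

assertNotNull(registry.find("page.visitors")
  .tags("region", "ua-east")
  .counter());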

4.2. Counter

A Counter merely reports a count of a specified property of an application. We can build a custom counter with the fluent builder or the helper method of any MeterRegistry:

Counter counter = Counter
  .builder("instance")
  .description("indicates instance count of the object")
  .tags("dev", "performance")
  .register(registry);

counter.increment(2.0);
 
assertTrue(counter.count() == 2);
 
counter.increment(-1);
 
assertTrue(counter.count() == 1);

As seen in the snippet above, we passed a negative amount to decrease the counter by one; however, a Counter is intended to be incremented monotonically by a fixed positive amount, so negative amounts shouldn't be used.

4.3. Timers

To measure latencies or the frequency of events in our system, we can use Timers. A Timer will report at least the total time and the event count of a specific time series.

For example, we can record an application event that may last several seconds:

SimpleMeterRegistry registry = new SimpleMeterRegistry();
Timer timer = registry.timer("app.event");
timer.record(() -> {
    try {
        TimeUnit.MILLISECONDS.sleep(15);
    } catch (InterruptedException ignored) {
    }
});

timer.record(30, TimeUnit.MILLISECONDS);

assertTrue(2 == timer.count());
assertThat(timer.totalTime(TimeUnit.MILLISECONDS)).isBetween(40.0, 55.0);

To record long-running events, we use a LongTaskTimer:

SimpleMeterRegistry registry = new SimpleMeterRegistry();
LongTaskTimer longTaskTimer = LongTaskTimer
  .builder("3rdPartyService")
  .register(registry);

LongTaskTimer.Sample currentTaskId = longTaskTimer.start();
try {
    TimeUnit.MILLISECONDS.sleep(2);
} catch (InterruptedException ignored) { }
long timeElapsed = currentTaskId.stop();
 
assertEquals(2L, timeElapsed / ((int) 1e6), 1L);
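
While a task is in flight, a LongTaskTimer also reports the number of active tasks; a small sketch along the same lines:

LongTaskTimer.Sample sample = longTaskTimer.start();

// the in-flight task is visible before it completes
assertEquals(1, longTaskTimer.activeTasks());

sample.stop();
assertEquals(0, longTaskTimer.activeTasks());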

4.4. Gauge

A gauge shows the current value of a meter.

Unlike other meters, gauges only report data when they are observed. Gauges can be useful when monitoring the stats of caches or collections:

SimpleMeterRegistry registry = new SimpleMeterRegistry();
List<String> list = new ArrayList<>(4);

Gauge gauge = Gauge
  .builder("cache.size", list, List::size)
  .register(registry);

assertTrue(gauge.value() == 0.0);
 
list.add("1");
 
assertTrue(gauge.value() == 1.0);

4.5. DistributionSummary

A DistributionSummary provides the distribution of events along with a simple summary:

SimpleMeterRegistry registry = new SimpleMeterRegistry();
DistributionSummary distributionSummary = DistributionSummary
  .builder("request.size")
  .baseUnit("bytes")
  .register(registry);

distributionSummary.record(3);
distributionSummary.record(4);
distributionSummary.record(5);

assertTrue(3 == distributionSummary.count());
assertTrue(12 == distributionSummary.totalAmount());

Moreover, DistributionSummary and Timers can be enriched by percentiles:

SimpleMeterRegistry registry = new SimpleMeterRegistry();
Timer timer = Timer
  .builder("test.timer")
  .publishPercentiles(0.3, 0.5, 0.95)
  .publishPercentileHistogram()
  .register(registry);

Now, in the snippet above, three gauges with the tags percentile=0.3, percentile=0.5, and percentile=0.95 will be available in the registry, indicating the values below which 30%, 50%, and 95% of observations fall, respectively.

So to see these percentiles in action, let’s add some records:

timer.record(2, TimeUnit.SECONDS);
timer.record(2, TimeUnit.SECONDS);
timer.record(3, TimeUnit.SECONDS);
timer.record(4, TimeUnit.SECONDS);
timer.record(8, TimeUnit.SECONDS);
timer.record(13, TimeUnit.SECONDS);

Then we can verify by extracting values in those three percentile Gauges:

Map<Double, Double> actualMicrometer = new TreeMap<>();
ValueAtPercentile[] percentiles = timer.takeSnapshot().percentileValues();
for (ValueAtPercentile percentile : percentiles) {
    actualMicrometer.put(percentile.percentile(), percentile.value(TimeUnit.MILLISECONDS));
}

Map<Double, Double> expectedMicrometer = new TreeMap<>();
expectedMicrometer.put(0.3, 1946.157056);
expectedMicrometer.put(0.5, 3019.89888);
expectedMicrometer.put(0.95, 13354.663936);

assertEquals(expectedMicrometer, actualMicrometer);

Additionally, Micrometer supports service-level objectives (histogram buckets at given boundaries):

DistributionSummary hist = DistributionSummary
  .builder("summary")
  .serviceLevelObjectives(1, 10, 5)
  .register(registry);

Similar to percentiles, after appending several records, we can see that the histogram handles the computation well.
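
For illustration, suppose we had recorded values like these (hypothetical sample values, chosen here to be consistent with the bucket counts asserted below):

hist.record(3);
hist.record(8);

We can then extract the counts at each bucket from a snapshot: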

Map<Integer, Double> actualMicrometer = new TreeMap<>();
HistogramSnapshot snapshot = hist.takeSnapshot();
Arrays.stream(snapshot.histogramCounts()).forEach(p -> {
  actualMicrometer.put((Integer.valueOf((int) p.bucket())), p.count());
});

Map<Integer, Double> expectedMicrometer = new TreeMap<>();
expectedMicrometer.put(1, 0D);
expectedMicrometer.put(5, 1D);
expectedMicrometer.put(10, 2D);

assertEquals(expectedMicrometer, actualMicrometer);

Generally, histograms can help illustrate a direct comparison across separate buckets. They can also be time-scaled, which is quite useful for analyzing backend service response times:

Duration[] durations = {Duration.ofMillis(25), Duration.ofMillis(300), Duration.ofMillis(600)};
Timer timer = Timer
  .builder("timer")
  .serviceLevelObjectives(durations)
  .publishPercentileHistogram()
  .register(registry);

5. Binders

Micrometer has multiple built-in binders to monitor the JVM, caches, ExecutorService instances, and logging services.

When it comes to JVM and system monitoring, we can monitor class loader metrics (ClassLoaderMetrics), JVM memory pool (JvmMemoryMetrics) and GC metrics (JvmGcMetrics), and thread and CPU utilization (JvmThreadMetrics, ProcessorMetrics).
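
For example, a minimal sketch that binds these JVM and system binders to a registry looks like this:

MeterRegistry registry = new SimpleMeterRegistry();

// each binder contributes its own set of meters to the registry
new ClassLoaderMetrics().bindTo(registry);
new JvmMemoryMetrics().bindTo(registry);
new JvmGcMetrics().bindTo(registry);
new JvmThreadMetrics().bindTo(registry);
new ProcessorMetrics().bindTo(registry);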

Cache monitoring (currently, only Guava, EhCache, Hazelcast, and Caffeine are supported) is done by instrumenting the cache with GuavaCacheMetrics, EhCache2Metrics, HazelcastCacheMetrics, or CaffeineCacheMetrics. And to monitor the Logback logging service, we can bind LogbackMetrics to any valid registry:

new LogbackMetrics().bindTo(registry);

The use of the other binders is quite similar to LogbackMetrics, and they're all rather simple, so we won't dive into further details here.

6. Spring Integration

The Spring Boot Actuator provides dependency management and auto-configuration for Micrometer, which is supported in Spring Boot 2.0/1.x and Spring Framework 5.0/4.x. Spring Boot 2.x ships with Micrometer support out of the box; for Spring Boot 1.x, we'll need the following bridge dependency (the latest version can be found on Maven Central):

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-spring-legacy</artifactId>
    <version>1.3.20</version>
</dependency>

Without any further changes to the existing code, we've enabled Spring support for Micrometer. The JVM memory metrics of our Spring application will be automatically registered in the global registry and published to the default Atlas endpoint: http://localhost:7101/api/v1/publish.

There are several configurable properties available to control the metrics export behavior, starting with the spring.metrics.atlas.* prefix. We can check AtlasConfig to see the full list of configuration properties for Atlas publishing.

If we need to bind more metrics, we only need to add them as @Bean definitions to the application context.

Say we need the JvmThreadMetrics:

@Bean
JvmThreadMetrics threadMetrics() {
    return new JvmThreadMetrics();
}

As for web monitoring, it's auto-configured for every endpoint in our application, but can be managed via the configuration property spring.metrics.web.autoTimeServerRequests.

The default implementation provides four dimensions of metrics for endpoints: HTTP request method, HTTP response code, endpoint URI, and exception information.

When requests are handled, metrics relating to the request method (GET, POST, etc.) will be published to Atlas.

With the Atlas Graph API, we can generate a graph to compare the response times for different methods:

[graph: response time per request method]

By default, response codes of 20x, 30x, 40x, and 50x will also be reported:

[graph: response status codes]

We can also compare different URIs:

[graph: response time per URI]

Or check exception metrics:

[graph: exception metrics]

Note that we can also use @Timed on the controller class or on specific endpoint methods to customize the tags, long tasks, quantiles, and percentiles of the metrics:

@RestController
@Timed("people")
public class PeopleController {

    @GetMapping("/people")
    @Timed(value = "people.all", longTask = true)
    public List<String> listPeople() {
        //...
    }

}

Based on the code above, we can see the following tags by checking the Atlas endpoint http://localhost:7101/api/v1/tags/name:

["people", "people.all", "jvmBufferCount", ... ]

Micrometer also works with the functional web framework introduced in Spring Boot 2.0. We can enable metrics by filtering the RouterFunction:

RouterFunctionMetrics metrics = new RouterFunctionMetrics(registry);
RouterFunctions.route(...)
  .filter(metrics.timer("server.requests"));

We can also collect metrics from data sources and scheduled tasks. Check the official documentation for more details.

7. Conclusion

In this article, we introduced the metrics facade Micrometer. By abstracting over multiple monitoring systems under a common set of semantics, it makes switching between monitoring platforms quite easy.

As always, the full implementation code of this article can be found over on GitHub.