Working With OpenTelemetry Collector | Baeldung

1. Overview

The OpenTelemetry Collector gathers, handles, and sends telemetry data about how our software is performing. It integrates with various systems and tools, allowing us to incorporate it into our existing setup.

In this tutorial, we’ll examine the core features of the OpenTelemetry Collector: receiving telemetry data, processing telemetry data, and exporting telemetry data to a datastore. We’ll also briefly explore working with extensions.

2. Setup

To demonstrate these features, we’ll use a DictionaryService, which will act as a third-party library JAR hosted on our internal machines. This DictionaryService generates an English word with its associated definition at a randomized pause interval for each request. We’ll then use a separate TriviaService that’ll initiate an HTTP call to the DictionaryService.

First, let’s install the collector.

The OpenTelemetry Collector provides multiple options for installation across various environments. For our local setup, let’s run the collector within a Docker container:

docker pull otel/opentelemetry-collector-contrib:latest

docker run -p 4317:4317 -p 4318:4318 --name telemetry-collector -d otel/opentelemetry-collector-contrib:latest

Port 4317 receives telemetry data via the gRPC protocol, while port 4318 receives telemetry data over HTTP. Let’s verify that our Collector has started correctly:

docker logs -f telemetry-collector

The default installation collects its own metrics at 10-second intervals, as shown by the previous command. Now that we have a collector running, let’s send telemetry data to it.

3. Automatic Instrumentation (Zero-Code)

Automatic instrumentation enables us to add observability without editing the library’s source code. Let’s download an agent JAR file that we’ll use to instrument our third-party DictionaryService:

curl -L -O https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

Next, let’s start our third-party standalone application, which is packaged as a JAR file named dictionary-service.jar:

java -javaagent:/path/to/file/opentelemetry-javaagent.jar \
  -Dotel.service.name=dictionary-service-application \
  -jar /path/to/library/dictionary-service-0.0.1.jar

With this command, the OpenTelemetry Java Agent will send default metrics, such as OS-related metrics, JVM metrics, and others, to the collector.

We can access our hosted dictionary service on port 8081. Let’s send a GET request to trigger the collection of additional metrics:

curl -i http://127.0.0.1:8081/api/words/random

Our collector logs should display output similar to the one below:

telemetry-collector-1  | ScopeLogs SchemaURL: 
telemetry-collector-1  | InstrumentationScope com.baeldung.DictionaryController 
telemetry-collector-1  | LogRecord #0
telemetry-collector-1  | ObservedTimestamp: 2024-10-21 12:43:03.079415993 +0000 UTC
telemetry-collector-1  | Timestamp: 2024-10-21 12:43:03.079342085 +0000 UTC
telemetry-collector-1  | SeverityText: INFO
telemetry-collector-1  | SeverityNumber: Info(9)
telemetry-collector-1  | Body: Str(Processing received request for a random word)
telemetry-collector-1  | Trace ID: d6f2fabcd2d28ebc405c0ee6965d612c
telemetry-collector-1  | Span ID: a8cc3a77f30e98a3

telemetry-collector-1  | 2024-10-21T12:43:07.480Z       info    ResourceLog #0
telemetry-collector-1  | Resource SchemaURL: https://opentelemetry.io/schemas/1.24.0
telemetry-collector-1  | Resource attributes:
telemetry-collector-1  |      -> container.id: Str(cf34dafd2008f7106c0be4effc94da302ee69b1f493acd88948e8375351028ca)
telemetry-collector-1  |      -> host.arch: Str(amd64)
telemetry-collector-1  |      -> host.name: Str(codespaces-70d510)
telemetry-collector-1  |      -> os.description: Str(Linux 6.5.0-1025-azure)
telemetry-collector-1  |      -> os.type: Str(linux)
telemetry-collector-1  |      -> process.command_args: Slice(["/usr/local/sdkman/candidates/java/17.0.13.fx-librca/bin/java","-javaagent:./opentelemetry-javaagent.jar","-Dotel.service.name=dictionary-service-application","-jar","./target/dictionary-service-0.0.1.jar"])
telemetry-collector-1  |      -> process.executable.path: Str(/usr/local/sdkman/candidates/java/17.0.13.fx-librca/bin/java)
telemetry-collector-1  |      -> process.pid: Int(19539)

Next, let’s manually instrument our consumer service.

4. Manual Instrumentation

With manual instrumentation, we can customize some of the telemetry signals that our service produces. We’ll start by adding the required dependencies:

<dependencyManagement>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-bom</artifactId>
        <version>1.41.0</version>
        <type>pom</type>
        <scope>import</scope>
    </dependency>
</dependencyManagement>

The Maven BOM ensures consistent alignment of all OpenTelemetry dependency versions.

<dependencies> 
    <dependency> 
        <groupId>io.opentelemetry</groupId> 
        <artifactId>opentelemetry-api</artifactId> 
    </dependency> 
    <dependency> 
        <groupId>io.opentelemetry</groupId> 
        <artifactId>opentelemetry-sdk</artifactId> 
    </dependency> 
    <dependency> 
        <groupId>io.opentelemetry</groupId> 
        <artifactId>opentelemetry-exporter-otlp</artifactId> 
    </dependency> 
</dependencies>

The API provides a set of interfaces for collecting telemetry, while the SDK delivers the actual implementation.

4.1. Setting up Providers

Next, we’ll instantiate a few providers that our consumer service will use to emit telemetry data to the OpenTelemetry Collector. Let’s start with the SdkTracerProvider:

public class TelemetryConfig {
    private static TelemetryConfig telemetryConfig;
    private static final String OTLP_TRACES_ENDPOINT 
      = "http://telemetry-collector:4318/v1/traces";
    //...

    private TelemetryConfig() {
        Resource resource = Resource.getDefault()
          .toBuilder()
          .put(AttributeKey.stringKey("service.name"), "trivia-service")
          .put(AttributeKey.stringKey("service.version"), "1.0-SNAPSHOT")
          .build();

        SpanExporter spanExporter = OtlpHttpSpanExporter.builder()
          .setEndpoint(OTLP_TRACES_ENDPOINT).build();
        SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
          .setResource(resource)
          .addSpanProcessor(SimpleSpanProcessor.create(spanExporter))
      .build();
        //...
    }
}

In our TelemetryConfig, we define a Resource, which describes the service. The SpanExporter sends trace data to a remote system, while the SdkTracerProvider creates the actual spans (trace events). Next we’ll instantiate the SdkMeterProvider:

public class TelemetryConfig {
    private static final String OTLP_METRICS_ENDPOINT 
      = "http://telemetry-collector:4318/v1/metrics";
    //...

    private TelemetryConfig() {
    //...
        MetricExporter metricExporter = OtlpHttpMetricExporter.builder()
          .setEndpoint(OTLP_METRICS_ENDPOINT).build();
        MetricReader metricReader = PeriodicMetricReader.builder(metricExporter)
          .setInterval(30, TimeUnit.SECONDS)
      .build();
        SdkMeterProvider meterProvider = SdkMeterProvider.builder()
          .setResource(resource)
          .registerMetricReader(metricReader)
      .build();
        //...
    }
}

The MetricExporter sends metric data to an endpoint, and the SdkMeterProvider takes care of how we gather and read those metrics regularly. And lastly, we’ll introduce the SdkLoggerProvider:

public class TelemetryConfig {
    private static final String OTLP_LOGS_ENDPOINT 
      = "http://telemetry-collector:4317";
    //...

    private TelemetryConfig() {
    //...
        LogRecordExporter logRecordExporter = OtlpGrpcLogRecordExporter.builder()
          .setEndpoint(OTLP_LOGS_ENDPOINT).build();
        LogRecordProcessor logRecordProcessor 
          = BatchLogRecordProcessor.builder(logRecordExporter).build();
        SdkLoggerProvider sdkLoggerProvider = SdkLoggerProvider.builder()
          .setResource(resource)
          .addLogRecordProcessor(logRecordProcessor).build();
    //...
    }
}

The SdkLoggerProvider sets up the loggers that generate logs, and once they’re processed, the LogRecordExporter sends them to an endpoint. Then we put it all together:

public class TelemetryConfig {
    //...
    private final OpenTelemetry openTelemetry;

    private TelemetryConfig() {
    //...
        openTelemetry = OpenTelemetrySdk.builder()
      .setMeterProvider(meterProvider)
      .setTracerProvider(tracerProvider)
          .setLoggerProvider(sdkLoggerProvider)
      .setPropagators(
            ContextPropagators.create(
              TextMapPropagator.composite(W3CTraceContextPropagator.getInstance(),
                W3CBaggagePropagator.getInstance())
            )
          )
      .buildAndRegisterGlobal();
    //...
    }
}

ContextPropagators enable us to share contextual data between services, and our code uses W3CTraceContextPropagator and W3CBaggagePropagator to adhere to industry standards. Here, we configure the openTelemetry object with all components and register it globally for easy access throughout the system.

4.2. Using the Provider

Next, let’s make a call to our hosted DictionaryService and capture a few metrics:

@Path("/trivia")
public class TriviaResource {
    private final Tracer tracer;
    private final Meter meter;
    private final LongCounter httpRequestCounter;

    private TriviaService triviaService;

    static final String OTEL_SERVICE_NAME = "trivia-service";
    static final String WORD_SERVICE_URL 
      = "http://localhost:8081/api/words/random";

    public TriviaResource() {
        this.triviaService = new TriviaService(new OkHttpClient());
        
        TelemetryConfig telemetryConfig = TelemetryConfig.getInstance();
        this.tracer = telemetryConfig.getOpenTelemetry()
          .getTracer(OTEL_SERVICE_NAME, "0.0.1-SNAPSHOT");
        this.meter = telemetryConfig.getOpenTelemetry()
          .getMeter(OTEL_SERVICE_NAME);

        this.httpRequestCounter = meter.counterBuilder("http.request.count")
          .setDescription("Counts the number of HTTP requests")
          .setUnit("1")
          .build();
    }

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public Response retreiveCard() {
        httpRequestCounter.add(1, Attributes.builder().put("endpoint", "/trivia")
          .build());

        Span span = tracer.spanBuilder("retreive_card")
          .setAttribute("http.method", "GET")
          .setAttribute("http.url", WORD_SERVICE_URL)
          .setSpanKind(SpanKind.CLIENT).startSpan();

        try (Scope scope = span.makeCurrent()) {
            WordResponse wordResponse 
              = triviaService.requestWordFromSource(WORD_SERVICE_URL);

            span.setAttribute("http.status_code", wordResponse.httpResponseCode());
            return Response.ok(wordResponse.wordWithDefinition()).build();
        } catch(IOException exception) {
            span.setStatus(
              StatusCode.ERROR, "Error retreiving info from dictionary service"
            );
            span.recordException(exception);
            return Response.noContent().build();
        } finally {
            span.end();
        }
    }
}

In our retrieveCard method, we use tracer.spanBuilder to create and start a Span, which tracks the whole lifecycle of an operation, capturing its duration and any errors along the way. Additionally, we add the external service response code as an attribute on the Span for more information.

Our TriviaResource will record telemetry data each time it receives a request. Let’s run it in a Docker container:

docker build -t trivia-webservice .

docker run -p 8080:8080 --name trivia-webservice -d -t trivia-webservice:latest

Then, when we issue a request to our service endpoint, it will return a random word with its definition:

curl http://127.0.0.1:8080/trivia-webservice/api/trivia

Now that we have the setup in place, let’s turn our attention back to the collector.

5. Collector Pipeline

Earlier, when we started the OpenTelemetry Collector with Docker, we didn’t specify any configuration, so the collector used the default settings. We can change this behavior by specifying a YAML configuration file. Let’s create a collector-config.yaml file, and in the next sub-section, we’ll specify the receivers.

5.1. Receivers

As the name suggests, receivers are responsible for accepting telemetry data from various sources. Let’s edit the collector-config.yaml file to reflect this:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  hostmetrics:
    collection_interval: 60s
      scrapers:
        cpu:
        memory:
        load:
# ...
service:
  pipelines:
    traces:
    metrics:
      receivers: [hostmetrics, otlp]
    logs:

This configuration specifies the address and port where the receiver listens for incoming OTLP data. Additionally, the Host Metrics Receiver collects and logs metrics about the host system when we deploy the OpenTelemetry Collector as an agent on the host.

Next, let’s proceed with modifying the processor function.

5.2. Processors

The OpenTelemetry Collector processor function allows us to sanitize telemetry data before exporting it to storage or analysis tools. Let’s adjust our collector-config.yaml file to incorporate this feature:

processors:
  batch:
  attributes/remove_client_address:
    actions:
      - key: client.address
        action: delete
# ...
service:
  pipelines:
    traces:
      receivers:
      processors: [batch, attributes/remove_client_address]
      exporters:

With this configuration, we’ll remove the client.address span attribute from the received telemetry payload.

Additionally, let’s configure a Memory Limiter Processor to help our Collector avoid out-of-memory situations:

processors:
#...
  memory_limiter:
    check_interval: 1s
    limit_mib: 6000    
    spike_limit_mib: 1200
#...
service:
  pipelines:
    traces:
      processors: [memory_limiter, batch, attributes/client_address]
    metrics:
      processors: [memory_limiter]
    logs:
      processors: [memory_limiter, batch]

With this configuration, our Collector will measure memory usage every second and apply a soft limit when memory utilization reaches 4.8GB in this instance. We place the memory_limiter processor first in the pipelines to ensure that backpressure is applied to receivers and to reduce the risk of data loss.

Next, let’s offload the telemetry data using exporters.

5.3. Exporters

OpenTelemetry Collector exporters facilitate the transmission of collected data to various backends for analysis and visualization. Let’s update our collector-config.yaml file with instructions on where to send our telemetry data:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  otlp:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
# ...
service:
  pipelines:
    traces:
      exporters: [otlp]
    metrics:
      exporters: [prometheus]

With this configuration, our Collector sends traces straight to a Jaeger backend, providing us with a user-friendly interface to view and analyze those traces. Plus, Prometheus will track the health and performance of our systems over time by actively scraping our metrics.

After making these updates, we need to restart the Collector Docker container and specify the YAML configuration file:

docker run -p 4317:4317 -p 4318:4318 -p 8889:8889 -v /path/to/file/collector-config.yaml:/etc/collector-config.yaml --name telemetry-collector -d otel/opentelemetry-collector-contrib:latest

Optionally, we can define a docker-compose.yaml file so we can start our services using Docker Compose:

---
services:
  jaeger-all-in-one:
    image: jaegertracing/all-in-one:latest
    container_name: jaeger
    ports:
      - 16686:16686
    networks:
      - otel-network

  telemetry-collector:
    image: otel/opentelemetry-collector:latest
    volumes:
      - ./collector-config.yaml:/etc/collector-config.yaml
    command:
      - --config=/etc/collector-config.yaml
    ports:
      - 4317:4317
      - 4318:4318
      - 8889:8889
      - 55679:55679
    networks:
      - otel-network

  prometheus:
    image: prom/prometheus
    container_name: prometheus
    volumes:
      - "./prometheus.yml:/etc/prometheus/prometheus.yml"
    ports:
      - 9090:9090    
    networks:
      - otel-network        

  web:
    image: trivia-webservice:latest
    ports:
      - 8080:8080
    networks:
      - otel-network      

networks:
  otel-network:
    driver: bridge

With our docker-compose.yaml file in place, we can start our services using the Docker Compose command:

docker compose up -d

Once we’ve called the service a few times, we can check Prometheus to see how many requests it’s received:

curl 'http://127.0.0.1:9090/api/v1/query?query=http_request_count_total{endpoint="/trivia"}'

The query should give us a result that looks like:

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "http_request_count_total",
          "endpoint": "/trivia",
          "exported_job": "trivia-service",
          "instance": "telemetry-collector:8889",
          "job": "otel-collector"
        },
        "value": [
          1729545261.56,
          "8"
        ]
      }
    ]
  }
}

The http_request_count_total metric for the /trivia endpoint shows a value of 8.

We can also check Jaeger by using the service name for a closer look at the requests:

curl -G 'http://127.0.0.1:16686/api/traces' --data-urlencode 'service=trivia-service'

The call should give us a result like:

{
  "data": [
    {
      "traceID": "fa2aa25585dc6c217ddba732455d7583",
      "spans": [
        {
          "traceID": "fa2aa25585dc6c217ddba732455d7583",
          "spanID": "92628f88adb16011",
          "operationName": "retreive_card",
          "references": [],
          "startTime": 1729594621616443,
          "duration": 1565913,
          "tags": [
            {
              "key": "http.method",
              "type": "string",
              "value": "GET"
            },
            {
              "key": "http.status_code",
              "type": "int64",
              "value": 200
            },
            {
              "key": "http.url",
              "type": "string",
              "value": "http://127.0.0.1:8081/api/words/random"
            },
            {
              "key": "internal.span.format",
              "type": "string",
              "value": "otlp"
            },
            {
              "key": "span.kind",
              "type": "string",
              "value": "client"
            }
          ],
          "logs": [
            {
              "timestamp": 1729594621617640,
              "fields": [
                {
                  "key": "event",
                  "type": "string",
                  "value": "http.request.word-api"
                }
              ]
            }
          ],
          "processID": "p1",
          "warnings": null
        }
      ],
      "warnings": null
    }
  ],
  "errors": null
}

From this output, we can see which operation the request performed and how long it took, for example.

6. Operational Enhancements

As we’ve seen so far, the OpenTelemetry Collector is highly configurable. Extensions – specialized add-ons that we can attach to the Collector – provide additional capabilities and customizable behavior. Let’s configure a zPages Extension to see how this works in practice.

6.1. Extension

zPages help us monitor the performance of the OpenTelemetry Collector and diagnose issues while it’s running. To attach the zPages extension to our Collector, let’s update the collector-config.yaml file with the extensions configuration:

#...
extensions:
  zpages:
    endpoint: 0.0.0.0:55679

service:
  pipelines:
#...
  extensions: [zpages]

With this setting, we specify an HTTP endpoint that serves zPages, and the collector exposes zPage routes from the specified port. After restarting our Collector, we can view the zPage routes in a browser or use the command below for a text-based browser:

curl http://localhost:55679/debug/tracez | lynx --stdin

We can navigate to the TraceZ route, which lets us check and categorize spans by their latency. The extension’s README file lists the exposed routes.

7. Conclusion

In this article, we examined the core features of the OpenTelemetry Collector. We saw how we can utilize automatic and manual instrumentation to send telemetry data to our collector. We also considered how we can customize how our collector receives, processes, and transmits collected data. Lastly, we looked at how we can attach additional capabilities to the collector using extensions.

As always, the source code for all the examples can be found over on GitHub.

Persistence

REST

Security