OpenTelemetry Certified Associate

The OTCA is a CNCF-administered certification validating hands-on proficiency with OpenTelemetry instrumentation, the Collector, and the observability ecosystem.

Duration: 90 minutes
Format: Multiple choice
Passing Score: ~75%
Cost: ~$250 USD
Retake: 1 free retake included
Validity: 2 years
Delivery: Online, proctored (PSI)
Prerequisites: None (hands-on experience recommended)

Domain Weights

Domain | Weight
Instrumentation | ~25%
OpenTelemetry Core Concepts | ~20%
Observability Fundamentals | ~20%
Data Collection & Processing (Collector) | ~20%
Backends, Storage & Visualisation | ~15%
Key insight: Instrumentation carries the most weight. Focus extra time on manual and automatic instrumentation across multiple languages, and Collector pipeline architecture.

2-Week Study Plan

Each daily session is 2–3 hours. Week 1 builds foundational knowledge; Week 2 covers advanced topics and exam prep.

Day | Focus Area | Key Topics | Deliverable
Day 1 | Observability Fundamentals | Pillars, SLOs, RED/USE/Golden Signals | Notes + flashcards
Day 2 | OTel Architecture & Concepts | OTEP, spec, SDKs, APIs, signals overview | Diagram + flashcards
Day 3 | Traces & Context Propagation | Spans, attributes, W3C TraceContext, Baggage | Trace diagram lab
Day 4 | Metrics in OTel | Instruments, temporality, exemplars, OTLP | Metrics lab
Day 5 | Logs in OTel | Log model, log appenders, log bridge, severity | Logs lab
Day 6 | Auto-Instrumentation | Java agent, Python, .NET, Node.js | Auto-instr lab
Day 7 | Week 1 Review | All signals, context propagation, quiz | Practice test
Day 8 | Manual Instrumentation | SDK setup, custom spans/metrics, API patterns | SDK lab
Day 9 | OTel Collector — Basics | Receivers, processors, exporters, pipeline | Collector config lab
Day 10 | OTel Collector — Advanced | Batch, memory, sampling, transform, routing | Advanced pipeline
Day 11 | Backends & Exporters | Jaeger, Prometheus, OTLP, Zipkin, Grafana | Backend integration
Day 12 | Semantic Conventions | Resource, span names, attribute namespaces | Annotation exercise
Day 13 | Full Mock Exam + Review | All domains, timed practice, weak spot review | Score analysis
Day 14 | Final Review & Exam Day | Cram sheet, exam readiness check | Certification exam
Week 1: Foundations & Core Signals
Days 1–7 · Observability theory, all three signal types, auto-instrumentation
Day 1: Observability Fundamentals (~3 hrs)

Core Theory

  • The Three Pillars: Logs (what happened), Metrics (how much/how fast), Traces (through what path and how long)
  • Observability vs. Monitoring: Monitoring tells you something is wrong; observability tells you why
  • Distributed systems challenges: latency, partial failure, lack of global state, clock skew
  • SLIs, SLOs, SLAs: Indicator → Objective → Agreement. Error budget = 1 − SLO (e.g. 99.9% = ~43 min/month)
  • Cardinality: high-cardinality (user_id, request_id) vs. low-cardinality (region, status_code)
  • Push vs. Pull: Prometheus = pull (scrapes /metrics); OTLP = push (exporter sends data)
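The error-budget arithmetic above is worth being able to reproduce quickly. A minimal sketch (the 30-day month is an assumption baked into the "~43 min/month" rule of thumb):

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Minutes of allowed downtime per window for a given SLO."""
    return (1 - slo) * days * 24 * 60

# 99.9% over a 30-day month -> 43.2 minutes
print(round(error_budget_minutes(0.999), 1))
print(round(error_budget_minutes(0.99), 1))    # 99%    -> 432.0 minutes
print(round(error_budget_minutes(0.9999), 1))  # 99.99% -> 4.3 minutes
```

Being able to do this mentally ("each extra nine divides the budget by ten") is a common exam shortcut.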

Methods to Memorise

Method | Stands For | Best For
RED | Rate, Errors, Duration | Request-driven services (APIs, microservices)
USE | Utilisation, Saturation, Errors | Resources (CPU, disk, network, memory)
Golden Signals | Latency, Traffic, Errors, Saturation | Google SRE — service health overview
MELT | Metrics, Events, Logs, Traces | Industry grouping of telemetry types

Lab Tasks

  1. Read Google SRE Book Ch.6 — Monitoring Distributed Systems (free at sre.google)
  2. Write from memory: what each pillar captures and when to use it
  3. Create 10 flashcards: SLI/SLO/SLA, error budget, cardinality, RED, USE, Golden Signals
Day 2: OTel Architecture & Core Concepts (~3 hrs)

Core Theory

  • History: Merger of OpenCensus + OpenTracing under CNCF in 2019
  • Components: Specification, API, SDK, Collector, Instrumentation Libraries, Exporters
  • API vs SDK: API defines interfaces (stable, vendor-neutral, ships no-ops); SDK implements them (configurable, has pipelines)
  • OTLP: gRPC port 4317, HTTP port 4318 — the canonical transport protocol
  • Signal stability: Traces ✓ Stable · Metrics ✓ Stable · Logs ✓ Stable · Profiling ✗ Experimental
  • OTEP: OpenTelemetry Enhancement Proposal — how the specification evolves (like KEPs in Kubernetes)

Component Reference

Component | Role
API | Language-specific interfaces; no-op by default without SDK — safe for libraries to depend on
SDK | Implements the API; configurable pipelines, samplers, exporters — used by applications
Instrumentation Library | Pre-built instrumentation for frameworks/libs (Flask, Spring, Express, etc.)
Exporter | Sends telemetry to a backend (OTLP, Jaeger, Prometheus, Zipkin, etc.)
Collector | Receives, processes, and exports telemetry — agent or gateway deployment
Resource | Entity producing telemetry: service.name, service.version, host.name, k8s.pod.name

Lab Tasks

  1. Draw the full data flow: App → API → SDK → Exporter → Collector → Backend
  2. Browse opentelemetry.io/docs — read the Concepts → Signals overview page
  3. Identify the OTLP endpoints your current Splunk environment uses
Day 3: Traces & Context Propagation (~3 hrs)

Core Theory

  • Trace: a DAG of Spans representing a request's full path across services
  • Span fields: TraceID (128-bit, 32 hex), SpanID (64-bit, 16 hex), ParentSpanID, TraceFlags, TraceState
  • Span kinds: SERVER (inbound), CLIENT (outbound), PRODUCER, CONSUMER, INTERNAL
  • Span status: UNSET (default) · OK (explicit success) · ERROR (exception or failure) — never set OK on 4xx
  • Span Events: timestamped annotations within a span (e.g. exception events, cache hits)
  • Span Links: causal relationships to other traces — used for fan-out, message queues
  • W3C traceparent: format = 00-{traceId}-{parentSpanId}-{flags} where flags 01=sampled, 00=not sampled
  • W3C Baggage: key-value pairs propagated in-band for business context (e.g. userId, tenantId); travels in its own header and propagates regardless of the trace's sampling decision
  • Sampling: Head sampling (decision at root span) vs. Tail sampling (full trace in Collector, requires buffering)
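The traceparent layout above is easiest to internalise by writing the propagator yourself. A stdlib-only sketch of inject() and extract() — this illustrates the header format, not the real opentelemetry propagator API:

```python
from typing import Optional

def inject(trace_id: str, span_id: str, sampled: bool) -> dict:
    """Build an outgoing traceparent header (version 00)."""
    flags = "01" if sampled else "00"
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}

def extract(headers: dict) -> Optional[dict]:
    """Parse an incoming traceparent header; None if malformed."""
    parts = headers.get("traceparent", "").split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    version, trace_id, parent_span_id, flags = parts
    return {
        "trace_id": trace_id,              # 128-bit -> 32 hex chars
        "parent_span_id": parent_span_id,  # 64-bit  -> 16 hex chars
        "sampled": flags == "01",
    }

ctx = extract(inject("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", True))
print(ctx["sampled"])  # True
```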

Trace Data Model Quick Reference

Field | Example / Description
TraceID | 4bf92f3577b34da6a3ce929d0e0e4736 — 128-bit, hex
SpanID | 00f067aa0ba902b7 — 64-bit, hex
traceparent | 00-4bf9...4736-00f0...02b7-01
tracestate | rojo=00f067aa0ba902b7,congo=t61rcWkgMzE — vendor state
Baggage | userId=123,tenant=acme — business context, in-band
Span Kind | SERVER · CLIENT · PRODUCER · CONSUMER · INTERNAL
Status Code | UNSET / OK / ERROR

Lab Tasks

  1. Run the OTel Demo and view traces in Jaeger (see lab guide →)
  2. Inspect traceparent headers in Chrome DevTools → Network tab
  3. Write a pseudocode propagator inject() and extract() from memory
Day 4: Metrics in OpenTelemetry (~3 hrs)

The 7 Instruments

Instrument | Sync/Async | Use Case
Counter | Sync | Monotonically increasing — total requests, bytes sent
UpDownCounter | Sync | Can go up or down — queue depth, active connections
Histogram | Sync | Latency distributions — request duration, payload size
Gauge | Sync | Point-in-time snapshot — CPU %, memory bytes
ObservableCounter | Async (callback) | Cumulative totals measured asynchronously
ObservableUpDownCounter | Async (callback) | Bidirectional totals measured asynchronously
ObservableGauge | Async (callback) | Point-in-time async — temperature sensors, external reads

Key Concepts

  • Delta temporality: each export reports only the change since the previous export; a process restart does not show up as a dip
  • Cumulative temporality: each export reports the running total since a fixed start time; required for Prometheus compatibility, and a process restart appears as a counter reset
  • Exemplars: sample span references embedded in metric data points — link metrics to traces in Grafana
  • Views: customise aggregation, rename instruments, filter attributes — defined at SDK build time
  • MetricReader: PeriodicExportingMetricReader (push) vs. ManualReader vs. PrometheusMetricReader (pull)
  • Naming convention: {namespace}.{name} e.g. http.server.request.duration
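Delta vs. cumulative temporality is easiest to see with numbers. A toy simulation (not SDK code) converting a cumulative counter stream, including a process restart, into deltas:

```python
def to_delta(cumulative_readings):
    """Convert successive cumulative counter readings into per-interval deltas,
    treating any decrease as a counter reset (process restart)."""
    deltas, previous = [], 0
    for value in cumulative_readings:
        if value < previous:  # reset detected: process restarted
            previous = 0
        deltas.append(value - previous)
        previous = value
    return deltas

# Counter climbs to 7, the process restarts, then climbs again.
cumulative = [3, 7, 2, 5]
print(to_delta(cumulative))  # [3, 4, 2, 3] -- no visible dip in the delta stream
```

This is the same reset-detection logic a cumulative backend like Prometheus applies with rate().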

Lab Tasks

  1. Instrument a Flask server with a Counter and Histogram (see lab guide →)
  2. Export metrics to Prometheus and verify the /metrics endpoint
  3. Create a View to rename an instrument and filter its attributes
Day 5: Logs in OpenTelemetry (~2.5 hrs)

Core Theory

  • Log Data Model fields: Timestamp, ObservedTimestamp, TraceID, SpanID, TraceFlags, SeverityNumber, SeverityText, Body, Attributes, Resource
  • SeverityNumber: 1–24 scale — TRACE(1-4), DEBUG(5-8), INFO(9-12), WARN(13-16), ERROR(17-20), FATAL(21-24)
  • Log Appender: bridges existing logging frameworks (log4j, logback, slog, Winston) to OTel — this is the primary integration path for existing apps
  • Log Bridge API: LoggerProvider → Logger → LogRecord — NOT for direct use in application code
  • Correlation: TraceID/SpanID injected automatically into log records when emitted inside an active span
  • filelog receiver: Collector receiver for tailing log files — most common path for legacy/existing applications
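The 1–24 severity scale maps to text labels in blocks of four, which makes it easy to reconstruct under exam pressure. A quick sketch of that mapping:

```python
SEVERITY_RANGES = [
    (1, 4, "TRACE"), (5, 8, "DEBUG"), (9, 12, "INFO"),
    (13, 16, "WARN"), (17, 20, "ERROR"), (21, 24, "FATAL"),
]

def severity_text(number: int) -> str:
    """Map an OTel SeverityNumber (1-24) to its base severity text."""
    for low, high, text in SEVERITY_RANGES:
        if low <= number <= high:
            return text
    raise ValueError(f"SeverityNumber out of range: {number}")

print(severity_text(9))   # INFO  (plain INFO is SeverityNumber 9)
print(severity_text(17))  # ERROR
```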

Lab Tasks

  1. Configure Java log4j2 with OTel appender and verify TraceID correlation (see lab guide →)
  2. Set up the filelog receiver in the Collector to tail a log file with a regex parser
  3. Verify log records appear in a backend with trace correlation
Day 6: Automatic Instrumentation (~3 hrs)

Language-Specific Agents

Language | Mechanism | Key Detail
Java | JVM agent JAR | -javaagent:opentelemetry-javaagent.jar at startup — zero code changes
Python | CLI wrapper | opentelemetry-instrument python app.py
Node.js | require at startup | require('@opentelemetry/sdk-node') + auto-instrumentations package
.NET | Env var + native profiler | OTEL_DOTNET_AUTO_HOME env var; CLR profiler API
Go | eBPF (experimental) | Requires no source changes; still maturing

Key Environment Variables

Variable | Purpose / Example
OTEL_SERVICE_NAME | Service identifier: payment-service
OTEL_EXPORTER_OTLP_ENDPOINT | Collector endpoint: http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL | grpc (default), http/protobuf, or http/json
OTEL_TRACES_SAMPLER | always_on, always_off, or parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG | Ratio, e.g. 0.1 for 10%
OTEL_PROPAGATORS | tracecontext,baggage (default)
OTEL_RESOURCE_ATTRIBUTES | deployment.environment=prod,k8s.namespace.name=payments
OTEL_LOGS_EXPORTER | otlp or none
OTEL_METRICS_EXPORTER | otlp, prometheus, or none

Lab Tasks

  1. Run the Java agent against a Spring Boot app — zero code changes (see lab guide →)
  2. Verify HTTP and JDBC spans appear in Jaeger
  3. Toggle OTEL_TRACES_SAMPLER_ARG between 1.0, 0.1, and 0 — observe the difference
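For lab task 3, it helps to have a mental model of how a TraceID-ratio sampler decides. A deliberately simplified sketch — real SDKs derive the comparison value from the TraceID in implementation-specific ways, so treat this as a model, not the actual algorithm:

```python
def traceidratio_sampled(trace_id_hex: str, ratio: float) -> bool:
    """Simplified traceidratio model: sample when the lower 64 bits of the
    TraceID fall below ratio * 2**64. The decision is deterministic per
    TraceID, so every service sampling the same trace agrees."""
    bound = int(ratio * (1 << 64))
    return int(trace_id_hex[16:32], 16) < bound

tid = "4bf92f3577b34da6a3ce929d0e0e4736"
print(traceidratio_sampled(tid, 1.0))  # True  -- behaves like always_on
print(traceidratio_sampled(tid, 0.0))  # False -- behaves like always_off
```

The parentbased_ variants skip this computation entirely when an incoming traceparent already carries a sampling decision.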
Day 7: Week 1 Review & Practice Test (~3 hrs · no new material)
Consolidation day only. Identify gaps from the practice questions below, then revisit those day cards.

Review Checklist

  • Can you explain the difference between API and SDK without notes?
  • Can you decode a traceparent header from memory?
  • Do you know all 7 metric instrument types and when to use each?
  • Can you describe head vs. tail sampling trade-offs?
  • Do you understand W3C Baggage vs. Span Attributes?
  • Can you configure auto-instrumentation for at least 2 languages via env vars only?

Practice Questions — Week 1

Q1: What is the default port for OTLP/gRPC?

4317 (gRPC) · 4318 (HTTP/protobuf)

Q2: Which span status should you set when a downstream call returns HTTP 200 with a business error inside the body?

UNSET — HTTP 200 is not an OTel-level error. Only set ERROR when the operation itself failed (exceptions, 5xx, etc.)

Q3: A Counter resets to zero when its process restarts. Which temporality makes this reset visible in the exported data?

Cumulative temporality — it reports the total since process start, so after a restart the reported value drops back towards zero and the backend must apply reset detection (as Prometheus does). Delta temporality reports only the change since the last export, so a restart produces no visible drop.

Q4: What is the difference between a Span Link and a Span Event?

Span Link: a reference to another trace — models causal relationships like message queue fan-out. Span Event: a timestamped annotation within the current span (e.g. "cache miss", exception details).

Q5: Which sampler respects the parent's sampling decision?

parentbased_* samplers (parentbased_always_on, parentbased_traceidratio). If the upstream already sampled the trace (flag=01 in traceparent), this service also samples it.

Q6: Which OTel component provides a no-op implementation if the SDK is not installed?

The OTel API — it ships no-ops so instrumentation libraries can depend on it safely without forcing an SDK on downstream users.

Q7: What header carries W3C Baggage information?

The baggage header — separate from traceparent/tracestate. Format: baggage: userId=123,tenant=acme

Week 2: Advanced Topics, Collector & Exam Prep
Days 8–14 · Manual instrumentation, Collector pipelines, backends, mock exam
Day 8: Manual Instrumentation (~3 hrs)

Core Theory

  • TracerProvider: global singleton — obtain via opentelemetry.trace.get_tracer_provider()
  • Tracer: obtained from TracerProvider, scoped to instrumentation library name + version
  • Span lifecycle: start_span() → set attributes / add events → end() — always use context managers to guarantee end()
  • MeterProvider → Meter: analogous pattern to TracerProvider → Tracer
  • Context: carries active span; use Context.with_value() for async/threading propagation
  • BatchSpanProcessor vs SimpleSpanProcessor: always use Batch in production — buffers and exports in background
  • record_exception(): adds an exception event with type, message, and stacktrace; also call set_status(ERROR)

Lab Tasks

  1. Add custom spans around a business function with attributes and events (see lab guide →)
  2. Record an exception and verify span status is ERROR in Jaeger
  3. Create a custom Counter metric and verify it in Prometheus /metrics
Day 9: OTel Collector — Basics (~3 hrs)

Collector Architecture

  • Deployment patterns: Agent (sidecar / DaemonSet), Gateway (centralised), Direct (no Collector)
  • Pipeline: Receivers → Processors → Exporters — one pipeline per signal type (traces / metrics / logs)
  • Receivers: otlp, jaeger, zipkin, prometheus, filelog, hostmetrics, k8sattributes
  • Processors: batch, memory_limiter, resourcedetection, attributes, transform, filter, tail_sampling
  • Exporters: otlp, otlphttp, jaeger, prometheus, logging (debug), file
  • Extensions: health_check, pprof, zpages — for monitoring the Collector itself
Critical for the exam: memory_limiter MUST be first in the processors list, and batch MUST be last. With the limiter first, the Collector can refuse or drop incoming data before later processors buffer it and exhaust memory.
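Putting the pieces together, a minimal traces pipeline showing that ordering. A sketch only — endpoints and limits are illustrative values, not recommendations:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:        # must be first in every pipeline's processor list
    check_interval: 1s
    limit_mib: 512
  batch: {}              # must be last

exporters:
  debug: {}              # the current name for the old "logging" exporter

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug]
```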

Lab Tasks

  1. Run the OTel Collector with Docker Compose (see lab guide →)
  2. Configure an OTLP receiver → batch processor → logging exporter pipeline
  3. Send a test trace with otel-cli and verify in Jaeger
Day 10: OTel Collector — Advanced (~3 hrs)

Advanced Processors

Processor | Purpose
attributes | Insert / update / delete / hash / extract attributes on spans, metrics, logs
transform | OTTL (OTel Transformation Language) expressions for complex mutations
filter | Drop telemetry matching conditions — by attribute, resource, metric name
tail_sampling | Buffer full traces then sample by policy: latency, status_code, probabilistic, composite
routing | Send telemetry to different exporters based on attribute values

Connectors

Connector | Purpose
spanmetrics | Generate RED metrics (calls_total, duration histogram) from trace spans
count | Count telemetry items passing through as a metric
forward | Pass data from one pipeline to another, signal-agnostically
Tail sampling requires ALL spans of a trace to reach the same Collector instance. In multi-Collector deployments, use loadbalancingexporter to route by TraceID hash first.
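A sketch of a tail_sampling block combining the policy types named above — the policy names and threshold values are illustrative, not prescriptive:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s          # how long to buffer a trace before evaluating
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-slow
        type: latency
        latency:
          threshold_ms: 500
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

Policies are OR-ed: a trace is kept if any policy matches, which is why an error/latency pair plus a low probabilistic baseline is a common starting point.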

Lab Tasks

  1. Configure tail_sampling with status_code + latency + composite policies (see lab guide →)
  2. Set up the spanmetrics connector and scrape generated metrics from Prometheus
  3. Use the filter processor to drop health-check spans by URL path
Day 11: Backends, Exporters & Visualisation (~2.5 hrs)

Common Backends

Backend | Signal(s) | Key Notes
Jaeger | Traces | OTLP natively (v1.35+); Jaeger exporter for legacy; UI for trace search & flame graphs
Prometheus | Metrics | Pull model; Collector exposes /metrics; remote_write for push
Grafana | All (via datasources) | Connect Jaeger, Prometheus, Loki; exemplars link metrics → traces
Zipkin | Traces | Zipkin exporter in Collector; legacy B3 format support
Loki | Logs | Grafana Loki exporter in Collector; LogQL for querying
Tempo | Traces | Grafana Tempo; OTLP native; deep integration with Loki + Prometheus
Always prefer the OTLP exporter over native exporters — it's backend-agnostic and future-proof. Use native exporters only for legacy compatibility.

Lab Tasks

  1. Set up full stack: Jaeger + Prometheus + Grafana via Docker Compose (see lab guide →)
  2. Configure Prometheus scrape of Collector /metrics endpoint
  3. In Grafana, link metrics to traces via exemplars (requires spanmetrics connector)
Day 12: Semantic Conventions & Best Practices (~2.5 hrs)

Key Conventions to Know

Attribute | Signal | Example
http.request.method | Span | GET, POST, PUT
http.response.status_code | Span | 200, 404, 500
url.full | Span | https://api.example.com/v1/users
server.address | Span | api.example.com
db.system | Span | postgresql, mysql, redis
db.statement | Span | SELECT * FROM users WHERE id = ?
messaging.system | Span | kafka, rabbitmq, sqs
messaging.destination.name | Span | orders-topic
rpc.system | Span | grpc, jsonrpc
service.name | Resource | payment-service
service.version | Resource | 1.2.3
deployment.environment | Resource | production
http.server.request.duration | Metric | Histogram, unit: seconds

Key Deprecations (exam-relevant)

Old (deprecated) | New (current)
http.method | http.request.method
http.status_code | http.response.status_code
http.url | url.full
net.peer.name | server.address
net.peer.port | server.port
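The renames above are mechanical, which makes them easy to drill. A small helper that rewrites old attribute keys — the mapping dict mirrors the table above and nothing more:

```python
SEMCONV_RENAMES = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
    "http.url": "url.full",
    "net.peer.name": "server.address",
    "net.peer.port": "server.port",
}

def migrate_attributes(attrs: dict) -> dict:
    """Rewrite deprecated semconv keys to their current names,
    leaving unknown keys untouched."""
    return {SEMCONV_RENAMES.get(k, k): v for k, v in attrs.items()}

old = {"http.method": "GET", "http.status_code": 200, "custom.key": "x"}
print(migrate_attributes(old))
# {'http.request.method': 'GET', 'http.response.status_code': 200, 'custom.key': 'x'}
```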

Lab Tasks

  1. Annotate 5 spans with correct semconv attributes: HTTP server, HTTP client, DB, Kafka, gRPC (see lab guide →)
  2. Review the OTel Semantic Conventions GitHub repo for HTTP and DB namespaces
  3. List 3+ deprecated attribute names and their replacements
Day 13: Full Mock Exam + Targeted Review (~4 hrs · timed)
Simulate real exam conditions: 90 minutes, no notes, no browser. Track your score per domain. Then review every wrong answer before Day 14.

Mock Exam — 15 Questions (All Domains)

Q1: Which OTel component receives, processes, and exports telemetry independently of the application?

The OTel Collector

Q2: In the Collector config, which processor must appear FIRST in the processors list?

memory_limiter — it must be first to prevent OOM conditions before batching kicks in

Q3: What is the default OTLP gRPC port?

4317

Q4: What is the key difference between an ObservableGauge and a Gauge?

ObservableGauge is asynchronous (callback-based, SDK calls your function at export time). Gauge is synchronous (you call record() explicitly in your code).

Q5: Which propagation format does OTel use by default?

W3C TraceContext (traceparent + tracestate headers), combined with W3C Baggage

Q6: What does the spanmetrics connector generate?

RED metrics from trace spans: calls_total (Counter) and request duration (Histogram), labelled by span kind, service name, and configured dimensions

Q7: A span has kind=CLIENT, status=UNSET, and http.response.status_code=503. What status should it have?

ERROR — 5xx HTTP responses should cause the span status to be set to ERROR, with the status message describing the failure

Q8: What does tail sampling require that head sampling does not?

All spans of a trace must flow through the same Collector instance (or be load-balanced by TraceID) so the full trace can be buffered and evaluated by policy

Q9: Which Collector receiver collects logs from a file on disk?

filelog receiver (from opentelemetry-collector-contrib)

Q10: What does OTEL_TRACES_SAMPLER=parentbased_traceidratio do?

Samples at the configured ratio (OTEL_TRACES_SAMPLER_ARG) for root spans, but always respects the sampling decision already set by a parent span in an incoming traceparent header

Q11: Which attribute namespace covers database calls?

db.* namespace — e.g. db.system, db.statement, db.name, db.operation

Q12: What is an Exemplar in OTel Metrics?

A sample span reference (trace_id + span_id) embedded in a metric data point, enabling navigation from a metric anomaly directly to a representative trace in the backend

Q13: What is the difference between TraceState and Baggage?

TraceState: part of the W3C Trace Context spec (the tracestate header); vendor-specific opaque values with strict size limits, travelling alongside the trace. Baggage: application-level key-value pairs for business context (userId, tenantId), carried in a separate baggage header and propagated independently of the sampling decision.

Q14: How are Histogram bucket boundaries configured in the SDK?

Via a View with ExplicitBucketHistogramAggregation specifying the bucket boundaries array, attached to the MeterProvider at build time

Q15: Which Collector extension provides a health check HTTP endpoint?

health_check extension — defaults to port 13133, returns {"status":"Server available"}

Exam Day Cram Sheet

One-page reference for the morning of your exam. Everything you need to have at the top of your mind.

⚡ Ports & Protocols
  • OTLP gRPC: 4317 · OTLP HTTP: 4318
  • Jaeger UI: 16686 · Prometheus: 9090
  • Zipkin: 9411 · Grafana: 3000
  • traceparent: 00-{32hex}-{16hex}-{flags}
  • Sampled flag: 01 = yes · 00 = no
✓ Signal Stability
  • Traces → STABLE
  • Metrics → STABLE
  • Logs → STABLE
  • Profiling → Experimental
  • OTLP → STABLE
⚠ Processor Order
  • memory_limiter → always FIRST
  • batch → always LAST
  • Wrong: [batch, memory_limiter]
  • Right: [memory_limiter, filter, batch]
📊 Instrument Cheatsheet
  • Always up → Counter
  • Up or down → UpDownCounter
  • Distribution → Histogram
  • Snapshot → Gauge
  • Async versions → prefix Observable
🔄 Propagators
  • Default: tracecontext,baggage
  • B3 single: b3
  • B3 multi-header: b3multi
  • AWS X-Ray: xray
  • Set via: OTEL_PROPAGATORS
🔴 Span Status Rules
  • UNSET → default, leave alone
  • ERROR → exceptions, 5xx, failures
  • OK → explicit success only
  • NEVER OK on 4xx (client error)
  • HTTP 200 with biz error → UNSET
📐 Semconv Namespaces
  • HTTP: http.*, url.*
  • DB: db.*
  • Messaging: messaging.*
  • RPC: rpc.*
  • Cloud/K8s: cloud.*, k8s.*
  • Resource: service.*, host.*
📋 Exam Day
  • Stable internet + working webcam
  • Government-issued photo ID
  • Clear desk, no second monitors
  • PSI browser extension installed
  • Arrive 30 min early for check-in
  • Flag & return to uncertain Qs
