OpenTelemetry Certified Associate

The OTCA is a CNCF-administered certification validating hands-on proficiency with OpenTelemetry instrumentation, the Collector, and the observability ecosystem.

Duration: 90 minutes
Format: Multiple choice
Passing Score: ~75%
Cost: ~$250 USD
Retake: 1 free retake included
Validity: 2 years
Delivery: Online, proctored (PSI)
Prerequisites: None (hands-on experience recommended)

Domain Weights

Domain | Weight
Instrumentation | ~25%
OpenTelemetry Core Concepts | ~20%
Observability Fundamentals | ~20%
Data Collection & Processing (Collector) | ~20%
Backends, Storage & Visualisation | ~15%
Key insight: Instrumentation carries the most weight. Focus extra time on manual and automatic instrumentation across multiple languages, and Collector pipeline architecture.

2-Week Study Plan

Each daily session is 2–3 hours. Week 1 builds foundational knowledge; Week 2 covers advanced topics and exam prep.

Day | Focus Area | Key Topics | Deliverable
Day 1 | Observability Fundamentals | Pillars, SLOs, RED/USE/Golden Signals | Notes + flashcards
Day 2 | OTel Architecture & Concepts | OTEP, spec, SDKs, APIs, signals overview | Diagram + flashcards
Day 3 | Traces & Context Propagation | Spans, attributes, W3C TraceContext, Baggage | Trace diagram lab
Day 4 | Metrics in OTel | Instruments, temporality, exemplars, OTLP | Metrics lab
Day 5 | Logs in OTel | Log model, log appenders, log bridge, severity | Logs lab
Day 6 | Auto-Instrumentation | Java agent, Python, .NET, Node.js | Auto-instr lab
Day 7 | Week 1 Review | All signals, context propagation, quiz | Practice test
Day 8 | Manual Instrumentation | SDK setup, custom spans/metrics, API patterns | SDK lab
Day 9 | OTel Collector — Basics | Receivers, processors, exporters, pipeline | Collector config lab
Day 10 | OTel Collector — Advanced | Batch, memory, sampling, transform, routing | Advanced pipeline
Day 11 | Backends & Exporters | Jaeger, Prometheus, OTLP, Zipkin, Grafana | Backend integration
Day 12 | Semantic Conventions | Resource, span names, attribute namespaces | Annotation exercise
Day 13 | Full Mock Exam + Review | All domains, timed practice, weak spot review | Score analysis
Day 14 | Final Review & Exam Day | Cram sheet, exam readiness check | Certification exam
Week 1: Foundations & Core Signals
Days 1–7 · Observability theory, all three signal types, auto-instrumentation
Day 1: Observability Fundamentals (~3 hrs)

Core Theory

  • The Three Pillars: Logs (what happened), Metrics (how much/how fast), Traces (through what path and how long)
  • Observability vs. Monitoring: Monitoring tells you something is wrong; observability tells you why
  • Distributed systems challenges: latency, partial failure, lack of global state, clock skew
  • SLIs, SLOs, SLAs: Indicator → Objective → Agreement. Error budget = 1 − SLO (e.g. 99.9% = ~43 min/month)
  • Cardinality: high-cardinality (user_id, request_id) vs. low-cardinality (region, status_code)
  • Push vs. Pull: Prometheus = pull (scrapes /metrics); OTLP = push (exporter sends data)
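The error-budget arithmetic above is worth being able to reproduce quickly. A minimal sketch (the 30-day month is an assumption baked into the "~43 min/month" rule of thumb):

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Minutes of allowed downtime per window for a given SLO."""
    return (1 - slo) * days * 24 * 60

# 99.9% over a 30-day month -> 43.2 minutes
print(round(error_budget_minutes(0.999), 1))
print(round(error_budget_minutes(0.99), 1))    # 99%    -> 432.0 minutes
print(round(error_budget_minutes(0.9999), 1))  # 99.99% -> 4.3 minutes
```

Being able to do this mentally ("each extra nine divides the budget by ten") is a common exam shortcut.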

Methods to Memorise

Method | Stands For | Best For
RED | Rate, Errors, Duration | Request-driven services (APIs, microservices)
USE | Utilisation, Saturation, Errors | Resources (CPU, disk, network, memory)
Golden Signals | Latency, Traffic, Errors, Saturation | Google SRE — service health overview
MELT | Metrics, Events, Logs, Traces | Industry grouping of telemetry types

Lab Tasks

  1. Read Google SRE Book Ch.6 — Monitoring Distributed Systems (free at sre.google)
  2. Write from memory: what each pillar captures and when to use it
  3. Create 10 flashcards: SLI/SLO/SLA, error budget, cardinality, RED, USE, Golden Signals
Day 2: OTel Architecture & Core Concepts (~3 hrs)

Core Theory

  • History: Merger of OpenCensus + OpenTracing under CNCF in 2019
  • Components: Specification, API, SDK, Collector, Instrumentation Libraries, Exporters
  • API vs SDK: API defines interfaces (stable, vendor-neutral, ships no-ops); SDK implements them (configurable, has pipelines)
  • OTLP: gRPC port 4317, HTTP port 4318 — the canonical transport protocol
  • Signal stability: Traces ✓ Stable · Metrics ✓ Stable · Logs ✓ Stable · Profiling ✗ Experimental
  • OTEP: OpenTelemetry Enhancement Proposal — how the specification evolves (like KEPs in Kubernetes)

Component Reference

Component | Role
API | Language-specific interfaces; no-op by default without SDK — safe for libraries to depend on
SDK | Implements the API; configurable pipelines, samplers, exporters — used by applications
Instrumentation Library | Pre-built instrumentation for frameworks/libs (Flask, Spring, Express, etc.)
Exporter | Sends telemetry to a backend (OTLP, Jaeger, Prometheus, Zipkin, etc.)
Collector | Receives, processes, and exports telemetry — agent or gateway deployment
Resource | Entity producing telemetry: service.name, service.version, host.name, k8s.pod.name

Lab Tasks

  1. Draw the full data flow: App → API → SDK → Exporter → Collector → Backend
  2. Browse opentelemetry.io/docs — read the Concepts → Signals overview page
  3. Identify the OTLP endpoints your current Splunk environment uses
Day 3: Traces & Context Propagation (~3 hrs)

Core Theory

  • Trace: a DAG of Spans representing a request's full path across services
  • Span fields: TraceID (128-bit, 32 hex), SpanID (64-bit, 16 hex), ParentSpanID, TraceFlags, TraceState
  • Span kinds: SERVER (inbound), CLIENT (outbound), PRODUCER, CONSUMER, INTERNAL
  • Span status: UNSET (default) · OK (explicit success) · ERROR (exception or failure) — never set OK on 4xx
  • Span Events: timestamped annotations within a span (e.g. exception events, cache hits)
  • Span Links: causal relationships to other traces — used for fan-out, message queues
  • W3C traceparent: format = 00-{traceId}-{parentSpanId}-{flags} where flags 01=sampled, 00=not sampled
  • W3C Baggage: key-value pairs propagated in-band for business context (e.g. userId, tenantId); travels in its own header and propagates regardless of the trace's sampling decision
  • Sampling: Head sampling (decision at root span) vs. Tail sampling (full trace in Collector, requires buffering)
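The traceparent layout above is easiest to internalise by writing the propagator yourself. A stdlib-only sketch of inject() and extract() — this illustrates the header format, not the real opentelemetry propagator API:

```python
from typing import Optional

def inject(trace_id: str, span_id: str, sampled: bool) -> dict:
    """Build an outgoing traceparent header (version 00)."""
    flags = "01" if sampled else "00"
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}

def extract(headers: dict) -> Optional[dict]:
    """Parse an incoming traceparent header; None if malformed."""
    parts = headers.get("traceparent", "").split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    version, trace_id, parent_span_id, flags = parts
    return {
        "trace_id": trace_id,              # 128-bit -> 32 hex chars
        "parent_span_id": parent_span_id,  # 64-bit  -> 16 hex chars
        "sampled": flags == "01",
    }

ctx = extract(inject("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", True))
print(ctx["sampled"])  # True
```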

Trace Data Model Quick Reference

Field | Example / Description
TraceID | 4bf92f3577b34da6a3ce929d0e0e4736 — 128-bit, hex
SpanID | 00f067aa0ba902b7 — 64-bit, hex
traceparent | 00-4bf9...4736-00f0...02b7-01
tracestate | rojo=00f067aa0ba902b7,congo=t61rcWkgMzE — vendor state
Baggage | userId=123,tenant=acme — business context, in-band
Span Kind | SERVER · CLIENT · PRODUCER · CONSUMER · INTERNAL
Status Code | UNSET / OK / ERROR

Lab Tasks

  1. Run the OTel Demo and view traces in Jaeger (see lab guide →)
  2. Inspect traceparent headers in Chrome DevTools → Network tab
  3. Write a pseudocode propagator inject() and extract() from memory
Day 4: Metrics in OpenTelemetry (~3 hrs)

The 7 Instruments

Instrument | Sync/Async | Use Case
Counter | Sync | Monotonically increasing — total requests, bytes sent
UpDownCounter | Sync | Can go up or down — queue depth, active connections
Histogram | Sync | Latency distributions — request duration, payload size
Gauge | Sync | Point-in-time snapshot — CPU %, memory bytes
ObservableCounter | Async (callback) | Cumulative totals measured asynchronously
ObservableUpDownCounter | Async (callback) | Bidirectional totals measured asynchronously
ObservableGauge | Async (callback) | Point-in-time async — temperature sensors, external reads

Key Concepts

  • Delta temporality: each export reports only the change since the previous export; a process restart does not show up as a dip
  • Cumulative temporality: each export reports the running total since a fixed start time; required for Prometheus compatibility, and a process restart appears as a counter reset
  • Exemplars: sample span references embedded in metric data points — link metrics to traces in Grafana
  • Views: customise aggregation, rename instruments, filter attributes — defined at SDK build time
  • MetricReader: PeriodicExportingMetricReader (push) vs. ManualReader vs. PrometheusMetricReader (pull)
  • Naming convention: {namespace}.{name} e.g. http.server.request.duration
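Delta vs. cumulative temporality is easiest to see with numbers. A toy simulation (not SDK code) converting a cumulative counter stream, including a process restart, into deltas:

```python
def to_delta(cumulative_readings):
    """Convert successive cumulative counter readings into per-interval deltas,
    treating any decrease as a counter reset (process restart)."""
    deltas, previous = [], 0
    for value in cumulative_readings:
        if value < previous:  # reset detected: process restarted
            previous = 0
        deltas.append(value - previous)
        previous = value
    return deltas

# Counter climbs to 7, the process restarts, then climbs again.
cumulative = [3, 7, 2, 5]
print(to_delta(cumulative))  # [3, 4, 2, 3] -- no visible dip in the delta stream
```

This is the same reset-detection logic a cumulative backend like Prometheus applies with rate().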

Lab Tasks

  1. Instrument a Flask server with a Counter and Histogram (see lab guide →)
  2. Export metrics to Prometheus and verify the /metrics endpoint
  3. Create a View to rename an instrument and filter its attributes
Day 5: Logs in OpenTelemetry (~2.5 hrs)

Core Theory

  • Log Data Model fields: Timestamp, ObservedTimestamp, TraceID, SpanID, TraceFlags, SeverityNumber, SeverityText, Body, Attributes, Resource
  • SeverityNumber: 1–24 scale — TRACE(1-4), DEBUG(5-8), INFO(9-12), WARN(13-16), ERROR(17-20), FATAL(21-24)
  • Log Appender: bridges existing logging frameworks (log4j, logback, slog, Winston) to OTel — this is the primary integration path for existing apps
  • Log Bridge API: LoggerProvider → Logger → LogRecord — NOT for direct use in application code
  • Correlation: TraceID/SpanID injected automatically into log records when emitted inside an active span
  • filelog receiver: Collector receiver for tailing log files — most common path for legacy/existing applications
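The 1–24 severity scale maps to text labels in blocks of four, which makes it easy to reconstruct under exam pressure. A quick sketch of that mapping:

```python
SEVERITY_RANGES = [
    (1, 4, "TRACE"), (5, 8, "DEBUG"), (9, 12, "INFO"),
    (13, 16, "WARN"), (17, 20, "ERROR"), (21, 24, "FATAL"),
]

def severity_text(number: int) -> str:
    """Map an OTel SeverityNumber (1-24) to its base severity text."""
    for low, high, text in SEVERITY_RANGES:
        if low <= number <= high:
            return text
    raise ValueError(f"SeverityNumber out of range: {number}")

print(severity_text(9))   # INFO  (plain INFO is SeverityNumber 9)
print(severity_text(17))  # ERROR
```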

Lab Tasks

  1. Configure Java log4j2 with OTel appender and verify TraceID correlation (see lab guide →)
  2. Set up the filelog receiver in the Collector to tail a log file with a regex parser
  3. Verify log records appear in a backend with trace correlation
Day 6: Automatic Instrumentation (~3 hrs)

Language-Specific Agents

Language | Mechanism | Key Detail
Java | JVM agent JAR | -javaagent:opentelemetry-javaagent.jar at startup — zero code changes
Python | CLI wrapper | opentelemetry-instrument python app.py
Node.js | require at startup | require('@opentelemetry/sdk-node') + auto-instrumentations package
.NET | Env var + native profiler | OTEL_DOTNET_AUTO_HOME env var; CLR profiler API
Go | eBPF (experimental) | Requires no source changes; still maturing

Key Environment Variables

Variable | Purpose / Example
OTEL_SERVICE_NAME | Service identifier: payment-service
OTEL_EXPORTER_OTLP_ENDPOINT | Collector endpoint: http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL | grpc (default), http/protobuf, or http/json
OTEL_TRACES_SAMPLER | always_on, always_off, or parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG | Ratio, e.g. 0.1 for 10%
OTEL_PROPAGATORS | tracecontext,baggage (default)
OTEL_RESOURCE_ATTRIBUTES | deployment.environment=prod,k8s.namespace.name=payments
OTEL_LOGS_EXPORTER | otlp or none
OTEL_METRICS_EXPORTER | otlp, prometheus, or none

Lab Tasks

  1. Run the Java agent against a Spring Boot app — zero code changes (see lab guide →)
  2. Verify HTTP and JDBC spans appear in Jaeger
  3. Toggle OTEL_TRACES_SAMPLER_ARG between 1.0, 0.1, and 0 — observe the difference
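For lab task 3, it helps to have a mental model of how a TraceID-ratio sampler decides. A deliberately simplified sketch — real SDKs derive the comparison value from the TraceID in implementation-specific ways, so treat this as a model, not the actual algorithm:

```python
def traceidratio_sampled(trace_id_hex: str, ratio: float) -> bool:
    """Simplified traceidratio model: sample when the lower 64 bits of the
    TraceID fall below ratio * 2**64. The decision is deterministic per
    TraceID, so every service sampling the same trace agrees."""
    bound = int(ratio * (1 << 64))
    return int(trace_id_hex[16:32], 16) < bound

tid = "4bf92f3577b34da6a3ce929d0e0e4736"
print(traceidratio_sampled(tid, 1.0))  # True  -- behaves like always_on
print(traceidratio_sampled(tid, 0.0))  # False -- behaves like always_off
```

The parentbased_ variants skip this computation entirely when an incoming traceparent already carries a sampling decision.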
Day 7: Week 1 Review & Practice Test (~3 hrs · no new material)
Consolidation day only. Identify gaps from the practice questions below, then revisit those day cards.

Review Checklist

  • Can you explain the difference between API and SDK without notes?
  • Can you decode a traceparent header from memory?
  • Do you know all 7 metric instrument types and when to use each?
  • Can you describe head vs. tail sampling trade-offs?
  • Do you understand W3C Baggage vs. Span Attributes?
  • Can you configure auto-instrumentation for at least 2 languages via env vars only?

Practice Questions — Week 1

Q1: What is the default port for OTLP/gRPC?

4317 (gRPC) · 4318 (HTTP/protobuf)

Q2: Which span status should you set when a downstream call returns HTTP 200 with a business error inside the body?

UNSET — HTTP 200 is not an OTel-level error. Only set ERROR when the operation itself failed (exceptions, 5xx, etc.)

Q3: A Counter resets to zero when its process restarts. Which temporality makes this reset visible in the exported data?

Cumulative temporality — it reports the total since process start, so after a restart the reported value drops back towards zero and the backend must apply reset detection (as Prometheus does). Delta temporality reports only the change since the last export, so a restart produces no visible drop.

Q4: What is the difference between a Span Link and a Span Event?

Span Link: a reference to another trace — models causal relationships like message queue fan-out. Span Event: a timestamped annotation within the current span (e.g. "cache miss", exception details).

Q5: Which sampler respects the parent's sampling decision?

parentbased_* samplers (parentbased_always_on, parentbased_traceidratio). If the upstream already sampled the trace (flag=01 in traceparent), this service also samples it.

Q6: Which OTel component provides a no-op implementation if the SDK is not installed?

The OTel API — it ships no-ops so instrumentation libraries can depend on it safely without forcing an SDK on downstream users.

Q7: What header carries W3C Baggage information?

The baggage header — separate from traceparent/tracestate. Format: baggage: userId=123,tenant=acme

Week 2: Advanced Topics, Collector & Exam Prep
Days 8–14 · Manual instrumentation, Collector pipelines, backends, mock exam
Day 8: Manual Instrumentation (~3 hrs)

Core Theory

  • TracerProvider: global singleton — obtain via opentelemetry.trace.get_tracer_provider()
  • Tracer: obtained from TracerProvider, scoped to instrumentation library name + version
  • Span lifecycle: start_span() → set attributes / add events → end() — always use context managers to guarantee end()
  • MeterProvider → Meter: analogous pattern to TracerProvider → Tracer
  • Context: carries active span; use Context.with_value() for async/threading propagation
  • BatchSpanProcessor vs SimpleSpanProcessor: always use Batch in production — buffers and exports in background
  • record_exception(): adds an exception event with type, message, and stacktrace; also call set_status(ERROR)

Lab Tasks

  1. Add custom spans around a business function with attributes and events (see lab guide →)
  2. Record an exception and verify span status is ERROR in Jaeger
  3. Create a custom Counter metric and verify it in Prometheus /metrics
Day 9: OTel Collector — Basics (~3 hrs)

Collector Architecture

  • Deployment patterns: Agent (sidecar / DaemonSet), Gateway (centralised), Direct (no Collector)
  • Pipeline: Receivers → Processors → Exporters — one pipeline per signal type (traces / metrics / logs)
  • Receivers: otlp, jaeger, zipkin, prometheus, filelog, hostmetrics, k8sattributes
  • Processors: batch, memory_limiter, resourcedetection, attributes, transform, filter, tail_sampling
  • Exporters: otlp, otlphttp, jaeger, prometheus, logging (debug), file
  • Extensions: health_check, pprof, zpages — for monitoring the Collector itself
Critical for the exam: memory_limiter MUST be first in the processors list, and batch MUST be last. With the limiter first, the Collector can refuse or drop incoming data before later processors buffer it and exhaust memory.
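Putting the pieces together, a minimal traces pipeline showing that ordering. A sketch only — endpoints and limits are illustrative values, not recommendations:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:        # must be first in every pipeline's processor list
    check_interval: 1s
    limit_mib: 512
  batch: {}              # must be last

exporters:
  debug: {}              # the current name for the old "logging" exporter

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug]
```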

Lab Tasks

  1. Run the OTel Collector with Docker Compose (see lab guide →)
  2. Configure an OTLP receiver → batch processor → logging exporter pipeline
  3. Send a test trace with otel-cli and verify in Jaeger
Day 10: OTel Collector — Advanced (~3 hrs)

Advanced Processors

Processor | Purpose
attributes | Insert / update / delete / hash / extract attributes on spans, metrics, logs
transform | OTTL (OTel Transformation Language) expressions for complex mutations
filter | Drop telemetry matching conditions — by attribute, resource, metric name
tail_sampling | Buffer full traces then sample by policy: latency, status_code, probabilistic, composite
routing | Send telemetry to different exporters based on attribute values

Connectors

Connector | Purpose
spanmetrics | Generate RED metrics (calls_total, duration histogram) from trace spans
count | Count telemetry items passing through as a metric
forward | Pass data from one pipeline to another, signal-agnostically
Tail sampling requires ALL spans of a trace to reach the same Collector instance. In multi-Collector deployments, use loadbalancingexporter to route by TraceID hash first.
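A sketch of a tail_sampling block combining the policy types named above — the policy names and threshold values are illustrative, not prescriptive:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s          # how long to buffer a trace before evaluating
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-slow
        type: latency
        latency:
          threshold_ms: 500
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

Policies are OR-ed: a trace is kept if any policy matches, which is why an error/latency pair plus a low probabilistic baseline is a common starting point.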

Lab Tasks

  1. Configure tail_sampling with status_code + latency + composite policies (see lab guide →)
  2. Set up the spanmetrics connector and scrape generated metrics from Prometheus
  3. Use the filter processor to drop health-check spans by URL path
Day 11: Backends, Exporters & Visualisation (~2.5 hrs)

Common Backends

Backend | Signal(s) | Key Notes
Jaeger | Traces | OTLP natively (v1.35+); Jaeger exporter for legacy; UI for trace search & flame graphs
Prometheus | Metrics | Pull model; Collector exposes /metrics; remote_write for push
Grafana | All (via datasources) | Connect Jaeger, Prometheus, Loki; exemplars link metrics → traces
Zipkin | Traces | Zipkin exporter in Collector; legacy B3 format support
Loki | Logs | Grafana Loki exporter in Collector; LogQL for querying
Tempo | Traces | Grafana Tempo; OTLP native; deep integration with Loki + Prometheus
Always prefer the OTLP exporter over native exporters — it's backend-agnostic and future-proof. Use native exporters only for legacy compatibility.

Lab Tasks

  1. Set up full stack: Jaeger + Prometheus + Grafana via Docker Compose (see lab guide →)
  2. Configure Prometheus scrape of Collector /metrics endpoint
  3. In Grafana, link metrics to traces via exemplars (requires spanmetrics connector)
Day 12: Semantic Conventions & Best Practices (~2.5 hrs)

Key Conventions to Know

Attribute | Signal | Example
http.request.method | Span | GET, POST, PUT
http.response.status_code | Span | 200, 404, 500
url.full | Span | https://api.example.com/v1/users
server.address | Span | api.example.com
db.system | Span | postgresql, mysql, redis
db.statement | Span | SELECT * FROM users WHERE id = ?
messaging.system | Span | kafka, rabbitmq, sqs
messaging.destination.name | Span | orders-topic
rpc.system | Span | grpc, jsonrpc
service.name | Resource | payment-service
service.version | Resource | 1.2.3
deployment.environment | Resource | production
http.server.request.duration | Metric | Histogram, unit: seconds

Key Deprecations (exam-relevant)

Old (deprecated) | New (current)
http.method | http.request.method
http.status_code | http.response.status_code
http.url | url.full
net.peer.name | server.address
net.peer.port | server.port
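The renames above are mechanical, which makes them easy to drill. A small helper that rewrites old attribute keys — the mapping dict mirrors the table above and nothing more:

```python
SEMCONV_RENAMES = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
    "http.url": "url.full",
    "net.peer.name": "server.address",
    "net.peer.port": "server.port",
}

def migrate_attributes(attrs: dict) -> dict:
    """Rewrite deprecated semconv keys to their current names,
    leaving unknown keys untouched."""
    return {SEMCONV_RENAMES.get(k, k): v for k, v in attrs.items()}

old = {"http.method": "GET", "http.status_code": 200, "custom.key": "x"}
print(migrate_attributes(old))
# {'http.request.method': 'GET', 'http.response.status_code': 200, 'custom.key': 'x'}
```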

Lab Tasks

  1. Annotate 5 spans with correct semconv attributes: HTTP server, HTTP client, DB, Kafka, gRPC (see lab guide →)
  2. Review the OTel Semantic Conventions GitHub repo for HTTP and DB namespaces
  3. List 3+ deprecated attribute names and their replacements
Day 13: Full Mock Exam + Targeted Review (~4 hrs · timed)
Simulate real exam conditions: 90 minutes, no notes, no browser. Track your score per domain. Then review every wrong answer before Day 14.

Mock Exam — 15 Questions (All Domains)

Q1: Which OTel component receives, processes, and exports telemetry independently of the application?

The OTel Collector

Q2: In the Collector config, which processor must appear FIRST in the processors list?

memory_limiter — it must be first to prevent OOM conditions before batching kicks in

Q3: What is the default OTLP gRPC port?

4317

Q4: What is the key difference between an ObservableGauge and a Gauge?

ObservableGauge is asynchronous (callback-based, SDK calls your function at export time). Gauge is synchronous (you call record() explicitly in your code).

Q5: Which propagation format does OTel use by default?

W3C TraceContext (traceparent + tracestate headers), combined with W3C Baggage

Q6: What does the spanmetrics connector generate?

RED metrics from trace spans: calls_total (Counter) and request duration (Histogram), labelled by span kind, service name, and configured dimensions

Q7: A span has kind=CLIENT, status=UNSET, and http.response.status_code=503. What status should it have?

ERROR — 5xx HTTP responses should cause the span status to be set to ERROR, with the status message describing the failure

Q8: What does tail sampling require that head sampling does not?

All spans of a trace must flow through the same Collector instance (or be load-balanced by TraceID) so the full trace can be buffered and evaluated by policy

Q9: Which Collector receiver collects logs from a file on disk?

filelog receiver (from opentelemetry-collector-contrib)

Q10: What does OTEL_TRACES_SAMPLER=parentbased_traceidratio do?

Samples at the configured ratio (OTEL_TRACES_SAMPLER_ARG) for root spans, but always respects the sampling decision already set by a parent span in an incoming traceparent header

Q11: Which attribute namespace covers database calls?

db.* namespace — e.g. db.system, db.statement, db.name, db.operation

Q12: What is an Exemplar in OTel Metrics?

A sample span reference (trace_id + span_id) embedded in a metric data point, enabling navigation from a metric anomaly directly to a representative trace in the backend

Q13: What is the difference between TraceState and Baggage?

TraceState: part of the W3C Trace Context spec (the tracestate header); vendor-specific opaque values with strict size limits, travelling alongside the trace. Baggage: application-level key-value pairs for business context (userId, tenantId), carried in a separate baggage header and propagated independently of the sampling decision.

Q14: How are Histogram bucket boundaries configured in the SDK?

Via a View with ExplicitBucketHistogramAggregation specifying the bucket boundaries array, attached to the MeterProvider at build time

Q15: Which Collector extension provides a health check HTTP endpoint?

health_check extension — defaults to port 13133, returns {"status":"Server available"}

Exam Day Cram Sheet

One-page reference for the morning of your exam. Everything you need to have at the top of your mind.

⚡ Ports & Protocols
  • OTLP gRPC: 4317 · OTLP HTTP: 4318
  • Jaeger UI: 16686 · Prometheus: 9090
  • Zipkin: 9411 · Grafana: 3000
  • traceparent: 00-{32hex}-{16hex}-{flags}
  • Sampled flag: 01 = yes · 00 = no
✓ Signal Stability
  • Traces → STABLE
  • Metrics → STABLE
  • Logs → STABLE
  • Profiling → Experimental
  • OTLP → STABLE
⚠ Processor Order
  • memory_limiter → always FIRST
  • batch → always LAST
  • Wrong: [batch, memory_limiter]
  • Right: [memory_limiter, filter, batch]
📊 Instrument Cheatsheet
  • Always up → Counter
  • Up or down → UpDownCounter
  • Distribution → Histogram
  • Snapshot → Gauge
  • Async versions → prefix Observable
🔄 Propagators
  • Default: tracecontext,baggage
  • B3 single: b3
  • B3 multi-header: b3multi
  • AWS X-Ray: xray
  • Set via: OTEL_PROPAGATORS
🔴 Span Status Rules
  • UNSET → default, leave alone
  • ERROR → exceptions, 5xx, failures
  • OK → explicit success only
  • NEVER OK on 4xx (client error)
  • HTTP 200 with biz error → UNSET
📐 Semconv Namespaces
  • HTTP: http.*, url.*
  • DB: db.*
  • Messaging: messaging.*
  • RPC: rpc.*
  • Cloud/K8s: cloud.*, k8s.*
  • Resource: service.*, host.*
📋 Exam Day
  • Stable internet + working webcam
  • Government-issued photo ID
  • Clear desk, no second monitors
  • PSI browser extension installed
  • Arrive 30 min early for check-in
  • Flag & return to uncertain Qs
