2-Week Complete Exam Preparation
The OTCA (OpenTelemetry Certified Associate) is a CNCF-administered certification validating hands-on proficiency with OpenTelemetry instrumentation, the Collector, and the surrounding observability ecosystem.
Each daily session is 2–3 hours. Week 1 builds foundational knowledge; Week 2 covers advanced topics and exam prep.
| Day | Focus Area | Key Topics | Deliverable |
|---|---|---|---|
| Day 1 | Observability Fundamentals | Pillars, SLOs, RED/USE/Golden Signals | Notes + flashcards |
| Day 2 | OTel Architecture & Concepts | OTEP, spec, SDKs, APIs, signals overview | Diagram + flashcards |
| Day 3 | Traces & Context Propagation | Spans, attributes, W3C TraceContext, Baggage | Trace diagram lab |
| Day 4 | Metrics in OTel | Instruments, temporality, exemplars, OTLP | Metrics lab |
| Day 5 | Logs in OTel | Log model, log appenders, log bridge, severity | Logs lab |
| Day 6 | Auto-Instrumentation | Java agent, Python, .NET, Node.js | Auto-instr lab |
| Day 7 | Week 1 Review | All signals, context propagation, quiz | Practice test |
| Day 8 | Manual Instrumentation | SDK setup, custom spans/metrics, API patterns | SDK lab |
| Day 9 | OTel Collector — Basics | Receivers, processors, exporters, pipeline | Collector config lab |
| Day 10 | OTel Collector — Advanced | Batch, memory, sampling, transform, routing | Advanced pipeline |
| Day 11 | Backends & Exporters | Jaeger, Prometheus, OTLP, Zipkin, Grafana | Backend integration |
| Day 12 | Semantic Conventions | Resource, span names, attribute namespaces | Annotation exercise |
| Day 13 | Full Mock Exam + Review | All domains, timed practice, weak spot review | Score analysis |
| Day 14 | Final Review & Exam Day | Cram sheet, exam readiness check | Certification exam |
**Observability methodologies:**

| Method | Stands For | Best For |
|---|---|---|
| RED | Rate, Errors, Duration | Request-driven services (APIs, microservices) |
| USE | Utilization, Saturation, Errors | Resources (CPU, disk, network, memory) |
| Golden Signals | Latency, Traffic, Errors, Saturation | Google SRE — service health overview |
| MELT | Metrics, Events, Logs, Traces | Industry grouping of telemetry types |
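As a Day 1 exercise, the RED numbers can be computed from raw request records. A pure-Python sketch — the record shape and helper name are illustrative, not from any OTel API:

```python
from dataclasses import dataclass

@dataclass
class Request:
    status: int        # HTTP status code
    duration_ms: float

def red_metrics(requests: list[Request], window_s: float) -> dict:
    """Compute Rate, Errors, Duration over a window of request records."""
    n = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    # p95 by nearest-rank: index ceil(0.95 * n) - 1
    p95 = durations[max(0, -(-95 * n // 100) - 1)] if durations else 0.0
    return {
        "rate_rps": n / window_s,                  # Rate: requests per second
        "error_ratio": errors / n if n else 0.0,   # Errors: fraction failed
        "duration_p95_ms": p95,                    # Duration: p95 latency
    }

reqs = [Request(200, 12.0)] * 18 + [Request(500, 250.0)] * 2
print(red_metrics(reqs, window_s=10.0))
# → {'rate_rps': 2.0, 'error_ratio': 0.1, 'duration_p95_ms': 250.0}
```

In production these three numbers come from your metrics backend, but deriving them by hand once makes the RED definitions stick.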
**OTLP** is OpenTelemetry's canonical transport protocol: gRPC on port 4317, HTTP on port 4318.

**Core components:**

| Component | Role |
|---|---|
| API | Language-specific interfaces; no-op by default without SDK — safe for libraries to depend on |
| SDK | Implements the API; configurable pipelines, samplers, exporters — used by applications |
| Instrumentation Library | Pre-built instrumentation for frameworks/libs (Flask, Spring, Express, etc.) |
| Exporter | Sends telemetry to a backend (OTLP, Jaeger, Prometheus, Zipkin, etc.) |
| Collector | Receives, processes, and exports telemetry — agent or gateway deployment |
| Resource | Entity producing telemetry: service.name, service.version, host.name, k8s.pod.name |
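The API-vs-SDK split above can be sketched in plain Python. This is a toy model of the no-op pattern, not the real `opentelemetry` package:

```python
class NoOpSpan:
    """Does nothing — the safe default when no SDK is installed."""
    def set_attribute(self, key, value): pass
    def end(self): pass

class NoOpTracer:
    def start_span(self, name): return NoOpSpan()

# The "API" layer holds a global provider that defaults to no-op.
_tracer_provider = NoOpTracer()

def get_tracer():
    return _tracer_provider

def set_tracer_provider(provider):
    """An application (not a library) installs a real SDK here."""
    global _tracer_provider
    _tracer_provider = provider

# A library can instrument unconditionally — near-zero cost without an SDK:
span = get_tracer().start_span("db.query")
span.set_attribute("db.system", "postgresql")
span.end()
```

This is why the exam stresses that libraries depend only on the API: all their instrumentation calls resolve to no-ops unless the end application wires in an SDK.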
**Span kinds:** SERVER (inbound), CLIENT (outbound), PRODUCER, CONSUMER, INTERNAL.

**traceparent format:** `00-{traceId}-{parentSpanId}-{flags}`, where flags `01` = sampled, `00` = not sampled.

| Field | Example / Description |
|---|---|
| TraceID | 4bf92f3577b34da6a3ce929d0e0e4736 — 128-bit, hex |
| SpanID | 00f067aa0ba902b7 — 64-bit, hex |
| traceparent | 00-4bf9...4736-00f0...02b7-01 |
| tracestate | rojo=00f067aa0ba902b7,congo=t61rcWkgMzE — vendor state |
| Baggage | userId=123,tenant=acme — business context, in-band |
| Span Kind | SERVER · CLIENT · PRODUCER · CONSUMER · INTERNAL |
| Status Code | UNSET / OK / ERROR |
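The traceparent fields above are easy to drill with a small parser. A pure-Python sketch — the helper name is mine; real SDKs do this through the W3C propagator:

```python
def parse_traceparent(header: str) -> dict:
    """Split a W3C traceparent header into its four dash-separated fields."""
    version, trace_id, parent_id, flags = header.split("-")
    if len(trace_id) != 32 or len(parent_id) != 16:
        raise ValueError("trace-id must be 32 hex chars, parent-id 16")
    return {
        "version": version,
        "trace_id": trace_id,      # 128-bit, hex
        "parent_id": parent_id,    # 64-bit, hex
        # Simplified: the spec treats trace-flags as a bit field,
        # but for the exam 01 = sampled, 00 = not sampled.
        "sampled": flags == "01",
    }

ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx["trace_id"], ctx["sampled"])
```

Being able to decompose a traceparent header from memory is a common exam check.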
**Labs:** inspect `traceparent` headers in Chrome DevTools → Network tab; write `inject()` and `extract()` calls from memory.

**Metric instruments:**

| Instrument | Sync/Async | Use Case |
|---|---|---|
| Counter | Sync | Monotonically increasing — total requests, bytes sent |
| UpDownCounter | Sync | Can go up or down — queue depth, active connections |
| Histogram | Sync | Latency distributions — request duration, payload size |
| Gauge | Sync | Point-in-time snapshot — CPU %, memory bytes |
| ObservableCounter | Async (callback) | Cumulative totals measured asynchronously |
| ObservableUpDownCounter | Async (callback) | Bidirectional totals measured asynchronously |
| ObservableGauge | Async (callback) | Point-in-time async — temperature sensors, external reads |
**Metric naming:** `{namespace}.{name}`, e.g. `http.server.request.duration`. The Prometheus exporter exposes metrics on a `/metrics` endpoint.

**Lab:** use the `filelog` receiver in the Collector to tail a log file with a regex parser.

**Auto-instrumentation by language:**

| Language | Mechanism | Key Detail |
|---|---|---|
| Java | JVM agent JAR | -javaagent:opentelemetry-javaagent.jar at startup — zero code changes |
| Python | CLI wrapper | opentelemetry-instrument python app.py |
| Node.js | require at startup | require('@opentelemetry/sdk-node') + auto-instrumentations package |
| .NET | Env var + native profiler | OTEL_DOTNET_AUTO_HOME env var; CLR profiler API |
| Go | eBPF (experimental) | Requires no source changes; still maturing |
| Variable | Purpose / Example |
|---|---|
| OTEL_SERVICE_NAME | Service identifier: payment-service |
| OTEL_EXPORTER_OTLP_ENDPOINT | Collector endpoint: http://otel-collector:4317 |
| OTEL_EXPORTER_OTLP_PROTOCOL | grpc (default) · http/protobuf · http/json |
| OTEL_TRACES_SAMPLER | always_on · always_off · parentbased_traceidratio |
| OTEL_TRACES_SAMPLER_ARG | Ratio, e.g. 0.1 for 10% |
| OTEL_PROPAGATORS | tracecontext,baggage (default) |
| OTEL_RESOURCE_ATTRIBUTES | deployment.environment=prod,k8s.namespace.name=payments |
| OTEL_LOGS_EXPORTER | otlp · none |
| OTEL_METRICS_EXPORTER | otlp · prometheus · none |
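Putting the table together, a typical zero-code setup might look like the following. The service name and endpoint are examples only:

```shell
export OTEL_SERVICE_NAME=payment-service
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1   # sample 10% of root traces
export OTEL_RESOURCE_ATTRIBUTES=deployment.environment=prod

opentelemetry-instrument python app.py   # Python auto-instrumentation wrapper
```

The same environment variables work across languages; only the launch mechanism (agent JAR, CLI wrapper, `--require`, profiler) differs.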
**Lab:** vary OTEL_TRACES_SAMPLER_ARG between 1.0, 0.1, and 0 — observe the difference in exported spans. Then quiz yourself: can you write a traceparent header from memory?

**Quick self-check (Week 1):**

- **Default OTLP ports?** 4317 (gRPC) · 4318 (HTTP/protobuf)
- **Span status for an HTTP 200?** UNSET — HTTP 200 is not an OTel-level error. Only set ERROR when the operation itself failed (exceptions, 5xx, etc.)
- **What does a delta-temporality counter report after a restart?** Delta temporality reports the change since the last export, so after a restart the first export is 0. Cumulative would continue from where it left off (or report a reset dip).
- **Span link vs. span event?** A span link is a reference to another trace — it models causal relationships like message-queue fan-out. A span event is a timestamped annotation within the current span (e.g. "cache miss", exception details).
- **Which samplers respect an upstream decision?** The parentbased_* samplers (parentbased_always_on, parentbased_traceidratio). If the upstream already sampled the trace (flag=01 in traceparent), this service samples it too.
- **What should libraries depend on: API or SDK?** The OTel API — it ships no-ops, so instrumentation libraries can depend on it safely without forcing an SDK on downstream users.
- **Which header carries baggage?** The baggage header — separate from traceparent/tracestate. Format: `baggage: userId=123,tenant=acme`
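Delta vs. cumulative temporality is worth simulating once. A pure-Python sketch of how one counter stream is reported under each temporality — the class name is mine, not the SDK's:

```python
class CounterExporter:
    """Toy model of a counter stream under delta or cumulative temporality."""
    def __init__(self, temporality: str):
        self.temporality = temporality  # "delta" or "cumulative"
        self.total = 0           # running value inside the process
        self.last_reported = 0   # portion already covered by prior exports

    def add(self, n: int):
        self.total += n

    def export(self) -> int:
        if self.temporality == "delta":
            value = self.total - self.last_reported  # change since last export
            self.last_reported = self.total
            return value
        return self.total  # cumulative: running total since process start

    def restart(self):
        """Process restart: in-memory state is lost."""
        self.total = 0
        self.last_reported = 0

delta = CounterExporter("delta")
delta.add(5); print(delta.export())   # → 5
delta.restart()
print(delta.export())                 # → 0 (first export after restart)

cum = CounterExporter("cumulative")
cum.add(5); print(cum.export())       # → 5
cum.restart()
print(cum.export())                   # → 0 (backend sees a reset dip)
```

The backend's job differs accordingly: with delta it sums exports; with cumulative it must detect resets (a value lower than the previous one).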
**Manual instrumentation (Python):** obtain a tracer via `opentelemetry.trace.get_tracer_provider()`; the span lifecycle is `start_span()` → set attributes / add events → `end()` — always use context managers to guarantee `end()` is called. Use `Context.with_value()` for async/threading propagation. Record failures as an `exception` event with type, message, and stacktrace, and also call `set_status(ERROR)`.

**Collector building blocks:**

- **Receivers:** otlp, jaeger, zipkin, prometheus (scrapes `/metrics`), filelog, hostmetrics
- **Processors:** k8sattributes, batch, memory_limiter, resourcedetection, attributes, transform, filter, tail_sampling
- **Exporters:** otlp, otlphttp, jaeger, prometheus, logging (debug), file
- **Extensions:** health_check, pprof, zpages — for monitoring the Collector itself
- **Ordering rule:** memory_limiter MUST be first in processors; batch MUST be last. This ordering prevents OOM before the limiter kicks in.

**Lab:** send test spans with otel-cli and verify them in Jaeger.

**Key processors:**

| Processor | Purpose |
|---|---|
| attributes | Insert / update / delete / hash / extract attributes on spans, metrics, logs |
| transform | OTTL (OTel Transformation Language) expressions for complex mutations |
| filter | Drop telemetry matching conditions — by attribute, resource, metric name |
| tail_sampling | Buffer full traces, then sample by policy: latency, status_code, probabilistic, composite |
| routing | Send telemetry to different exporters based on attribute values |
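A minimal pipeline tying the pieces together might look like the following Collector config. Endpoint values are examples; note memory_limiter first and batch last:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:        # MUST be first in the pipeline
    check_interval: 1s
    limit_mib: 512
  filter:                # drop health-check spans (OTTL condition)
    traces:
      span:
        - 'attributes["url.path"] == "/healthz"'
  batch: {}              # MUST be last

exporters:
  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, filter, batch]
      exporters: [otlp]
```

Validating a config like this by hand (receivers → processors in order → exporters) is exactly the kind of reading the Collector exam questions test.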
**Connectors:**

| Connector | Purpose |
|---|---|
| spanmetrics | Generate RED metrics (calls_total, duration histogram) from trace spans |
| count | Count telemetry items passing through as a metric |
| forward | Pass data from one pipeline to another, signal-agnostically |
**Scaling note:** tail sampling across multiple Collector instances requires the loadbalancingexporter to route by TraceID hash first.

**Labs:** wire up the spanmetrics connector and scrape the generated metrics from Prometheus; use the filter processor to drop health-check spans by URL path.

**Backends:**

| Backend | Signal(s) | Key Notes |
|---|---|---|
| Jaeger | Traces | OTLP natively (v1.35+); Jaeger exporter for legacy; UI for trace search & flame graphs |
| Prometheus | Metrics | Pull model; Collector exposes /metrics; remote_write for push |
| Grafana | All (via datasources) | Connect Jaeger, Prometheus, Loki; exemplars link metrics → traces |
| Zipkin | Traces | Zipkin exporter in Collector; legacy B3 format support |
| Loki | Logs | Grafana Loki exporter in Collector; LogQL for querying |
| Tempo | Traces | Grafana Tempo; OTLP native; deep integration with Loki + Prometheus |
**Lab:** scrape the Collector's `/metrics` endpoint with Prometheus.

**Key semantic-convention attributes:**

| Attribute | Signal | Example |
|---|---|---|
| http.request.method | Span | GET, POST, PUT |
| http.response.status_code | Span | 200, 404, 500 |
| url.full | Span | https://api.example.com/v1/users |
| server.address | Span | api.example.com |
| db.system | Span | postgresql, mysql, redis |
| db.statement | Span | SELECT * FROM users WHERE id = ? |
| messaging.system | Span | kafka, rabbitmq, sqs |
| messaging.destination.name | Span | orders-topic |
| rpc.system | Span | grpc, jsonrpc |
| service.name | Resource | payment-service |
| service.version | Resource | 1.2.3 |
| deployment.environment | Resource | production |
| http.server.request.duration | Metric | Histogram, unit: seconds |
**Deprecated → current attribute names:**

| Old (deprecated) | New (current) |
|---|---|
| http.method | http.request.method |
| http.status_code | http.response.status_code |
| http.url | url.full |
| net.peer.name | server.address |
| net.peer.port | server.port |
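The renames above can be drilled with a tiny lookup table. A hypothetical helper, not part of any OTel library:

```python
# Deprecated HTTP/network attribute keys → current semantic-convention names
SEMCONV_RENAMES = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
    "http.url": "url.full",
    "net.peer.name": "server.address",
    "net.peer.port": "server.port",
}

def modernize(attributes: dict) -> dict:
    """Rewrite deprecated attribute keys, leaving current ones untouched."""
    return {SEMCONV_RENAMES.get(k, k): v for k, v in attributes.items()}

old = {"http.method": "GET", "http.status_code": 200, "service.name": "payments"}
print(modernize(old))
# → {'http.request.method': 'GET', 'http.response.status_code': 200, 'service.name': 'payments'}
```

Expect exam questions that show the old name and ask for the current one, or vice versa.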
**Quick self-check (Week 2):**

- **Which component receives, processes, and exports telemetry?** The OTel Collector.
- **Which processor must come first in every pipeline?** memory_limiter — it must be first to prevent OOM conditions before batching kicks in.
- **Default OTLP gRPC port?** 4317.
- **Gauge vs. ObservableGauge?** ObservableGauge is asynchronous (callback-based; the SDK calls your function at export time). Gauge is synchronous (you call record() explicitly in your code).
- **Default propagation format?** W3C TraceContext (traceparent + tracestate headers), combined with W3C Baggage.
- **What does the spanmetrics connector produce?** RED metrics from trace spans: calls_total (Counter) and request duration (Histogram), labelled by span kind, service name, and configured dimensions.
- **Span status for a 5xx response?** ERROR — 5xx HTTP responses should set the span status to ERROR, with the status message describing the failure.
- **What does tail sampling require of your deployment?** All spans of a trace must flow through the same Collector instance (or be load-balanced by TraceID) so the full trace can be buffered and evaluated by policy.
- **Which receiver tails log files?** The filelog receiver (from opentelemetry-collector-contrib).
- **How does parentbased_traceidratio behave?** It samples at the configured ratio (OTEL_TRACES_SAMPLER_ARG) for root spans, but always respects the sampling decision already set by a parent span in an incoming traceparent header.
- **Which attribute namespace covers databases?** The db.* namespace — e.g. db.system, db.statement, db.name, db.operation.
- **What is an exemplar?** A sample span reference (trace_id + span_id) embedded in a metric data point, enabling navigation from a metric anomaly directly to a representative trace in the backend.
- **TraceState vs. Baggage?** TraceState is part of the W3C trace-context spec: vendor-specific opaque values with strict size limits that travel with the trace. Baggage carries application-level key-value pairs for business context (userId, tenantId) in a separate header and is not sampled with the trace.
- **How do you customize histogram buckets?** Via a View with ExplicitBucketHistogramAggregation specifying the bucket-boundaries array, attached to the MeterProvider at build time.
- **Which extension exposes Collector health?** The health_check extension — defaults to port 13133 and returns {"status":"Server available"}.
One-page reference for the morning of your exam. Everything you need to have at the top of your mind.
**Ports:** OTLP gRPC: 4317 · OTLP HTTP: 4318 · Jaeger UI: 16686 · Prometheus: 9090 · Zipkin: 9411 · Grafana: 3000

**traceparent:** `00-{32 hex trace-id}-{16 hex span-id}-{flags}` — `01` = sampled, `00` = not sampled

**Processor order:** memory_limiter → always FIRST · batch → always LAST. `[batch, memory_limiter]` is wrong; `[memory_limiter, filter, batch]` is right.

**Instruments:** Counter · UpDownCounter · Histogram · Gauge · Observable* (async)

**Propagators (via OTEL_PROPAGATORS):** tracecontext,baggage (default) · b3 · b3multi · xray

**Semconv namespaces:** http.*, url.* · db.* · messaging.* · rpc.* · cloud.*, k8s.* · service.*, host.*