
October 30, 2025 · 14 min read

Production-grade observability for Python: structured logging, RED/USE metrics, and distributed tracing with OpenTelemetry, Prometheus, Tempo, and Grafana. Copy-ready patterns and pitfalls.

Observability for Python Apps: Logging, Metrics, Tracing with OpenTelemetry

Estimated reading time: 14 min · Published Oct 31, 2025

Monitoring asks “is it up?” Observability asks “why is it slow?” This guide shows how to add structured logging, RED/USE metrics, and distributed tracing to Python services using OpenTelemetry, Prometheus, Tempo, and Grafana.

1) Architecture at a Glance

  • App (FastAPI/Django + OTel SDK) → OTel Collector
  • Logs → Loki · Metrics → Prometheus · Traces → Tempo
  • Dashboards/Alerts → Grafana
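
A minimal way to wire this stack up locally is a compose file along these lines. This is a sketch, not a production deployment: image tags, port mappings, and the Collector config path are illustrative assumptions.

```yaml
# docker-compose.yml (sketch; service names match the diagram above)
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./otel-collector.yaml:/etc/otelcol-contrib/config.yaml
    ports:
      - "4318:4318"   # OTLP/HTTP from the app
      - "9464:9464"   # Prometheus scrape endpoint exposed by the Collector
  tempo:
    image: grafana/tempo:latest
  loki:
    image: grafana/loki:latest
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
```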

2) Structured Logging


# logging_setup.py
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line."""
    def format(self, record):
        base = {
            "ts": record.created,  # the record's own timestamp, not time-of-format
            "level": record.levelname,
            "msg": record.getMessage(),
            "logger": record.name,
        }
        if record.exc_info:
            base["exc"] = self.formatException(record.exc_info)
        return json.dumps(base)

def configure_json_logging(level=logging.INFO):
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers.clear()  # avoid duplicate output if called twice
    root.addHandler(handler)
    root.setLevel(level)

Emit JSON to stdout; collectors parse and route without regex gymnastics.
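As a quick sanity check, the formatter round-trips through `json.loads` exactly the way a collector would parse it. This self-contained sketch repeats the formatter's core so it runs on its own:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    # same shape as logging_setup.JsonFormatter above
    def format(self, record):
        return json.dumps({
            "ts": record.created,
            "level": record.levelname,
            "msg": record.getMessage(),
            "logger": record.name,
        })

record = logging.LogRecord("app", logging.INFO, "demo.py", 1,
                           "user %s logged in", ("alice",), None)
line = JsonFormatter().format(record)
parsed = json.loads(line)   # collectors do exactly this parse
print(parsed["msg"])        # → user alice logged in
```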

3) Metrics with Prometheus Client


import time
from prometheus_client import Counter, Histogram, Gauge

REQS = Counter("http_requests_total", "Total HTTP requests", ["route", "method", "code"])
LAT = Histogram("http_request_duration_seconds", "Latency", ["route", "method"],
                buckets=[.05, .1, .2, .5, 1, 2, 5])
INFLIGHT = Gauge("http_inflight_requests", "Active requests")

def before_request():
    INFLIGHT.inc()
    return time.perf_counter()  # monotonic start time

def after_request(route, method, code, start):
    # Histogram.time() returns a context manager/decorator, not a timer object,
    # so observe the elapsed time explicitly.
    LAT.labels(route, method).observe(time.perf_counter() - start)
    INFLIGHT.dec()
    REQS.labels(route, method, code).inc()

Expose /metrics via WSGI/ASGI middleware and let Prometheus scrape it.
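For WSGI apps, `prometheus_client.make_wsgi_app` gives you the /metrics endpoint, and a small middleware can do the timing. The metric names below are illustrative; the sketch assumes `start_response` is called before the app returns (true for typical framework responses, not for streaming generators):

```python
import time
from prometheus_client import Counter, Histogram, make_wsgi_app

# Hypothetical metric names; align them with your naming scheme.
REQS = Counter("app_http_requests_total", "Total HTTP requests", ["route", "method", "code"])
LAT = Histogram("app_http_request_duration_seconds", "Latency", ["route", "method"])

metrics_app = make_wsgi_app()  # mount this under /metrics in your server

class MetricsMiddleware:
    """WSGI middleware that times each request and counts it by status code."""
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        route = environ.get("PATH_INFO", "/")
        method = environ.get("REQUEST_METHOD", "GET")
        start = time.perf_counter()
        status_holder = {}

        def capturing_start_response(status, headers, exc_info=None):
            status_holder["code"] = status.split(" ", 1)[0]  # "200 OK" -> "200"
            return start_response(status, headers, exc_info)

        result = self.app(environ, capturing_start_response)
        LAT.labels(route, method).observe(time.perf_counter() - start)
        REQS.labels(route, method, status_holder.get("code", "500")).inc()
        return result
```

In production you would also normalize `route` to the route template (e.g. `/users/{id}`), not the raw path, to keep label cardinality bounded.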

4) Distributed Tracing with OpenTelemetry


# otel_setup.py
from opentelemetry import trace
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

def init_tracing(service_name="api"):
    provider = TracerProvider(resource=Resource.create({SERVICE_NAME: service_name}))
    processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4318/v1/traces"))
    provider.add_span_processor(processor)
    trace.set_tracer_provider(provider)
    return trace.get_tracer(service_name)
      

FastAPI integration


from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from logging_setup import configure_json_logging
from otel_setup import init_tracing

app = FastAPI()
configure_json_logging()
tracer = init_tracing("redesign-api")
FastAPIInstrumentor.instrument_app(app)

@app.get("/health")
def health():
    return {"ok": True}
      

5) OpenTelemetry Collector Config (traces & metrics pipelines)


receivers:
  otlp:
    protocols:
      http:
exporters:
  otlphttp/tempo:
    endpoint: http://tempo:4318
  prometheus:
    endpoint: 0.0.0.0:9464
processors:
  batch: {}
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
      

6) Dashboards & Alerting

  • RED (Rate, Errors, Duration) for user-facing endpoints.
  • USE (Utilization, Saturation, Errors) for system resources.
  • Set SLOs (e.g., “p95 < 300ms”), alert on burn rate, not single breaches.
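
A burn-rate alert in Prometheus rule syntax might look like the following. The 99.9% SLO, the 14.4x factor, and the 5m/1h window pair are illustrative values from the common multiwindow pattern; tune them to your error budget:

```yaml
groups:
  - name: slo-burn
    rules:
      - alert: HighErrorBudgetBurn
        # Fires only when both a short and a long window burn the
        # error budget ~14.4x faster than sustainable for a 99.9% SLO.
        expr: |
          (
            sum(rate(http_requests_total{code=~"5.."}[5m]))
              / sum(rate(http_requests_total[5m]))
          ) > (14.4 * 0.001)
          and
          (
            sum(rate(http_requests_total{code=~"5.."}[1h]))
              / sum(rate(http_requests_total[1h]))
          ) > (14.4 * 0.001)
        labels:
          severity: page
```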

7) Cost & Cardinality Control

  • Use exemplars to link metrics to traces; reduce high-cardinality labels.
  • Hash or truncate IDs in logs to avoid PII and cardinality explosions.
  • Enable tail-based sampling in Collector for “errors-first” traces.
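
In the Collector (contrib build), errors-first tail sampling looks roughly like this; the `decision_wait` and the 10% baseline are illustrative defaults, not recommendations:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s          # buffer spans per trace before deciding
    policies:
      - name: keep-errors       # always keep traces containing an error
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: baseline          # plus a probabilistic sample of the rest
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
```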

8) Common Pitfalls

  • Unstructured logs → impossible correlation.
  • Too many metrics → high scrape/TSDB cost; start with RED/USE.
  • Ignoring propagation headers → broken traces across services.

“See the system. Hear its signals. Then design with empathy.” — redesign.ir

Tip: Correlate everything via trace_id/span_id: add them to log records using a logging filter for one-click jumps from logs to trace.


© 2025 redesign.ir · Crafted by SCRIBE/CORE · “Illuminate through information.”
