I had coffee with a backend engineer last week. Smart guy. Solid experience. He’d been trying to add tracing to his service for three weeks.

Three weeks. For tracing.

Something is broken here.

OTel feels like homework

Here’s the thing. OTel works. The spec is solid. The community is huge. But using it still feels like paying taxes. You do it because you have to, not because it helps you in the moment.

Every time I set up a new service, I spend an hour looking up semantic conventions. Was it http.request.method or http.method? Did they deprecate db.statement yet? What’s the correct way to record an exception in 2026?

The conventions change. The SDKs change. The “right” way to do things changes every six months.
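A small lookup table can take some of the sting out of the renames. Here is a minimal sketch, assuming your attributes are sitting in a plain dict before they go onto a span. The function name `modernize_attributes` is hypothetical, and the mapping covers only a handful of renames, so check it against the semantic conventions release you actually pin.

```python
# Deprecated attribute key -> current key (a partial list, verify against your semconv version)
RENAMED_KEYS = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
    "http.url": "url.full",
    "db.statement": "db.query.text",
}


def modernize_attributes(attrs):
    """Return a copy of attrs with the known-deprecated keys renamed."""
    # Start with everything that does not need renaming
    updated = {k: v for k, v in attrs.items() if k not in RENAMED_KEYS}
    for old_key, new_key in RENAMED_KEYS.items():
        if old_key in attrs:
            # A value already set under the current name wins over the old one
            updated.setdefault(new_key, attrs[old_key])
    return updated
```

It will not stop the conventions from moving, but it keeps the renames in one place instead of scattered across every service.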

The demo vs reality gap

OTel demos are beautiful. You spin up a sample app, add three lines of auto-instrumentation, and boom: traces appear in Jaeger.

Then you try it on a real service. Your framework isn’t auto-instrumented. Your database driver needs manual spans. Your message queue uses a custom protocol nobody wrote an integration for.

Suddenly those three lines become three hundred. And you’re reading SDK source code to figure out why your spans have no parent.

I worked with a team last month that had “full OTel coverage” according to their tech lead. Turns out 40% of their traces were broken. Disconnected spans floating in the void. They’d given up trying to fix it.
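You can at least measure this kind of breakage. Here is a rough sketch, assuming you can dump exported spans as dicts with `trace_id`, `span_id`, and `parent_id` keys. That shape is my assumption, not something any particular backend guarantees, and `find_orphan_spans` is a made-up name.

```python
from collections import defaultdict


def find_orphan_spans(spans):
    """Return spans whose parent_id points at a span missing from the same trace."""
    # Collect the span ids that actually exist in each trace
    ids_by_trace = defaultdict(set)
    for span in spans:
        ids_by_trace[span["trace_id"]].add(span["span_id"])

    orphans = []
    for span in spans:
        parent_id = span.get("parent_id")
        # Root spans have no parent, so they are not orphans
        if parent_id is not None and parent_id not in ids_by_trace[span["trace_id"]]:
            orphans.append(span)
    return orphans
```

A missing parent does not always mean broken propagation. Sampling or a dropped export can produce the same symptom. But a high orphan count on your critical path is a strong hint that context is getting lost somewhere.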

Why non-experts struggle

OTel assumes you already know distributed tracing. It assumes you understand propagation, sampling strategies, context management.

Most engineers don’t. They want to see where requests go. They don’t want a PhD in observability.

The docs are comprehensive. Maybe too comprehensive. You can read for hours and still not know how to trace a single endpoint properly.
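For what it's worth, the core idea behind propagation is smaller than the docs make it look. Here is a stdlib-only sketch of the W3C `traceparent` header, which is the default format OTel propagates: version, trace id, parent span id, flags. The helper names are hypothetical, and starting a fresh sampled trace when the header is missing is a simplification I chose for the example. A real SDK leaves that decision to its sampler.

```python
import re
import secrets

# 00-<32 hex trace id>-<16 hex parent span id>-<2 hex flags>
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, sampled), or None if the header is unusable."""
    match = TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero ids are invalid in the W3C Trace Context spec
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


def child_traceparent(incoming):
    """Build the header for an outgoing call: same trace id, fresh span id."""
    parsed = parse_traceparent(incoming) if incoming else None
    if parsed is None:
        # No usable parent: a new trace starts here, which is how spans get disconnected
        trace_id, sampled = secrets.token_hex(16), True
    else:
        trace_id, _, sampled = parsed
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
```

If one hop forgets to forward that header, the next service starts a brand new trace. That is the whole mystery behind most parentless spans.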

What actually helps

Forget full coverage. Start with one critical path.

Pick your most important user flow. Maybe it’s checkout. Maybe it’s login. Instrument just that. End to end. Make sure every span connects.

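from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# user_id, cart, and process_payment come from your own checkout code
with tracer.start_as_current_span("checkout.process") as span:
    span.set_attribute("user.id", user_id)
    span.set_attribute("cart.item_count", len(cart.items))

    result = process_payment(cart)
    span.set_attribute("checkout.success", result.success)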

Simple. Explicit. You know exactly what’s being traced.

Then add the next critical path. And the next. Build up slowly.

Auto-instrumentation is nice for filling gaps. But the important stuff? You want to see it, name it, control it yourself.

The uncomfortable truth

OTel got the hard technical problems right. Distributed context propagation is actually solved. That’s impressive.

But usability? Still rough. Still feels like a tool built by experts for other experts.

Maybe that’s fine. Maybe observability is just hard. But I keep meeting smart engineers who can’t make it work, and that tells me something.


More thoughts on avoiding common traps in my technical guide on OTel pitfalls.

— Youn