eBPF for breadth, SDKs for depth

The pitch for eBPF auto-instrumentation is seductive. Get traces and metrics for every service, no code changes, no SDK, no redeploy. Just drop an agent on the node and watch the data flow.

It’s real now. Grafana donated Beyla to OpenTelemetry, where it became OpenTelemetry eBPF Instrumentation (OBI), with Splunk, Coralogix, and Odigos contributing. The first alpha shipped this year. This is no longer a science experiment.

So here’s the question every platform lead is asking: can we finally rip out all the SDK instrumentation and just run eBPF?

No. And the team that built it will tell you the same thing.

What eBPF actually gives you

eBPF instrumentation hooks into the kernel and watches the syscalls your processes make, mostly network reads and writes, plus probes on a few well-known library functions (TLS libraries, for example) so it can read encrypted traffic. From that it reconstructs RED metrics (rate, errors, duration) and basic trace spans for HTTP/S and gRPC traffic.
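
To make “reconstructs RED metrics” concrete, here’s a minimal Python sketch of the aggregation step, assuming the probe has already turned traffic into per-request records. The `ObservedRequest` shape and `red_metrics` function are hypothetical, not OBI’s internals, and counting only 5xx responses as errors is a common server-side convention, not the only one.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class ObservedRequest:
    """Hypothetical per-request record a network-level probe could reconstruct."""
    method: str
    route: str
    status: int
    duration_s: float


def red_metrics(requests, window_s):
    """Compute Rate, Errors, Duration per (method, route) over a time window."""
    groups = defaultdict(list)
    for req in requests:
        groups[(req.method, req.route)].append(req)

    metrics = {}
    for key, reqs in groups.items():
        durations = [r.duration_s for r in reqs]
        # Assumption: only 5xx responses count as errors (server-side convention).
        errors = sum(1 for r in reqs if r.status >= 500)
        metrics[key] = {
            "rate_per_s": len(reqs) / window_s,  # Rate
            "error_ratio": errors / len(reqs),   # Errors
            "p50_s": median(durations),          # Duration (median)
            "max_s": max(durations),             # Duration (worst case)
        }
    return metrics


observed = [
    ObservedRequest("POST", "/checkout", 200, 0.8),
    ObservedRequest("POST", "/checkout", 500, 1.2),
    ObservedRequest("POST", "/checkout", 200, 0.4),
    ObservedRequest("GET", "/health", 200, 0.01),
]
m = red_metrics(observed, window_s=60)
print(m[("POST", "/checkout")])
```

Note what’s missing: nothing in those records says why a checkout failed.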

The killer feature is breadth. It covers Go, C/C++, Rust, Python, Ruby, Java, Node, and .NET on Linux, all from the same agent, with zero code changes. That legacy service nobody wants to touch? Instrumented. The contractor-built thing with no owner? Instrumented. The polyglot mess of eight languages? One agent, all of it.

For getting some signal on everything fast, nothing beats it. Day one, whole fleet, no PRs.

What eBPF can’t see

Here’s the ceiling, and it’s a hard one.

eBPF sees syscalls and a handful of well-known library calls. It does not see your application logic. It knows a request came in and a response went out. It does not know what your code decided in between.

So it can’t see business context. “This request was a checkout for a $2,400 enterprise order that failed at the payment step.” That’s application knowledge, living in your code’s variables. The kernel never sees it. eBPF gives you “POST /checkout, 500, 1.2s” and stops.
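
Here’s that gap as a sketch. `SpanRecord`, `from_wire`, and `add_business_context` are hypothetical stand-ins, not the OpenTelemetry SDK API, and the `app.*` attribute names are made up for illustration; the `http.request.method`, `url.path`, and `http.response.status_code` keys follow OpenTelemetry semantic conventions. The point is the function signatures: the second one needs arguments that only your handler has.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpanRecord:
    """Hypothetical minimal span: a name plus key/value attributes."""
    name: str
    attributes: dict = field(default_factory=dict)


def from_wire(method: str, path: str, status: int, duration_s: float) -> SpanRecord:
    """Protocol facts only: roughly what can be reconstructed from outside the process."""
    return SpanRecord(
        name=f"{method} {path}",
        attributes={
            "http.request.method": method,
            "url.path": path,
            "http.response.status_code": status,
            # Illustrative shortcut; real spans record start and end timestamps.
            "duration_s": duration_s,
        },
    )


def add_business_context(
    span: SpanRecord,
    order_total_usd: float,
    customer_tier: str,
    failed_step: Optional[str],
) -> SpanRecord:
    """Values that exist only in the handler's variables (hypothetical app.* keys)."""
    span.attributes["app.order.total_usd"] = order_total_usd
    span.attributes["app.customer.tier"] = customer_tier
    if failed_step is not None:
        span.attributes["app.checkout.failed_step"] = failed_step
    return span


ebpf_view = from_wire("POST", "/checkout", 500, 1.2)
sdk_view = add_business_context(
    from_wire("POST", "/checkout", 500, 1.2),
    order_total_usd=2400.0,
    customer_tier="enterprise",
    failed_step="payment",
)
print(ebpf_view.attributes)
print(sdk_view.attributes)
```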

And it struggles with full distributed context. Propagating trace context cleanly across service boundaries, the thing that turns a pile of spans into one coherent trace across ten services, still wants instrumentation that understands your headers and your framework. eBPF can stitch some of it. It can’t do all of it reliably.
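
For a sense of what “understands your headers” means, here’s a minimal sketch of W3C Trace Context propagation via the `traceparent` header, which OpenTelemetry uses by default. It only handles version `00`, skips `tracestate`, assumes header names are already lowercased, and the function names are hypothetical. Parsing the header is the easy part. The hard part is `handle`: knowing which outgoing call belongs to which incoming request, which is trivial inside the process and has to be inferred from outside it.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# W3C Trace Context, version 00: version-traceid(32 hex)-parentid(16 hex)-flags(2 hex)
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str, bool]]:
    """Parse an incoming traceparent; return (trace_id, parent_span_id, sampled) or None."""
    match = _TRACEPARENT.match(headers.get("traceparent", "").strip())
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    sampled = bool(int(flags, 16) & 0x01)
    return trace_id, parent_id, sampled


def inject(headers: Dict[str, str], trace_id: str, span_id: str, sampled: bool) -> None:
    """Write traceparent onto an outgoing request."""
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def handle(incoming: Dict[str, str]) -> Dict[str, str]:
    """Continue the caller's trace (or start a new one) and propagate it downstream."""
    ctx = extract(incoming)
    trace_id, sampled = (ctx[0], ctx[2]) if ctx else (secrets.token_hex(16), True)
    span_id = secrets.token_hex(8)  # this service's span becomes the downstream parent
    outgoing: Dict[str, str] = {}
    inject(outgoing, trace_id, span_id, sampled)
    return outgoing


caller = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
print(handle(caller)["traceparent"])
```

In a single-threaded handler that mapping is obvious. Once work hops across thread pools, async callbacks, or queues, that’s where stitching from outside the process tends to get unreliable.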

This isn’t me being a skeptic. Grafana, which built Beyla, publishes the same position: you need both eBPF and SDKs. eBPF for breadth, SDKs for depth.

The strategy that actually works

Stop framing it as a choice. Layer them.

eBPF is the floor. Run it across the whole fleet for instant RED coverage on every service. This is your baseline: no service is ever completely dark again, and you got there without a single code review.

SDKs go on the paths that matter. Your critical flows (checkout, signup, the API that makes you money) get real SDK instrumentation with business context, custom spans, and the attributes that let you debug at 3am. This is where you spend the engineering effort, because this is where downtime costs you money.

The mistake is doing one and calling it done. eBPF-only gives you wide, shallow coverage: you’ll know that checkout is slow, but never why. SDK-only gives you deep coverage on the 30% of services someone bothered to instrument, and blind spots everywhere else. Those blind spots are exactly where the surprise outage comes from.

The honest take

OBI being a real OpenTelemetry project changes the math. Zero-touch breadth used to mean an expensive vendor agent. Now it’s an open component, and it’s worth adopting today as your floor.

Just don’t let “we have eBPF now” become the reason nobody instruments the checkout flow properly. Breadth tells you where to look. Depth tells you what’s wrong. You need both, and now you can actually have both.

My OTel pitfalls guide covers where to spend your instrumentation effort for the most return, and the traps that waste it.

— Youn