I’ve been running eBPF-based observability in production for about two years now. The hype is real. But so are the footguns.
Let me be clear: eBPF changed how I think about instrumentation. Attaching probes directly to kernel functions, getting syscall-level visibility with almost no overhead. It’s genuinely impressive tech.
But it’s not the silver bullet vendors want you to believe.
The good stuff
When it works, eBPF is incredible. I can see exactly what’s happening at the kernel level without touching application code. No SDK. No restart. Just attach and observe.
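To make "attach and observe" concrete, here's the kind of one-liner I mean, sketched with bpftrace (needs root and bpftrace installed; the wrapper function is my own hypothetical helper, not part of bpftrace):

```shell
# Hypothetical wrapper so the one-liner degrades gracefully when
# bpftrace isn't installed; the probe itself is standard bpftrace syntax.
run_bpftrace() {
  if ! command -v bpftrace >/dev/null 2>&1; then
    echo "bpftrace not installed; skipping"
    return 0
  fi
  bpftrace -e "$1"   # needs root
}

# Count syscalls per process, kernel-wide, with zero app changes:
# run_bpftrace 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
```

No SDK shipped, no process restarted: the tracepoint fires in the kernel and you get per-process counts immediately.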
Tools like Grafana Beyla auto-instrument HTTP, gRPC, and database calls without any code changes. For teams that can’t touch legacy code, this is huge.
# Beyla instrumenting whatever service listens on port 8080
export BEYLA_OPEN_PORT=8080
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
beyla
Ten seconds to get traces from a service you didn’t write. That’s genuinely cool.
The ugly parts
Here’s what nobody mentions in the conference talks.
Architecture differences bite hard. I had an eBPF program working perfectly on x86_64 that just… broke on ARM64. Different register conventions, different struct layouts. Spent two days debugging something that “should just work.”
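After that debugging marathon I started gating deploys on a pre-flight architecture check. This sketch encodes my own policy (which arches I trust), not any official eBPF support matrix:

```shell
# Hypothetical pre-flight check: map `uname -m` output to my own
# trust level per architecture. The riscv64 verdict reflects the
# verification failures described above, not an upstream statement.
ebpf_arch_supported() {
  case "$1" in
    x86_64|aarch64) echo "supported" ;;
    riscv64)        echo "unsupported: verifier/CO-RE gaps observed" ;;
    *)              echo "unknown" ;;
  esac
}

# Usage: ebpf_arch_supported "$(uname -m)"
```

Note that aarch64 is "supported" here only after I fixed the register-convention assumptions; the check just stops me from silently deploying to an arch I haven't validated.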
RISC-V is even worse. Had a client last month trying to run Cilium on their new RISC-V cluster. Half the eBPF programs silently failed verification.
BPFDoor should scare you. If you haven’t heard of it, BPFDoor is Linux malware that hides its network activity from monitoring: it attaches a BPF filter to a raw socket and plucks its command traffic out of the stream before your security tools ever see it.
The same capability that makes eBPF great for observability makes it terrifying for security. If you’re running eBPF programs in production, you need to audit what’s actually loaded:
# Check what eBPF programs are running
bpftool prog list
# Look for anything you don't recognize
# Especially XDP and socket filter programs
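To make that audit repeatable across hosts, I script it. The helper below is my own sketch, and the sample lines are illustrative only; real `bpftool prog list` output varies by kernel version and loaded programs:

```shell
# Hypothetical audit helper: flag program types that can intercept
# traffic before monitoring sees it. Feed it `bpftool prog list` output.
flag_suspicious_progs() {
  grep -E '^[0-9]+: (xdp|socket_filter) ' || echo "no suspicious program types"
}

# Example against captured output (sample data, not from a real host):
sample='3: cgroup_skb  name count_egress  tag 6deef7357e7b4530
42: socket_filter  name hidden_filter  tag deadbeefdeadbeef
57: xdp  name xdp_redirect  tag 0123456789abcdef'

printf '%s\n' "$sample" | flag_suspicious_progs
# flags the socket_filter and xdp lines from the sample
```

In practice I diff this against a known-good baseline per host, because "XDP program present" is normal on some fleets (Cilium, Katran) and a red flag on others.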
I’ve found rogue programs on two client systems this year. Neither team knew they were there.
Kernel version roulette. eBPF features depend heavily on kernel version. CO-RE helps, but I still hit compatibility issues constantly. That tracepoint you’re relying on? Might not exist on the kernel your cloud provider ships.
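My cheap mitigation is a version gate before loading anything. The 5.8 threshold below is one real example (the BPF ring buffer landed in 5.8); treat the comparison helper and the message wording as my own sketch, and remember that distro backports and kernel config can make version numbers lie:

```shell
# Sketch: is the running kernel at least some minimum version?
# Relies on GNU `sort -V` (version sort); BusyBox sort may lack it.
kernel_at_least() {
  # If the minimum sorts first (or equal), the running kernel is new enough.
  test "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1"
}

if kernel_at_least "5.8" "$(uname -r | cut -d- -f1)"; then
  echo "5.8+ kernel: BPF ring buffer should be available"
else
  echo "older kernel: expect missing helpers and tracepoints"
fi
```

This catches the obvious cases; it won't catch a cloud kernel that reports 5.15 but was built without BTF, which is exactly the roulette I mean.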
When to skip eBPF entirely
Sometimes traditional instrumentation is just better.
- If you control the source code, OpenTelemetry SDKs give you richer context
- If you need business logic in your traces, eBPF can’t help
- If your team doesn’t have kernel debugging experience, troubleshooting eBPF issues is painful
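For contrast with the Beyla setup above: the SDK path is mostly one init call in your code plus standard OpenTelemetry environment variables. The service name, endpoint, and sampling ratio below are placeholder choices, not recommendations:

```shell
# Standard OpenTelemetry SDK environment variables
# (values are placeholders for your own service and collector)
export OTEL_SERVICE_NAME="checkout"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
export OTEL_TRACES_SAMPLER_ARG="0.1"
```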
eBPF excels at infrastructure-level observability. Network flows, syscall patterns, resource usage. For application-level insights, you still need application-level instrumentation.
My current approach
I use both. eBPF for the stuff I can’t instrument otherwise - legacy binaries, third-party services, kernel behavior. OTel for everything I can touch.
The overlap is actually useful. When the eBPF layer shows a syscall spike but OTel shows nothing, I know the problem is outside my application code.
Just don’t expect eBPF to replace proper instrumentation. It won’t.
For more on instrumentation tradeoffs, check out my guide on OTel pitfalls.
— Youn