Got a call from a startup last month. Their Datadog bill jumped from $2k to $47k in one quarter. CFO was losing his mind. The team had no idea where it came from.
Turns out an engineer added a user_id tag to every single span. On a service handling 2M requests per day.
Welcome to 2026 observability costs.
The “collect everything” trap
I see this constantly. Teams spin up their observability stack with the best intentions. “We might need this data someday.” So they tag everything. Log everything. Trace everything.
Then the bill arrives.
Here’s what actually kills your budget:
# This innocent-looking config
tags:
- user_id
- session_id
- request_id
- feature_flag
- experiment_variant
Each unique combination of tag values creates a new metric series. That’s not addition. That’s multiplication. And request_id is unique per request, so on its own it mints a new series for every request. You go from 1,000 metric series to 50 million real fast.
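To see why it’s multiplication, here’s a rough worst-case bound. The number of series for one metric is at most the product of the distinct values each tag can take. The per-tag counts below are made-up illustrative numbers, not figures from the incident above.

```python
from math import prod


def max_series(distinct_values_per_tag: dict) -> int:
    """Worst-case series count for one metric.

    If every combination of tag values shows up at least once,
    the per-tag counts multiply rather than add.
    """
    return prod(distinct_values_per_tag.values())


# Hypothetical per-tag counts, chosen only for illustration.
low_card = {"service": 10, "endpoint": 20, "status_class": 5}
print(max_series(low_card))  # 1000

# Add a single user_id tag with 50,000 distinct users.
with_user_id = {**low_card, "user_id": 50_000}
print(max_series(with_user_id))  # 50000000
```

You only pay for combinations that actually get emitted, so treat this as a ceiling, not a forecast.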
Observability budgets are a thing now
Smart teams are treating telemetry like any other resource. You don’t give engineers unlimited AWS spend. Why give them unlimited cardinality?
I’ve started helping teams implement observability budgets. Simple rules:
- Each service gets a cardinality cap
- High-cardinality tags need approval
- Retention tiers based on data value
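The first two rules are easy to turn into an automated check, for example in CI or a weekly job. Below is a minimal sketch. It assumes you can pull the current series count and the tag names per service from your vendor. ServiceBudget, check_budget, and the high-cardinality tag list are hypothetical names for illustration, not any vendor’s API.

```python
from dataclasses import dataclass, field

# Tags treated as high-cardinality by default (an assumption; tune for your stack).
HIGH_CARDINALITY_TAGS = {"user_id", "session_id", "request_id"}


@dataclass
class ServiceBudget:
    """Hypothetical per-service observability budget."""
    max_series: int  # cardinality cap for the service
    approved_high_card_tags: set = field(default_factory=set)  # tags someone signed off on


def check_budget(service: str, current_series: int, tags: set, budget: ServiceBudget) -> list:
    """Return a list of violations; an empty list means the service is within budget."""
    problems = []
    if current_series > budget.max_series:
        problems.append(
            f"{service}: {current_series} series exceeds cap of {budget.max_series}"
        )
    # High-cardinality tags in use that nobody approved.
    unapproved = (set(tags) & HIGH_CARDINALITY_TAGS) - budget.approved_high_card_tags
    for tag in sorted(unapproved):
        problems.append(f"{service}: high-cardinality tag '{tag}' needs approval")
    return problems


budget = ServiceBudget(max_series=10_000, approved_high_card_tags={"session_id"})
for problem in check_budget("checkout", 25_000, {"endpoint", "user_id", "session_id"}, budget):
    print(problem)
```

Failing the build on any returned problem is what makes "needs approval" real instead of a wiki page nobody reads.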
Sounds bureaucratic. But it beats explaining to leadership why you’re spending more on Datadog than on your actual infrastructure.
What to cut vs what to keep
Here’s my quick heuristic:
Keep:
- Error rates and latencies at the service level
- Business metrics (checkouts, signups, whatever matters)
- Traces for errors and slow requests only
- Logs at WARN and above in prod
Cut:
- DEBUG logs in production (why do people do this?)
- Traces for every single request
- Custom metrics nobody looks at
- That dashboard from 2024 with 40 panels
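"Traces for errors and slow requests only" means deciding after a trace has finished, which is usually called tail-based sampling. Here’s a minimal sketch of the keep/drop decision. FinishedTrace is a hypothetical summary type, and the 500 ms threshold is an assumption you’d replace with your own latency SLO.

```python
from dataclasses import dataclass


@dataclass
class FinishedTrace:
    """Hypothetical summary of a completed trace."""
    trace_id: str
    duration_ms: float
    has_error: bool


SLOW_THRESHOLD_MS = 500.0  # assumption: derive this from your own latency SLO


def keep_trace(trace: FinishedTrace, slow_threshold_ms: float = SLOW_THRESHOLD_MS) -> bool:
    """Tail-sampling decision: keep errors and slow requests, drop everything else."""
    return trace.has_error or trace.duration_ms >= slow_threshold_ms


traces = [
    FinishedTrace("a", 120.0, False),   # fast and healthy: dropped
    FinishedTrace("b", 80.0, True),     # error: kept
    FinishedTrace("c", 2300.0, False),  # slow: kept
]
kept = [t.trace_id for t in traces if keep_trace(t)]
print(kept)  # ['b', 'c']
```

In a real setup this decision usually lives in a component that has seen every span of the trace, such as the tail_sampling processor in the OpenTelemetry Collector contrib distribution, rather than in application code.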
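Run something like this against your metric usage data (metric_usage is a stand-in for whatever usage table or export your vendor provides):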
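-- Find metrics nobody queried in 30 days, including ones never queried at all
SELECT metric_name, last_queried_at, cardinality
FROM metric_usage
WHERE last_queried_at IS NULL  -- NULL < anything is not true, so handle it explicitly
   OR last_queried_at < NOW() - INTERVAL '30 days'
ORDER BY cardinality DESC;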
You’ll find stuff. I promise.
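If your vendor only gives you an export rather than SQL access, the same filter is a few lines of Python. This is a sketch that assumes the export has the same three fields as the hypothetical metric_usage table in the SQL above. MetricUsage and stale_metrics are illustrative names, and the sample rows are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class MetricUsage:
    """Hypothetical row from a metric-usage export; fields mirror the SQL above."""
    metric_name: str
    last_queried_at: Optional[datetime]  # None means never queried
    cardinality: int


def stale_metrics(rows: List[MetricUsage], now: datetime, days: int = 30) -> List[MetricUsage]:
    """Metrics not queried in `days` days (or ever), highest cardinality first."""
    cutoff = now - timedelta(days=days)
    stale = [r for r in rows if r.last_queried_at is None or r.last_queried_at < cutoff]
    return sorted(stale, key=lambda r: r.cardinality, reverse=True)


now = datetime(2026, 3, 1, tzinfo=timezone.utc)
rows = [
    MetricUsage("checkout.count", now - timedelta(days=1), 40),
    MetricUsage("legacy.cache.hits", None, 1_200),
    MetricUsage("debug.span.size", now - timedelta(days=90), 85_000),
]
for r in stale_metrics(rows, now):
    print(r.metric_name, r.cardinality)
```

Sorting by cardinality puts the expensive, unused metrics at the top, which is where the savings are.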
The real problem
Nobody owns observability costs. Eng wants data. Finance wants lower bills. Neither talks to the other until there’s a crisis.
Fix the org problem first. Assign an owner. Set a budget. Review it monthly like you review cloud spend.
It’s not glamorous work. But it’s the difference between a $5k bill and a $50k one.
If you’re fighting with OTel configs trying to get sampling right, I wrote a technical guide on OTel pitfalls that might help.
— Youn