Last month, one of my best SREs told me he got a PM offer at a Series C company. 40% pay bump. No on-call. Ever.

He took it.

I asked why. He said, “I love infrastructure. I just can’t do 3am pages anymore.”

That’s not a career pivot. That’s an escape.

The numbers nobody talks about

In my experience, SREs last 3-5 years in the role. Not because they get bored. Because they hit a wall.

Alert fatigue sets in fast. By hour 8 of an on-call shift, your brain starts filtering. By hour 16, you’re silencing stuff to survive. By the end of the rotation, you’ve trained yourself to ignore pages.

That’s not sustainable. It’s barely survivable.

Where do they go?

I tracked where a few people I’ve worked with ended up. Same pattern:

  • PM roles (“I still get to work on technical products”)
  • Developer advocacy (“I talk about infrastructure without running it”)
  • “Platform engineering” titles (same job, different company, hoping it’s better)
  • Left tech entirely

Zero said “I was bored of the work.” All of them mentioned sleep.

The math your CFO should see

Series B company. 150 engineers. 6 SREs.

An SRE costs $200k fully loaded. At a 20% recruiting fee, that’s $40k just to find them. Ramp time? 6 months minimum. Your new hire isn’t useful until they’ve been paged at 3am for the same database timeout your last person knew how to handle.

The real cost per departure: $150k+ and climbing, once you add recruiting, months of reduced capacity, and the extra burden on the remaining team. You’re not running an SRE team. You’re running a recruitment pipeline.
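A rough way to rebuild that figure, as a minimal Python sketch. The $200k fully loaded cost, the 20% fee and the 6-month ramp come from above; the 3-month vacancy and 50% productivity during ramp are my assumptions, so plug in your own.

```python
# Rough cost of one SRE departure. annual_cost, recruiting_fee_pct and
# ramp_months come from the numbers above; vacancy_months and
# ramp_productivity are assumptions to replace with your own.
def departure_cost(
    annual_cost: float = 200_000,      # fully loaded cost per SRE
    recruiting_fee_pct: float = 0.20,  # recruiting fee as a fraction of annual_cost
    ramp_months: float = 6,            # months until the new hire is fully effective
    vacancy_months: float = 3,         # assumed: months the seat stays empty
    ramp_productivity: float = 0.5,    # assumed: average output while ramping
) -> float:
    monthly = annual_cost / 12
    recruiting = annual_cost * recruiting_fee_pct
    vacancy = monthly * vacancy_months                      # output lost with nobody in the seat
    ramp = monthly * ramp_months * (1 - ramp_productivity)  # output lost during ramp-up
    return recruiting + vacancy + ramp


print(f"${departure_cost():,.0f}")
```

With those defaults it prints $140,000, before putting any number on the extra load the remaining team carries.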

Series C is worse

At Series B, you might have 4-6 SREs. If one leaves, it hurts, but you survive.

At Series C, you’ve scaled to 300 engineers. 50+ services. You need 8-10 SREs just for reasonable rotation cycles.

```yaml
# This is what "reasonable" looks like
oncall:
  rotation_length: 7d
  min_team_size: 8        # Fewer than this = burnout
  max_shifts_per_month: 1 # More than this = turnover
  handoff_overlap: 2h
```
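The arithmetic behind that config, as a small Python sketch. It assumes a plain round-robin rotation of 7-day shifts over a 365-day year; `min_team_size` and `max_shifts_per_month` are the thresholds from the config above, and the function names are illustrative, not from any on-call tool.

```python
# Per-person on-call load for a simple round-robin rotation of
# fixed-length shifts. Thresholds mirror the config above.
DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12


def shifts_per_month(team_size: int, rotation_days: int = 7) -> float:
    """Average on-call shifts per person per month."""
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    # Each person's turn comes back once every team_size shifts.
    days_between_turns = team_size * rotation_days
    return DAYS_PER_YEAR / days_between_turns / MONTHS_PER_YEAR


def within_limits(team_size: int, min_team_size: int = 8,
                  max_shifts: float = 1.0, rotation_days: int = 7) -> bool:
    """True if the team meets both min_team_size and max_shifts_per_month."""
    return (team_size >= min_team_size
            and shifts_per_month(team_size, rotation_days) <= max_shifts)


for size in (4, 6, 8, 10):
    print(size, round(shifts_per_month(size), 2), within_limits(size))
```

A four-person team already goes over one shift per person per month. Dropping from eight people to six moves everyone from about 0.54 to 0.72 shifts a month: a third more on-call for each person who stays.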

I’ve seen Series C companies lose two SREs in the same quarter. The remaining team started looking for exits within weeks. Death spiral.

The fix costs less than you think

Stop treating on-call as inevitable suffering.

Measure alert quality. What percentage of pages result in action? Under 80%? You have a noise problem. Fix it before your SRE does.
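One way to measure this, sketched in Python. It assumes you can export pages with an alert name and a yes/no “did anyone have to act” flag, whether from your paging tool or a tag the responder sets. The `Page` record and the sample alert names are hypothetical; the 80% threshold is the one above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Page:
    alert_name: str
    actionable: bool  # did the responder have to do something?


def actionable_ratio(pages: list[Page]) -> float:
    """Fraction of pages that led to action (1.0 = every page mattered)."""
    if not pages:
        return 1.0
    return sum(p.actionable for p in pages) / len(pages)


def noisiest_alerts(pages: list[Page], top: int = 3) -> list[tuple[str, int]]:
    """Alerts that most often fired with nothing for the responder to do."""
    return Counter(p.alert_name for p in pages if not p.actionable).most_common(top)


# Hypothetical week of pages.
week = [
    Page("db-timeout", True),
    Page("disk-80pct", False),
    Page("disk-80pct", False),
    Page("api-5xx", True),
    Page("cpu-high", False),
]
ratio = actionable_ratio(week)
print(f"{ratio:.0%} actionable")
if ratio < 0.8:
    print("noise problem:", noisiest_alerts(week))
```

Start at the top of the noisiest list: each alert there is a candidate to delete, demote to a ticket, or retune.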

Reduce pages instead of adding headcount. Your goal isn’t “more people to absorb pain.” It’s “less pain.” Move to SLO-based alerting, so you page on user-facing impact instead of every threshold that twitches. Kill the noisy stuff.
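What SLO-based alerting can look like, as a Python sketch of the multiwindow burn-rate rule described in Google’s SRE Workbook: page only when the error budget is burning at 14.4x or faster over both the last hour and the last 5 minutes. The 99.9% SLO and the sample error ratios are assumptions, and in practice this rule usually lives in your alerting system, not in application code.

```python
# Multiwindow burn-rate paging, following the approach in Google's SRE Workbook.
# Error ratios per window are assumed to come from your metrics backend.


def burn_rate(error_ratio: float, slo: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    return error_ratio / (1.0 - slo)


def should_page(err_1h: float, err_5m: float, slo: float = 0.999,
                threshold: float = 14.4) -> bool:
    """Page only when both the 1-hour and 5-minute windows burn fast.

    A 14.4x burn sustained for 1 hour spends 2% of a 30-day budget. The
    5-minute window makes the alert stop soon after the problem stops.
    """
    return (burn_rate(err_1h, slo) >= threshold
            and burn_rate(err_5m, slo) >= threshold)


print(should_page(err_1h=0.02, err_5m=0.03))   # True: roughly 20x and 30x burn
print(should_page(err_1h=0.02, err_5m=0.001))  # False: already recovered
print(should_page(err_1h=0.005, err_5m=0.01))  # False: 1-hour window is only ~5x
```

A blip that clears in a minute never pages, and a slow leak below 14.4x is left for a ticket or a slower alert rather than someone’s 3am.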

Make debugging fast. Your SRE isn’t burned out because they got paged. They’re burned out because it took 90 minutes to figure out what happened. Good observability turns a 3am crisis into a 3am annoyance.
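If time-to-diagnosis is what burns people out, track it. A minimal Python sketch, assuming you record two timestamps per incident: when the page fired and when the responder knew what had broken. The incident data is hypothetical.

```python
from datetime import datetime
from statistics import median


def minutes_to_diagnose(incidents: list[tuple[datetime, datetime]]) -> float:
    """Median minutes from page to 'we know what broke'."""
    return median((found - paged).total_seconds() / 60
                  for paged, found in incidents)


# Hypothetical (paged_at, diagnosed_at) pairs.
incidents = [
    (datetime(2024, 5, 1, 3, 2), datetime(2024, 5, 1, 4, 32)),     # 90 min
    (datetime(2024, 5, 9, 2, 40), datetime(2024, 5, 9, 2, 55)),    # 15 min
    (datetime(2024, 5, 20, 23, 10), datetime(2024, 5, 21, 0, 5)),  # 55 min
]
print(minutes_to_diagnose(incidents))  # 55.0
```

Watch the trend rather than any single incident: a falling median is the signal that observability work is paying off.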

Pay for it. On-call compensation should hurt when on-call is bad. If you’re not paying extra for pages, you’re pretending on-call is free. You’re paying in turnover instead.
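One way to make bad on-call show up on a budget: a flat stipend for carrying the pager plus a per-page amount, weighted toward off-hours pages. A Python sketch; the structure and every rate in it are hypothetical, not a recommendation of specific numbers.

```python
from dataclasses import dataclass


@dataclass
class OnCallPay:
    # Every rate here is a hypothetical placeholder; set real ones with your team.
    shift_stipend: float = 500.0       # flat amount for carrying the pager
    per_daytime_page: float = 50.0     # each page during working hours
    per_off_hours_page: float = 150.0  # each page outside working hours

    def for_shift(self, daytime_pages: int, off_hours_pages: int) -> float:
        """Pay for one shift; noisier shifts pay more, by design."""
        return (self.shift_stipend
                + self.per_daytime_page * daytime_pages
                + self.per_off_hours_page * off_hours_pages)


pay = OnCallPay()
print(pay.for_shift(daytime_pages=2, off_hours_pages=0))   # 600.0
print(pay.for_shift(daytime_pages=10, off_hours_pages=6))  # 1900.0
```

The gap between the quiet week and the rough one is the point: noisy alerts land on someone’s budget, not just someone’s sleep.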

The cost comparison

Fixing on-call: Maybe $50k in tooling. 3-6 months of focused effort.

Replacing an SRE every 2 years: $150k+ per departure. Forever. Plus the knowledge that walks out the door.
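The same comparison as a Python sketch. The $50k tooling and $150k per departure are the figures above; pricing the 3-6 months of effort as 4.5 months of one engineer at $200k, and assuming one departure every two years, are my assumptions.

```python
# Compare a one-off on-call fix with recurring departures.
# tooling and cost_per_departure come from the text; effort_months (midpoint
# of 3-6), engineers, annual_cost and departures_per_year are assumptions.


def fix_cost(tooling: float = 50_000, effort_months: float = 4.5,
             engineers: int = 1, annual_cost: float = 200_000) -> float:
    """Tooling plus the engineering time spent on the fix."""
    return tooling + effort_months * engineers * annual_cost / 12


def turnover_cost(years: float, departures_per_year: float = 0.5,
                  cost_per_departure: float = 150_000) -> float:
    """Cumulative cost of departures; 0.5/year is one SRE every 2 years."""
    return years * departures_per_year * cost_per_departure


for years in (1, 2, 5):
    print(f"{years}y: fix ${fix_cost():,.0f} vs turnover ${turnover_cost(years):,.0f}")
```

Even after charging the fix for a full engineer’s time, it pays for itself before the second year is out, assuming it actually prevents that one departure.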

Your CFO can do this math. Show it to them.

One last thing

Talk to your SREs. Actually ask them.

“What’s the most annoying thing about on-call right now?”

The answer won’t be “I wish I had more dashboards.” It’ll be specific. A particular alert. A rotation that’s too heavy.

Fix that thing. Then ask again.

Your best SREs don’t want to leave. They want to do the work they signed up for. Stop making it impossible.


More on the real costs of reliability problems in my reliability cost breakdown.

— Youn