I was in a board meeting last year. The CTO proudly announced: “We achieved 99.95% uptime this quarter.”

The board nodded politely. Then moved on to revenue numbers.

That 99.95% meant nothing to them. It’s an abstract number. What they wanted to know was: how much money did we lose when things broke?

The CTO didn’t have that number.

The translation problem

Engineers think about reliability in nines. 99.9% sounds great. 99.99% sounds even better. We build SLOs around these numbers. We page whoever is on call when we drop below them.

But here’s 99.9% in human terms: about 8.8 hours of downtime per year (0.1% of 8,760 hours). At Series C scale, that’s probably $50k-$100k per hour in lost transactions, refunds, and engineering time. So your “three nines” system cost you roughly $440k-$880k this year.

Suddenly 99.9% doesn’t sound so great.
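
If you want to run that conversion for your own target, here is a minimal Python sketch. The function names are mine, the $50k-$100k hourly range is the rough Series C guess from above, and it assumes a 365-day year with every hour costing the same, which real traffic never does.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def annual_downtime_hours(availability_pct: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def annual_downtime_cost(availability_pct: float, cost_per_hour: float) -> float:
    """Yearly cost, assuming every hour of downtime costs the same (a simplification)."""
    return annual_downtime_hours(availability_pct) * cost_per_hour


if __name__ == "__main__":
    # Assumed hourly cost range, taken from the rough Series C estimate in the text.
    low_rate, high_rate = 50_000, 100_000
    for pct in (99.9, 99.95, 99.99):
        hours = annual_downtime_hours(pct)
        low = annual_downtime_cost(pct, low_rate)
        high = annual_downtime_cost(pct, high_rate)
        print(f"{pct}%: {hours:.2f} h/year, ${low:,.0f} to ${high:,.0f}")
```

For 99.9% this gives 8.76 hours and $438,000 to $876,000 a year. Swap in your own hourly cost once you have it.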

The math nobody does

I ask every CTO I work with: what does one hour of downtime actually cost you?

Most don’t know. They’ve never calculated it.

Here’s the rough formula:

  • Lost revenue per hour: the dollar value of your average hourly transactions
  • Refunds and credits: what you give angry customers to keep them
  • Engineering cost: hours spent by engineers who should be building features
  • Customer churn: the ones who don’t come back (this one hurts most)
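
The four pieces above add up roughly like this. Treat it as a sketch, not a finished model: the class, the field names, and every number in the example are placeholders I made up for illustration, and churn in particular is something you will have to estimate rather than read off a dashboard.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """One incident, described by the four cost components in the list above.

    All field names are hypothetical; map them to whatever your finance
    and support systems actually report.
    """
    duration_hours: float
    hourly_revenue: float              # average revenue per hour in the affected window
    refunds_and_credits: float         # total given back to customers
    engineer_hours: float              # person-hours spent on response and cleanup
    engineer_hourly_cost: float        # fully loaded cost per engineer-hour
    churned_customers: int             # estimate of customers who did not come back
    value_per_churned_customer: float  # estimated value of each lost customer

    def lost_revenue(self) -> float:
        return self.duration_hours * self.hourly_revenue

    def engineering_cost(self) -> float:
        return self.engineer_hours * self.engineer_hourly_cost

    def churn_cost(self) -> float:
        return self.churned_customers * self.value_per_churned_customer

    def total_cost(self) -> float:
        return (self.lost_revenue() + self.refunds_and_credits
                + self.engineering_cost() + self.churn_cost())

    def cost_per_hour(self) -> float:
        return self.total_cost() / self.duration_hours


# Placeholder inputs, not data from any real company.
example = Incident(
    duration_hours=0.75,
    hourly_revenue=20_000,
    refunds_and_credits=5_000,
    engineer_hours=40,
    engineer_hourly_cost=150,
    churned_customers=2,
    value_per_churned_customer=12_000,
)

if __name__ == "__main__":
    print(f"Total: ${example.total_cost():,.0f}")
    print(f"Per hour of downtime: ${example.cost_per_hour():,.0f}")
```

With these made-up inputs, churn is nearly half of the total, which is why a per-hour number built only on lost revenue will undercount.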

A Series C company I worked with did the math. Their “minor” 45-minute incident cost them $73k. Not the downtime itself. The full picture: lost sales, support tickets, engineering time, one enterprise deal that went cold.

They had four of these “minor” incidents last quarter.

The bad Friday

Every startup has one. The deploy that went wrong on Friday afternoon. The cascade failure that lasted through the weekend.

At Series B-D scale, a bad Friday is $500k+. Sometimes more.

I’ve seen companies lose $2M in a single weekend incident. Not because the system was down for days. Because it was degraded just enough to frustrate users during a high-traffic period. Cart abandonments spiked. Enterprise customers noticed. The sales pipeline felt it for months.

How to talk about this

Stop reporting uptime percentages to the board. Start reporting:

  • “Incidents cost us $180k this quarter, down from $340k last quarter”
  • “We avoided an estimated $2M in losses by catching this before it hit production”
  • “Investing $50k in observability reduced our incident costs by $200k annually”

This is language they understand. This is how you get budget approved.

The awkward conversation

Most CTOs avoid this calculation because the number is scary. If you’ve never measured it, you don’t have to explain it.

But here’s the thing: the board will eventually ask. And “I don’t know” is worse than a big number.

At least with a big number you can show a plan to reduce it.


I put together a framework for calculating reliability costs. Use it before your next board meeting.

— Youn