Your team hit 100% coverage. Two weeks later, a payment flow failed in prod. The test suite passed every run.
Coverage % is a leading indicator: measured before production, in CI, and meant to predict stability. The problem is not where it sits in the measurement cycle. It's what it measures. Coverage tells you which code executed during a test run. It says nothing about whether that execution verified anything.
A test that calls a function and asserts nothing counts toward coverage. A happy-path test on a checkout flow with no assertion on the timeout branch is "covered." Your number went up. Your confidence in that code should not have. This is why high coverage and high production incident rates coexist without contradiction.
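To make that concrete, here is a minimal pytest sketch. The `charge` function and `Card` type are hypothetical stand-ins: both tests execute the same lines and count equally toward coverage, but only the second verifies a failure state.

```python
from dataclasses import dataclass


@dataclass
class Card:
    number: str
    expired: bool


def charge(card: Card, amount: float) -> dict:
    # Hypothetical payment function; the logic is illustrative only.
    if card.expired:
        return {"status": "declined", "reason": "expired"}
    if amount <= 0:
        raise ValueError("amount must be positive")
    return {"status": "approved", "amount": amount}


def test_charge_runs():
    # Executes the happy path. Counts toward line coverage,
    # but asserts nothing: a regression here still passes.
    charge(Card("4111", expired=False), 10.00)


def test_charge_declines_expired_card():
    # Same kind of execution, but the behaviour is pinned down.
    result = charge(Card("4111", expired=True), 10.00)
    assert result["status"] == "declined"
    assert result["reason"] == "expired"
```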
So the question becomes: what leading indicators actually predict whether your tests will catch a failure before users do?
Assertion density on critical paths
Pick your five highest-risk flows — the ones that handle money, auth, or data integrity. For each one, count how many distinct failure states have an explicit assertion. Not line coverage. Assertion coverage on the states that actually matter.
A payment flow with 90% line coverage and no assertion on the declined-card state is not tested. It's a demo with a green badge.
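One way to make that countable is to enumerate a flow's failure states explicitly and parametrize a test over them, so assertion coverage is visible in the suite itself. A sketch, where `run_checkout` and the state list are hypothetical:

```python
import pytest
from dataclasses import dataclass


@dataclass
class CheckoutResult:
    status: str


def run_checkout(simulate: str) -> CheckoutResult:
    # Stand-in for the real flow: in practice this drives the actual
    # checkout code with the named failure injected.
    outcomes = {
        "declined_card": "declined",
        "expired_card": "declined",
        "gateway_timeout": "retry_scheduled",
        "insufficient_funds": "declined",
        "duplicate_charge": "rejected",
    }
    return CheckoutResult(status=outcomes[simulate])


# The list is the metric: assertion density on this flow is
# (failure states with an explicit assertion) / (failure states that exist).
FAILURE_STATES = [
    ("declined_card", "declined"),
    ("expired_card", "declined"),
    ("gateway_timeout", "retry_scheduled"),
    ("insufficient_funds", "declined"),
    ("duplicate_charge", "rejected"),
]


@pytest.mark.parametrize("state,expected", FAILURE_STATES)
def test_checkout_failure_state(state, expected):
    assert run_checkout(simulate=state).status == expected
```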
What mutation testing reveals
Some tools deliberately break your code before running your tests — flip a > to >=, remove a return value, quietly delete a conditional branch. Each change is a mutant. Then your test suite runs. If your tests still pass against the broken version, the mutant survived. Your suite did not notice the logic changed.
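The mechanics fit in a few lines. A hand-rolled illustration with a hypothetical discount rule: a mutation tool would flip the operator in place, but the two versions are shown side by side here.

```python
def qualifies_for_discount(total: float) -> bool:
    # Original rule: strictly greater than 100.
    return total > 100


def qualifies_for_discount_mutant(total: float) -> bool:
    # Mutant: > flipped to >=. A tool applies this change in place;
    # it is shown side by side here only for illustration.
    return total >= 100


def test_discount():
    # Checks values far from the boundary, so it passes against
    # both versions. This mutant survives.
    assert qualifies_for_discount(150) is True
    assert qualifies_for_discount(50) is False


def test_discount_boundary():
    # The boundary assertion is what kills the mutant: under the
    # mutated rule, 100 would qualify and this test would fail.
    assert qualifies_for_discount(100) is False
```

Real tools such as mutmut or Stryker generate and run these mutants automatically across a codebase; the kill rate they report is the number that matters below.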
Teams with 85% line coverage running this exercise often find their tests catch only around 40% of deliberate breaks. Not because the tests are badly written, but because most were written to execute code, not to verify what it does. That 40% is your real coverage number. It's uncomfortable to present to leadership, which is exactly why it's worth presenting.
Test signal ratio
When your suite fails in CI, what percentage of those failures catch actual defects versus flaky noise? A suite that's 40% flaky trains engineers to ignore red builds. That's not a testing problem, it's a signal problem. And it makes every other metric you track less reliable.
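The arithmetic is trivial once failures get labeled at triage time; the hard part is the labeling discipline. A sketch with hypothetical data:

```python
# Hypothetical triage log: each CI failure labeled after investigation.
ci_failures = [
    {"build": 101, "cause": "defect"},
    {"build": 102, "cause": "flaky"},
    {"build": 103, "cause": "flaky"},
    {"build": 104, "cause": "defect"},
    {"build": 105, "cause": "infra"},
]

defects = sum(1 for f in ci_failures if f["cause"] == "defect")
signal_ratio = defects / len(ci_failures)
# 40% here: a red build is right less than half the time, so
# engineers learn to rerun rather than read.
print(f"test signal ratio: {signal_ratio:.0%}")
```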
What to actually do
None of this means dropping coverage %. It still has a job: floor, not ceiling. Below 60% on core modules and you're flying blind. But above that threshold, coverage % stops being predictive on its own. What it cannot tell you is whether the coverage you have is actually verifying behaviour.
Report coverage % as hygiene — a minimum bar. Report assertion density and mutation score as quality signals. Use defect escape rate as the lagging validator: when something breaks in prod, was that area covered, and did the tests assert the right failure state? That post-mortem question will calibrate your leading indicators faster than any dashboard.
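Pulled together, a per-module report might look like the sketch below. The field names and thresholds are assumptions carried over from above, not a standard.

```python
# Hypothetical per-module snapshot combining the three signals.
metrics = {
    "payments": {
        "line_cov": 0.91,
        "mutation_score": 0.42,
        "asserted_failure_states": 3,
        "known_failure_states": 7,
    },
}

for module, m in metrics.items():
    hygiene_ok = m["line_cov"] >= 0.60  # the floor, not the ceiling
    density = m["asserted_failure_states"] / m["known_failure_states"]
    print(
        f"{module}: hygiene {'ok' if hygiene_ok else 'BELOW FLOOR'}, "
        f"assertion density {density:.0%}, "
        f"mutation score {m['mutation_score']:.0%}"
    )
```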
100% coverage with production incidents does not mean you have the wrong metric. It means you have only one dimension of a multi-dimensional problem.