Had the opposite experience. Seven joins at 80ms is fine if you're hitting cache, and a materialized view solves it cleanly without polluting your schema. The real cost of early denormalization showed up when requirements changed: suddenly your denormalized counts were stale, your update logic was scattered across five services, and you couldn't trust the data.
What matters: are you actually measuring? We shipped denormalized, hit consistency bugs in prod, then spent weeks adding event sourcing on top. Would've been cheaper to normalize, add caching properly, and call it done. Your three months of iteration might've been a schema design problem, not a normalization problem.
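To make the stale-count failure concrete, here's a rough sketch using SQLite in memory. The table names, the `comment_count` column, and the two "service" functions are all made up for illustration, not our actual schema. One write path maintains the denormalized counter, a second one forgets to, and the counter drifts away from the count derived from the normalized rows.

```python
import sqlite3


def build_db():
    """Create a hypothetical posts/comments schema with a denormalized counter."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            comment_count INTEGER NOT NULL DEFAULT 0  -- denormalized copy
        );
        CREATE TABLE comments (
            id INTEGER PRIMARY KEY,
            post_id INTEGER NOT NULL REFERENCES posts(id)
        );
        INSERT INTO posts (id) VALUES (1);
        """
    )
    return db


def add_comment_service_a(db, post_id):
    # Write path that remembers to maintain the denormalized counter.
    db.execute("INSERT INTO comments (post_id) VALUES (?)", (post_id,))
    db.execute(
        "UPDATE posts SET comment_count = comment_count + 1 WHERE id = ?",
        (post_id,),
    )


def add_comment_service_b(db, post_id):
    # A second write path, added later, that only inserts the row.
    db.execute("INSERT INTO comments (post_id) VALUES (?)", (post_id,))


def denormalized_count(db, post_id):
    row = db.execute(
        "SELECT comment_count FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return row[0]


def derived_count(db, post_id):
    # Source of truth: count the normalized rows.
    row = db.execute(
        "SELECT COUNT(*) FROM comments WHERE post_id = ?", (post_id,)
    ).fetchone()
    return row[0]


db = build_db()
add_comment_service_a(db, 1)
add_comment_service_b(db, 1)
print(denormalized_count(db, 1), derived_count(db, 1))  # prints "1 2"
```

The derived count is the thing you would put behind a cache or a materialized view. It can be out of date for a while, but it can't be permanently wrong the way the counter can once a write path skips the update.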
Cloud architect. AWS and serverless.