I've shipped two major features this year that forced this decision. First time, I normalized everything. Third normal form, proper foreign keys, the whole thing. Second time, I denormalized aggressively from day one.
The normalized approach hurt. We had a feed feature that needed user + post + comment + like counts. Seven joins. Query time started at 80ms and got worse as data grew. We graduated to caching layers, then materialized views, then just... denormalizing anyway. Wasted three months of optimization work.
The second feature (a recommendation endpoint) I denormalized immediately. Flattened user engagement data into a single table, duplicated user metadata, kept computed aggregates inline. First query: 8ms. No joins. Postgres basically prints the answer and leaves.
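The two shapes being compared look roughly like this. A minimal sketch using Python's stdlib `sqlite3` as a stand-in for Postgres; the tables, columns, and data here are all invented for illustration, not the actual schema from the post:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Normalized: each fact lives in one place; the feed query joins them.
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    CREATE TABLE likes (post_id INTEGER, user_id INTEGER);

    -- Denormalized: one wide row per post, metadata and aggregates inline.
    CREATE TABLE feed_items (
        post_id INTEGER PRIMARY KEY,
        author_name TEXT,      -- duplicated from users.name
        body TEXT,
        like_count INTEGER     -- precomputed aggregate
    );
""")
db.execute("INSERT INTO users VALUES (1, 'ada')")
db.execute("INSERT INTO posts VALUES (10, 1, 'hello')")
db.executemany("INSERT INTO likes VALUES (10, ?)", [(2,), (3,)])
db.execute("INSERT INTO feed_items VALUES (10, 'ada', 'hello', 2)")

# Normalized read: join + aggregate computed at query time.
joined = db.execute("""
    SELECT u.name, p.body, COUNT(l.post_id)
    FROM posts p
    JOIN users u ON u.id = p.user_id
    LEFT JOIN likes l ON l.post_id = p.id
    GROUP BY p.id
""").fetchone()

# Denormalized read: single-row lookup, no joins.
flat = db.execute(
    "SELECT author_name, body, like_count FROM feed_items WHERE post_id = 10"
).fetchone()

print(joined)  # ('ada', 'hello', 2)
print(flat)    # ('ada', 'hello', 2)
```

Both reads return the same answer; the difference is whether the work happens at read time (the join) or at write time (keeping `author_name` and `like_count` in sync).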
But here's the thing: maintenance killed us. Schema migrations became nightmares. One user profile update meant updating 40k rows in the denormalized table. We added an event queue just to stay sane.
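That write amplification is easy to see in miniature. A hedged sketch (invented schema, `sqlite3` standing in for Postgres, 1k rows instead of the 40k above): one logical change, renaming a user, fans out across every row that copied the old value.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE feed_items "
    "(post_id INTEGER PRIMARY KEY, author_id INTEGER, author_name TEXT)"
)
# 1,000 denormalized rows, each carrying a copy of the author's name.
db.executemany(
    "INSERT INTO feed_items VALUES (?, 1, 'ada')",
    [(i,) for i in range(1000)],
)

# Normalized world: UPDATE users SET name = ... WHERE id = 1  -- one row.
# Denormalized world: the same logical change touches every copy.
cur = db.execute(
    "UPDATE feed_items SET author_name = 'ada lovelace' WHERE author_id = 1"
)
print(cur.rowcount)  # 1000
```

At 40k copies per user, queueing those updates asynchronously (the event queue mentioned above) is one of the few ways to keep the write path responsive.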
My take: normalize first. Actually measure. Use EXPLAIN ANALYZE. If you hit real pain (sustained >100ms on critical paths), then denormalize that specific query pattern. Don't pre-optimize. Most features don't need it, and the ones that do reveal themselves fast enough to fix.
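"Actually measure" in miniature: ask the planner before reaching for denormalization. EXPLAIN ANALYZE is the Postgres tool; `EXPLAIN QUERY PLAN` in stdlib `sqlite3` is the closest self-contained stand-in, so that's what this sketch uses (table and index names invented). Often the "slow join" is just a missing index:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE likes (post_id INTEGER, user_id INTEGER)")
db.executemany(
    "INSERT INTO likes VALUES (?, ?)",
    [(i % 100, i) for i in range(10_000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the human-readable detail in column 3.
    return " ".join(row[3] for row in db.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT COUNT(*) FROM likes WHERE post_id = 7"
before = plan(q)   # a full-table scan
db.execute("CREATE INDEX idx_likes_post ON likes(post_id)")
after = plan(q)    # an index search on idx_likes_post
print(before)
print(after)
```

Same query, same data; only the access path changes. If the plan still looks bad *after* indexing, that's the signal to consider denormalizing that one pattern.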
The cost isn't the disk space or the writes. It's the cognitive overhead and the eventual consistency bugs.
Hard disagree based on what I've seen. Seven joins at 80ms is a tuning problem, not a normalization problem. That's either missing indexes, N+1 queries, or both.
I've lived the opposite trajectory: started denormalized with duplicated data, hit consistency bugs that took weeks to untangle. Sync issues between tables, stale counts, the works. Now I normalize and use proper indexes and views. One well-indexed query at 12ms beats seven joins at 80ms, and it definitely beats a denormalized table you have to rebuild every time the underlying data changes.
The real cost is developer time fixing denormalization bugs in production, not query optimization work.
Had the opposite experience. Seven joins at 80ms is fine if you're hitting cache, and materialization solves it cleanly without polluting your schema. The real cost of early denormalization showed up when requirements changed—suddenly your denormalized counts were stale, your update logic was scattered across five services, and you couldn't trust the data.
What matters: are you actually measuring? We shipped denormalized, hit consistency bugs in prod, then spent weeks adding event sourcing on top. Would've been cheaper to normalize, add caching properly, and call it done. Your three months of iteration might've been a schema design problem, not a normalization problem.
Alex Petrov
Counterpoint: your seven-join query was badly designed, not a normalization problem. That's an N+1 query pattern, a missing index, or both.
I've hit the opposite: denormalizing early burns you when requirements shift. You update one field in five places and miss one. Data becomes a nightmare. Yes, joins get slower as tables grow, but that's what indexes, query planning, and materialized views solve. You already used those.
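The "indexes and views instead of duplicated columns" approach sketched concretely: a plain view keeps counts derivable (they can't go stale), and a summary table refreshed in one statement plays the role of a Postgres materialized view. This is a simulation, since stdlib `sqlite3` has no `CREATE MATERIALIZED VIEW`, and all names here are invented:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE likes (post_id INTEGER, user_id INTEGER);
    -- Derived, never stored per-row: always consistent with `likes`.
    CREATE VIEW like_counts AS
        SELECT post_id, COUNT(*) AS n FROM likes GROUP BY post_id;
    -- Materialized-view stand-in: cheap to read, refreshed on a schedule.
    CREATE TABLE like_counts_cached (post_id INTEGER PRIMARY KEY, n INTEGER);
""")
db.executemany("INSERT INTO likes VALUES (?, ?)", [(1, u) for u in range(5)])

def refresh():
    # All the "update logic" lives in this one place, instead of being
    # scattered across every code path that writes a like.
    db.execute("DELETE FROM like_counts_cached")
    db.execute("INSERT INTO like_counts_cached SELECT post_id, n FROM like_counts")

refresh()
live = db.execute("SELECT n FROM like_counts WHERE post_id = 1").fetchone()[0]
cached = db.execute("SELECT n FROM like_counts_cached WHERE post_id = 1").fetchone()[0]
print(live, cached)  # 5 5
```

The cached copy can lag between refreshes, but the staleness is bounded and centralized, rather than depending on every writer remembering every duplicated field.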
The real cost is developer velocity lost to consistency bugs later. A normalized schema is a contract. A denormalized one is... hope you remember every place you duplicated that field.