Hello, I'm just learning about caching (Redis on top of Mongoose in Node) and I'm impressed by the benefits of caching for faster data retrieval. My understanding is that a cache will be faster than accessing the database because of a combination of 1) the data is stored in RAM rather than on disk, and 2) there are usually fewer cached documents than actual documents. If set up well, it seems the cache would always be faster than hitting the database directly.
I'm trying to extrapolate the caching benefits over time and am wondering: are there situations where, due to the way the cache is set up, the cache will actually be slower than accessing the database directly?
For instance, let's say we have a DB of 1,000 documents, and we set up the Redis key-value pairs to be so specific that the number of Redis entries ends up larger than the number of DB documents (say 10,000). Or another example: the cache is just really old, and the Redis keyspace has grown much larger than the number of documents in the DB.
Has anyone come across a situation like that and determined that caching was actually slower than DB access? Or is it fair to say that if you set up the caching rules correctly, caching will be faster than direct DB access 99% of the time? Help me understand, architecturally, how caching performance scales over time and what factors make a caching implementation successful.
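For context, here is a minimal sketch of the cache-aside pattern the question describes, with a plain `Map` standing in for Redis (in a real app you'd call the Redis client's `get`/`setEx` instead). The `fetchUser` function and the tiny `db` object are hypothetical names for illustration:

```javascript
// Cache-aside: try the cache first; on a miss, read the DB and populate.
const cache = new Map(); // stand-in for Redis
let hits = 0, misses = 0;

const db = { u1: { name: "Ada" }, u2: { name: "Lin" } }; // pretend collection

function fetchUser(id) {
  const key = `user:${id}`;
  if (cache.has(key)) {
    hits++;                 // cache hit: no DB round trip
    return cache.get(key);
  }
  misses++;                 // cache miss: pay for the DB read, then cache it
  const doc = db[id];
  cache.set(key, doc);
  return doc;
}

fetchUser("u1"); // miss -> reads DB, populates cache
fetchUser("u1"); // hit  -> served from cache
```

The question's "too many overly specific keys" scenario is exactly what hurts here: if every slightly different query gets its own key, the hit counter stays low and every request still pays the DB cost, plus the cache round trip on top.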
Thanks!
I've run some benchmarks: reading from RAM is extremely fast compared to reading from a hard disk, even an SSD.
Regarding your question: a Redis cache will pretty much always be faster than a DB that reads and writes from disk. But don't use Redis as an alternative to the DB, i.e. keeping a full copy of the DB in Redis. It has to be used as a cache!
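One way to enforce "cache, not DB copy" is to cap Redis's memory and let it evict cold keys. A minimal `redis.conf` sketch (the 100mb figure is illustrative, not a recommendation):

```conf
# Cap the dataset so Redis stays a bounded cache, not a second copy of the DB
maxmemory 100mb
# When the cap is reached, evict the least-recently-used keys
maxmemory-policy allkeys-lru
```

With an eviction policy set, the cache naturally converges on the hot working set instead of growing without bound.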
RAM is always costly. What if you have a DB of 100 GB? A 100 GB hard disk doesn't cost much, but 100 GB of RAM costs far, far more.
(There is Redis cluster if you want to scale horizontally)
So if you have thousands of records, cache the frequently used ones in Redis.
While developing widgets for MFY, only the widgets used within the last 2 days are stored in Redis. If a widget doesn't get a ping for 2 days, it is automatically purged from the cache. That way, frequently used widgets are served from the cache; keeping every widget in the cache would require too much RAM.
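The "purge after 2 days without a ping" policy above is a sliding expiry. A sketch of the idea, simulated with a `Map` and explicit timestamps so it's self-contained (with real Redis you'd get this by resetting the TTL, e.g. calling `EXPIRE` on each hit); `get`/`set` here are hypothetical helpers:

```javascript
const TTL_MS = 2 * 24 * 60 * 60 * 1000; // 2 days
const cache = new Map(); // key -> { value, lastPing }

function get(key, now) {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (now - entry.lastPing > TTL_MS) {
    cache.delete(key);      // no ping for 2 days: purge instead of serving
    return undefined;
  }
  entry.lastPing = now;     // sliding expiry: each hit extends the window
  return entry.value;
}

function set(key, value, now) {
  cache.set(key, { value, lastPing: now });
}

set("widget:1", "<div>widget</div>", 0);
get("widget:1", TTL_MS - 1);   // within 2 days -> served, window reset
get("widget:1", 3 * TTL_MS);   // >2 days since last ping -> purged
```

The net effect is the one described above: whatever keeps getting traffic stays in RAM, and everything else falls out on its own.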
We also use CloudFlare's cache to speed things up further. See How I used CloudFlare to reduce API response time to < 100ms.
Mark
I think where it usually goes downhill is when it gets too complex to determine whether the cache is still valid.
If data changes too often, or in too complex ways, you need too much logic (and possibly extra database hits) for caching to be worthwhile.
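The simplest way to keep the cache valid is to invalidate on write: the update path deletes the cached key, so the next read repopulates from the DB. A sketch with `Map`s standing in for Redis (the delete would be a Redis `DEL`) and the DB; `readPost`/`updatePost` are hypothetical:

```javascript
const cache = new Map();                               // stand-in for Redis
const db = new Map([["post:1", { title: "v1" }]]);     // stand-in for the DB

function readPost(id) {
  const key = `post:${id}`;
  if (cache.has(key)) return cache.get(key);
  const doc = db.get(key);  // miss: read the source of truth
  cache.set(key, doc);      // and populate the cache
  return doc;
}

function updatePost(id, doc) {
  const key = `post:${id}`;
  db.set(key, doc);         // write to the source of truth first
  cache.delete(key);        // then invalidate, so no stale copy survives
}

readPost("1");                    // populates cache with v1
updatePost("1", { title: "v2" }); // write + invalidate
readPost("1");                    // re-reads v2 from the DB
```

This stays cheap only while writes map cleanly to keys; once one write can invalidate many derived cache entries (lists, aggregates, joins), the bookkeeping grows into exactly the complexity described above.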
Beyond that, as soon as it doesn't fit in RAM, performance will plummet.