I would like to understand the various aspects of caching as thoroughly as possible. What technologies do you use? What are some good tips and techniques? I keep hearing about Redis, Memcached, MQ, etc.; what are they and how do they help?
A detailed answer describing every aspect of caching would help me a lot.
Hi Chris, without specifics of the particular app I can only speak in general terms.
In principle, caching means re-using the output of a process without again paying the cost of the processing that created that output in the first place. This can range from a browser grabbing content from your local computer without making another request to the server, through a server holding data in RAM to avoid the cost of fetching it from disk (or from a DB process or query), to an algorithm holding on to data that it will need later so that it does not have to recalculate it.
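To make the last case concrete, here is a minimal Python sketch of an algorithm keeping results it has already computed. The function slow_square and the counter call_count are made up for illustration; functools.lru_cache is from the standard library.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive body actually runs


@lru_cache(maxsize=None)
def slow_square(n: int) -> int:
    """Hypothetical stand-in for an expensive computation."""
    global call_count
    call_count += 1
    return n * n


first = slow_square(12)   # computed: the function body runs
second = slow_square(12)  # served from the cache: the body does not run again
```

The second call returns the stored result, so the cost of the computation is only paid once per distinct argument.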
Now Memcached and Redis are two of the many in-memory data stores available. In the case of these two the stores hold simple key/value pairs, although you can make the value as complicated as you like. An example use case is a website that needs to be displayed in different languages and where the translated texts are stored in a database - in such a case it is far better to 'cache' the translations in an in-memory store like Memcached or Redis to speed up lookups compared to making DB queries. Of course some of these stores do far more than just store key/value data.
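As a rough sketch of the translation example, the snippet below uses a plain Python dict as a stand-in for the in-memory store, so it runs without a server. The key scheme "i18n:<lang>:<text id>" and the function names are my own assumptions, not anything Memcached or Redis prescribes; the point is only that the key is a string and the value can be any structure you can serialise.

```python
import json

# Stand-in for an in-memory store such as Memcached or Redis:
# both map string keys to values.
store: dict = {}


def translation_key(lang: str, text_id: str) -> str:
    # Hypothetical key scheme: one key per (language, text) pair.
    return f"i18n:{lang}:{text_id}"


def put_translation(lang: str, text_id: str, value: dict) -> None:
    # Serialise the value so it can be "as complicated as you like".
    store[translation_key(lang, text_id)] = json.dumps(value)


def get_translation(lang: str, text_id: str):
    raw = store.get(translation_key(lang, text_id))
    return json.loads(raw) if raw is not None else None


put_translation("fr", "greeting", {"text": "Bonjour", "html": "<b>Bonjour</b>"})
hit = get_translation("fr", "greeting")   # found in the store
miss = get_translation("de", "greeting")  # None: would fall back to the DB
```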
By MQ did you mean 'Message Queue'? These systems are designed for (distributed) messaging of one sort or another - be it inter-process messaging or ordinary messaging from computer to human recipients. In the case where a process is queuing up messages (commands) for another process, this can loosely be considered a form of caching.
If your interest in caching is to optimize an app, it is worth remembering that optimization needs to be looked at end to end, not just at the point of delivery (e.g. the browser). Maybe the way the content is stored needs to be looked at. Maybe the processes that put the content together for delivery need to be optimized. Might content delivery networks (CDNs) help? Are images optimized? And so on.
Hope that gave you some pointers and food for thought ...
Thanks for inviting me Chris!
Caching is a must for high-traffic, content-driven websites. Here are a few strategies you can employ to get good results:
When anonymous users (not logged in) access your web page, they should see a cached version. They are not going to perform any actions, so it doesn't make sense to make several database calls to serve the page. Instead you can get a noticeable performance improvement by caching the generated content. You will also need to invalidate the cache when the content changes, otherwise anonymous users will keep seeing stale data.
Redis and Memcached act as distributed key-value stores and help you cache your application data. Redis is an in-memory database and is very fast at data retrieval. So, if you are using databases like MySQL, MongoDB etc., you can put the most frequently accessed items in Redis/Memcached. As a rough rule of thumb Redis is about 2x faster for writes and 3x faster for reads compared to MongoDB, although the actual numbers depend on your workload and hardware. The common strategy is that you hit Redis to retrieve the item; if it is not found you make a database call and then put that item in Redis for subsequent access. For example, when you visit a particular discussion on Hashnode you will see a list of related posts. This list has been cached in Redis and gets invalidated when data changes or after a fixed timeout.
If your web app mostly reads from DBs and writes occur only occasionally, you can consider query caching. In this case the results of the most commonly used queries are kept in memory, so repeated queries can be answered without running them again.
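Some databases offer this as a built-in feature; the sketch below only shows the idea at application level, using the standard library's sqlite3 with an in-memory database. The posts table and the helper names cached_query and write are made up for the example, and dropping the whole cache on every write is the crudest possible invalidation, which is only reasonable because writes are assumed to be rare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO posts (title) VALUES ('First'), ('Second')")

query_cache = {}  # (sql, params) -> list of result rows
db_reads = 0      # counts queries that really reached the database


def cached_query(sql, params=()):
    global db_reads
    key = (sql, params)  # params must be a tuple so the key is hashable
    if key not in query_cache:
        db_reads += 1
        query_cache[key] = conn.execute(sql, params).fetchall()
    return query_cache[key]


def write(sql, params=()):
    conn.execute(sql, params)
    query_cache.clear()  # coarse invalidation: any write drops all results


titles = cached_query("SELECT title FROM posts ORDER BY id")
titles_again = cached_query("SELECT title FROM posts ORDER BY id")  # cached
write("INSERT INTO posts (title) VALUES (?)", ("Third",))
titles_after = cached_query("SELECT title FROM posts ORDER BY id")  # re-run
```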
Use an HTTP accelerator like Varnish if needed. Check out this discussion about Varnish and when it's necessary. If you are using nginx as a reverse proxy (we do) you can also instruct it to cache content.
Apart from the above techniques you can also utilise HTTP-level caching using response headers such as Cache-Control, Last-Modified and Expires. Here is an article on HTTP level caching.
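Here is a small framework-free sketch of how a server might use two of those headers. The function respond and the one hour max-age are assumptions for illustration; Cache-Control, Last-Modified, If-Modified-Since and the 304 status are standard HTTP. The sketch assumes a well-formed If-Modified-Since value and does no error handling.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, request_headers, max_age=3600):
    """Return (status, headers, body) for a resource changed at last_modified."""
    headers = {
        # Browsers and proxies may reuse the response for max_age seconds.
        "Cache-Control": f"public, max-age={max_age}",
        # HTTP dates are always expressed in GMT.
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        return 304, headers, b""  # Not Modified: the client reuses its copy
    return 200, headers, b"<html>page</html>"


modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# First visit: full response with caching headers.
status1, headers1, body1 = respond(modified, {})

# Revalidation: the client sends back the date it was given earlier.
status2, headers2, body2 = respond(
    modified, {"If-Modified-Since": headers1["Last-Modified"]}
)
```

The first call returns the full page; the second returns an empty 304 response, which saves sending the body again.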
TIP: This is not directly related to caching, but is worth mentioning. Use SSDs in your DB machines when possible. Random access is very fast on SSDs and they offer a huge performance boost to DBs like MongoDB. If you have a good amount of main memory and use SSDs, you may not need to put a cache in front of MongoDB initially.
As @lisol mentioned, message queues such as RabbitMQ are used for inter-process communication. For example, one of your machines can queue emails to be sent and another machine will process that queue and send the emails. This lets you scale the two machines independently.
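The email example can be sketched in a single Python process, with the standard library's queue.Queue standing in for a broker like RabbitMQ and a thread standing in for the second machine. The message format and the sent list are made up for the example, and no real email is sent.

```python
import queue
import threading

email_queue = queue.Queue()  # stand-in for a message broker
sent = []                    # records what the consumer "sent"


def worker():
    """Consumer: takes messages off the queue and processes them."""
    while True:
        message = email_queue.get()
        if message is None:          # sentinel value: stop the worker
            email_queue.task_done()
            break
        sent.append(f"sent to {message['to']}")
        email_queue.task_done()


consumer = threading.Thread(target=worker)
consumer.start()

# Producer: queues the work and carries on without waiting for delivery.
for address in ["a@example.com", "b@example.com"]:
    email_queue.put({"to": address, "subject": "Welcome"})

email_queue.put(None)  # tell the worker there is nothing more to do
email_queue.join()     # wait until every queued message has been processed
consumer.join()
```

Because the producer only talks to the queue, you could start more workers without changing the producer at all, which is the independent scaling mentioned above.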
Hope this helps. I'll update this answer if I can think of a few more points.
Sandeep