It has been more than a year since I wrote my last article. While building Hashnode I learned many interesting things, and have been experimenting a lot with MongoDB lately.
MongoDB is incredibly easy to set up, but production deployments need more care. So, in this article I am going to outline a few things developers should know before deploying MongoDB in production.
If you are renting a VM to deploy MongoDB, be sure to choose an SSD-based VM. Here is a quote from MongoDB's docs:
Solid state drives (SSDs) can outperform spinning hard disks (HDDs) by 100 times or more for random workloads.
SSD-based machines cost more, but it's worth it.
Secure the Machine
Always close the public ports of the VM that hosts your MongoDB server. You may keep port 22 open (or, better, move SSH to a non-default port) so that you can SSH into the remote server, but don't keep any other ports open.
If you are hosted on AWS, Azure, etc., you can easily create a private network (a VPC on AWS, a virtual network on Azure) and deploy your instance inside it to restrict access from the outside world. Last year nearly 40,000 unsecured MongoDB servers were identified. The problem was that MongoDB's default port, 27017, was open and accessible over the web. So, make sure you deploy your instances securely inside a private network.
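One way to confirm that the database port really is closed is to probe it from a machine outside your network. Here is a minimal sketch using only Python's standard library. The function names are my own, and 27017 is simply MongoDB's default port, so adjust it if you run on a different one.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all of these as "not open".
        return False


def exposed_ports(host: str, ports=(27017,)) -> list:
    """Return the subset of the given ports that accept connections on host."""
    return [p for p in ports if is_port_open(host, p)]
```

Run from outside the network, exposed_ports("db.example.com") should return an empty list (the host name is a placeholder for your own server).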
Even if your MongoDB server doesn't accept connections from the outside world, a malicious script could still gain access to the machine. It could happen. So, enable authentication: set a username and password for your database and grant each user only the permissions it needs. This adds another layer of security and gives you peace of mind.
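Once authentication is on, the credentials usually travel in the connection string, and special characters in them must be percent-encoded. The sketch below shows one way to build such a URI with the standard library. The function name, the sample credentials and the IP address are made up for illustration; authSource is the connection string option that names the database holding the user.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str,
                    port: int = 27017, auth_db: str = "admin") -> str:
    """Build a mongodb:// URI with percent-encoded credentials."""
    # Characters such as '@', ':' and '/' inside credentials would
    # otherwise be mistaken for URI delimiters.
    return "mongodb://{}:{}@{}:{}/?authSource={}".format(
        quote_plus(user), quote_plus(password), host, port, quote_plus(auth_db)
    )


# Hypothetical credentials and private IP, for illustration only.
print(build_mongo_uri("app", "p@ss:w/rd", "10.0.0.5"))
```

In a real deployment you would read the password from an environment variable or a secrets store rather than hard-coding it.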
Don't deploy without a replica set unless it's a prototype and you don't care about high availability. Any serious production app should run on a replica set.
In case you are not aware, a replica set consists of a primary MongoDB instance and several secondaries. The secondaries replicate from the primary. If the primary goes down, one of the secondaries is elected as the new primary, so your app sees little or no downtime. But beware - there can be replication lag, and the secondaries can give you stale data. That's why reads from secondaries are turned off by default.
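A new primary can only be elected while a majority of the voting members can still reach each other, which is why replica sets are usually deployed with an odd number of members. The arithmetic is simple enough to sketch (the function names are mine):

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many members can be down while an election can still succeed."""
    return voting_members - majority(voting_members)


for n in (3, 4, 5):
    print(n, "members: majority", majority(n), "tolerates", fault_tolerance(n))
```

Note that going from 3 to 4 members does not let you lose any more machines; you need 5 to survive two failures.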
In general the primary MongoDB instance receives both read and write requests from your app. But if you are interested in reading from secondaries, you can do so by running rs.slaveOk() on any of the secondaries (newer versions of the shell rename this helper to rs.secondaryOk()).
Make use of connection pooling, and point your driver at the replica set as a whole rather than at an individual member. That way, when your primary goes down and a new primary is elected, the driver automatically connects to the new primary.
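In practice this means giving the driver a connection string that lists several members and names the replica set. Below is a small sketch that assembles one. The host names, the set name rs0 and the pool size of 50 are assumptions for illustration; replicaSet, maxPoolSize and readPreference are standard connection string options.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, max_pool_size=50,
                    read_preference="primary"):
    """Build a mongodb:// URI that targets a replica set, not one member."""
    if not hosts:
        raise ValueError("at least one host is required")
    options = urlencode({
        "replicaSet": replica_set,          # lets the driver discover members
        "maxPoolSize": max_pool_size,       # upper bound on pooled connections
        "readPreference": read_preference,  # e.g. "secondaryPreferred"
    })
    return "mongodb://{}/?{}".format(",".join(hosts), options)


# Hypothetical host names, for illustration only.
print(replica_set_uri(["db1:27017", "db2:27017", "db3:27017"], "rs0"))
```

Passing read_preference="secondaryPreferred" is the driver-side way to allow reads from secondaries, with the same stale-data caveat as before.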
Sharding is a technique that lets MongoDB spread data across multiple machines. This is useful if you have a huge data set and need high throughput. But I think adding shards at the beginning is premature in most cases. Instead, wait and see whether your app hits any performance bottlenecks, and then make a decision.
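To get a feel for what sharding does, here is a toy model that assigns documents to shards by hashing a key. It is only an illustration of the idea of hash-based partitioning and not how MongoDB implements it internally; the function names and the choice of SHA-256 are mine.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard number by hashing it (toy model)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard they would land on."""
    buckets = {i: [] for i in range(num_shards)}
    for k in keys:
        buckets[shard_for(k, num_shards)].append(k)
    return buckets


sizes = {s: len(v) for s, v in
         distribute(["user-%d" % i for i in range(1000)], 4).items()}
print(sizes)
```

The same key always lands on the same shard, and a good hash spreads keys roughly evenly, which is what gives you the extra write throughput.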
If you are using replica sets, taking backups is easy. If the dataset is small or medium-sized, you can run mongodump on one of the secondary instances and upload the dump to a cloud storage solution. As the data size increases, mongodump can become slow, and in that case you can switch to file system snapshots.
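A backup job along these lines can be a few lines of script. The sketch below only builds the mongodump command line for a timestamped, gzip-compressed archive; it does not run anything. The host name and the backup directory are placeholders, while --host, --port, --gzip and --archive are regular mongodump flags.

```python
from datetime import datetime, timezone


def mongodump_command(host, port=27017, out_dir="/var/backups/mongo", now=None):
    """Return the argument list for dumping one member to a gzip archive."""
    now = now or datetime.now(timezone.utc)
    archive = "{}/dump-{}.archive.gz".format(
        out_dir, now.strftime("%Y%m%dT%H%M%SZ")
    )
    return [
        "mongodump",
        "--host", host,          # point this at a secondary
        "--port", str(port),
        "--gzip",                # compress the output
        "--archive={}".format(archive),
    ]


# Hypothetical secondary host name, for illustration only.
print(" ".join(mongodump_command("secondary1.internal")))
```

You could hand the returned list to subprocess.run(cmd, check=True) from a cron job and then upload the archive to your storage of choice.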
Always remember that replication is not the same as backup. Replica sets provide high availability, not backups: an accidental delete or a bad write is replicated to every member. Read the following discussions to know more about replication and backup strategies.
MongoDB recently added the WiredTiger storage engine. The storage engine is the part of the DBMS that determines how data is stored. If you are expecting a lot of concurrent writes, WiredTiger may be a good choice because it uses document-level concurrency control instead of locking at the collection level. As always, analyse your workload and choose the right storage engine for your deployment.
MongoDB caches frequently accessed data in RAM so that you can get good performance. So, make sure your machine has a sufficient amount of RAM. More RAM means fewer page faults and better performance.
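For a rough idea of how much RAM is enough, it helps to know how big the cache will be. With WiredTiger on MongoDB 3.4 and later, the documented default internal cache size is the larger of 50% of (RAM - 1 GB) and 256 MB. The helper below just encodes that rule; the function names are mine and the working set figure in the example is an assumed number.

```python
def default_wiredtiger_cache_gb(total_ram_gb: float) -> float:
    """Default WiredTiger cache size in GB for a given amount of RAM."""
    # Larger of 50% of (RAM - 1 GB) and 256 MB.
    return max(0.5 * (total_ram_gb - 1.0), 0.25)


def fits_in_cache(working_set_gb: float, total_ram_gb: float) -> bool:
    """Rough check: does the working set fit in the default cache?"""
    return working_set_gb <= default_wiredtiger_cache_gb(total_ram_gb)


print(default_wiredtiger_cache_gb(16))  # cache size on a 16 GB machine
print(fits_in_cache(10, 16))            # assumed 10 GB working set
```

This is only a sizing aid. The operating system's file cache also helps, so treat the result as a starting point and watch your page faults.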
Turn on journaling in production. This ensures that MongoDB can recover write operations that were written to the journal but had not yet been applied to the data files when a crash or some other system failure occurred.
If your app is write-intensive, you should be extra careful when creating indexes. According to the MongoDB docs:
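If the idea of a journal is new to you, this toy key-value store shows the principle: every write goes to a durable log first, and after a crash the log is replayed on top of the last flushed data. It is a simplified model of write-ahead journaling in general, not MongoDB's actual implementation, and the class and method names are made up.

```python
class ToyStore:
    """Toy key-value store with a write-ahead journal (illustrative only)."""

    def __init__(self):
        self.journal = []     # durable log of writes since the last flush
        self.data_files = {}  # durable data as of the last flush
        self.memory = {}      # in-memory view, lost on a crash

    def write(self, key, value):
        self.journal.append((key, value))  # journal first
        self.memory[key] = value           # then apply in memory

    def flush(self):
        """Persist the in-memory state and truncate the journal."""
        self.data_files = dict(self.memory)
        self.journal.clear()

    def crash_and_recover(self):
        """Simulate a crash: rebuild memory from data files plus the journal."""
        self.memory = dict(self.data_files)
        for key, value in self.journal:
            self.memory[key] = value


store = ToyStore()
store.write("a", 1)
store.flush()
store.write("b", 2)        # journaled but never flushed
store.crash_and_recover()
print(store.memory)        # both writes are still there
```

Without the journal, the write to "b" would have been lost in the crash.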
If a write operation modifies an indexed field, MongoDB updates all indexes that have the modified field as a key.
So, choose your indexes carefully, since every additional index can slow down writes. Here is a nice FAQ on Indexing.
Do check out the Production Checklist by MongoDB. I am sure there are more pointers for production MongoDB deployments, but I think these are the most important ones.
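To make the cost concrete, here is a small sketch that, given the key fields of each index on a collection, lists the indexes an update would have to touch. The index names and fields are invented for the example, and the function is just a model of the rule quoted above.

```python
def indexes_touched(indexes, modified_fields):
    """Return names of indexes whose keys include any modified field."""
    modified = set(modified_fields)
    return [name for name, keys in indexes.items() if modified & set(keys)]


# Hypothetical indexes on a users collection.
indexes = {
    "email_1": ["email"],
    "name_1_age_1": ["name", "age"],
    "created_1": ["created"],
}

print(indexes_touched(indexes, ["age"]))           # one index to update
print(indexes_touched(indexes, ["email", "name"]))  # two indexes to update
print(indexes_touched(indexes, ["bio"]))            # no index affected
```

If a hot update path touches several indexes on every write, that is a good hint to reconsider which indexes you really need.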
Have some feedback or suggestions? Feel free to share them in the comments below.