11 Things you should know before deploying MongoDB in production

It has been more than a year since I wrote my last article. While building Hashnode I learned many interesting things, and have been experimenting a lot with MongoDB lately.

MongoDB is incredibly easy to set up, but when it comes to production you should be more careful. So, in this article I am going to outline a few things developers should know before deploying MongoDB in production.

Use SSDs

If you are renting a VM to deploy MongoDB, be sure to choose an SSD based VM. Here is a quote from MongoDB's docs:

Solid state drives (SSDs) can outperform spinning hard disks (HDDs) by 100 times or more for random workloads.

SSD based machines will cost more, but it's worth it.

Secure the Machine

Always make sure to close the public ports of the VM which hosts your MongoDB server. You may keep port 22 open for SSH access (better still, move SSH to a non-standard port), but don't keep any other ports open.

If you are hosted on AWS, Azure, etc., you can easily create a private network (a VPC on AWS, for example) and deploy your instance inside it to restrict access from the outside world. Last year nearly 40,000 insecure MongoDB servers were identified; the problem was that the default MongoDB port, 27017, was open and accessible over the web. So, make sure you are deploying your instances securely inside a private network.
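Beyond firewall rules, you can also tell mongod itself to listen only on the loopback and private interfaces. A minimal sketch of the relevant mongod.conf lines (the private IP shown is a placeholder for your instance's own address):

```yaml
# mongod.conf — bind only to localhost and the VM's private IP,
# so the server never listens on a public interface.
net:
  port: 27017
  bindIp: 127.0.0.1,10.0.0.5   # 10.0.0.5 is a placeholder private address
```

With this in place, even a mistakenly opened firewall port won't expose the database to the public internet.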

Password Protection

Even though your MongoDB server doesn't accept connections from the outside world, what if some malicious script gains access to the server itself? It can happen. So, set a username/password for your database and assign only the required permissions. This adds an additional layer of security and gives you peace of mind.
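As a sketch, creating a scoped user from the mongo shell might look like this (the database name, username and password are placeholders; adapt them to your app):

```javascript
// Run in the mongo shell. Creates a user that can only read and write
// the "myapp" database — no admin privileges.
use myapp
db.createUser({
  user: "appUser",
  pwd: "aStrongPasswordHere",   // placeholder — use a real secret
  roles: [ { role: "readWrite", db: "myapp" } ]
})
```

Remember to start mongod with authentication enabled (the `--auth` flag or `security.authorization` in mongod.conf), otherwise the credentials are not enforced.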

Replication

Don't deploy without replica sets, unless it's a prototype and you don't care about high availability. But if it's a serious production app, you should deploy a replica set.

In case you are not aware, a replica set consists of a primary MongoDB instance and several secondaries. The secondaries replicate data from the primary. If the primary goes down, one of the secondaries is elected primary and your app won't have any downtime. But beware: there can be replication lag, and the secondaries can serve you stale data. That's why reads from secondaries are turned off by default.

In general the primary MongoDB instance receives both read and write requests from your app. But if you are interested in reading from secondaries, you can enable it by running rs.slaveOk() on any of the secondaries.
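For example, initiating a three-member replica set and then allowing reads on a secondary looks roughly like this (the set name and host names are placeholders):

```javascript
// On one member, in the mongo shell — initiate a 3-member replica set.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.com:27017" },
    { _id: 1, host: "db2.example.com:27017" },
    { _id: 2, host: "db3.example.com:27017" }
  ]
})

// Later, connected to a secondary, allow reads on that connection:
rs.slaveOk()
```

The member listed first has no special status; the set holds an election and any eligible member can become primary.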

Drivers

Make use of connection pooling. And use your driver to connect to the replica set as a whole, not to an individual member. That way, when the primary goes down and a new primary is elected, your driver will automatically connect to the new primary.
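With the Node.js driver, for instance, that means listing every member and the replica set name in the connection string instead of a single host. A sketch (host names, database name and pool size are illustrative):

```javascript
// Node.js driver sketch — connect to the replica set, not one member.
var MongoClient = require('mongodb').MongoClient;

var url = 'mongodb://db1.example.com:27017,db2.example.com:27017,' +
          'db3.example.com:27017/myapp?replicaSet=rs0&maxPoolSize=20';

MongoClient.connect(url, function (err, db) {
  if (err) throw err;
  // The driver now tracks the set's topology and will re-route
  // operations to the new primary after a failover.
});
```

The `maxPoolSize` option caps the connection pool; the driver reuses pooled connections across requests instead of opening a new one per query.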

Sharding

Sharding is a technique that lets MongoDB distribute data across different machines. This is useful if you have a huge data set and need high throughput. But I think adding shards at the beginning is premature in most cases. Rather, you should wait and see whether your app hits any performance bottlenecks, and then make a decision.

Backups

If you are using replica sets, taking backups is easy. If the dataset is small or medium sized, you can run mongodump on one of the secondary instances and upload the dump to a cloud storage solution. As the data size grows, mongodump becomes less performant, and at that point you can switch to file system snapshots.
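A minimal backup sketch, run against a secondary member (the host, paths and bucket name are placeholders, and the last step assumes you have the AWS CLI configured):

```shell
# Dump from a secondary member, compress, and ship to cloud storage.
mongodump --host db2.example.com --port 27017 --out /backups/dump-$(date +%F)
tar -czf /backups/dump-$(date +%F).tar.gz -C /backups dump-$(date +%F)
aws s3 cp /backups/dump-$(date +%F).tar.gz s3://my-backup-bucket/
```

Pointing mongodump at a secondary keeps the backup load off the primary that serves your app.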

Always remember: replication is not the same as backups. Replica sets are for high availability, not for backups. Read up on replication and backup strategies to learn more.

Storage Engines

MongoDB recently added the WiredTiger storage engine (in version 3.0). The storage engine is the part of the DBMS that determines how data is stored on disk. If you are expecting a lot of concurrent writes, WiredTiger may be a good choice, as it uses document-level concurrency control instead of locking a whole collection. So, as always, analyse your workload and choose the right storage engine for your deployment.
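You can check which engine a running instance is actually using from the mongo shell:

```javascript
// Prints the active storage engine, e.g. "wiredTiger" or "mmapv1".
db.serverStatus().storageEngine.name
```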

RAM

MongoDB caches frequently accessed data in RAM so that you get good read performance. So, make sure your machine has a sufficient amount of RAM: more RAM means fewer page faults and better performance.
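A quick way to see how much memory the mongod process is using is the serverStatus command; a small sketch from the mongo shell:

```javascript
// Resident and virtual memory usage of the mongod process, in MB.
var mem = db.serverStatus().mem;
print("resident: " + mem.resident + " MB, virtual: " + mem.virtual + " MB");
```

If resident memory is constantly pressed against the machine's total RAM while page faults climb, your working set likely no longer fits in memory.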

Journaling

Turn on journaling in production. Journaling ensures that MongoDB can recover write operations that were recorded in the journal but had not yet reached the data files when a crash or some other system failure occurred.
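Journaling is on by default in 64-bit builds, but it doesn't hurt to make it explicit in mongod.conf:

```yaml
# mongod.conf — make journaling explicit.
storage:
  journal:
    enabled: true
```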

Indexing

If your app is write intensive, you should be extra careful while creating indexes. According to the MongoDB docs:

If a write operation modifies an indexed field, MongoDB updates all indexes that have the modified field as a key.

So, choose your indexes carefully, as each one adds overhead to writes on the indexed fields. MongoDB has a nice FAQ on indexing.
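The flip side is that the right index makes your common queries cheap. A small sketch (the collection and field names are made up for illustration):

```javascript
// Index only the field you actually query on...
db.users.createIndex({ email: 1 })

// ...and verify the query plan actually uses it.
db.users.find({ email: "foo@example.com" }).explain()
```

One targeted index like this speeds up the lookup without taxing writes that never touch the indexed field.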

Do check out the Production Checklist by MongoDB. I am sure there are more pointers for production MongoDB deployments, but I think these are the most important ones.

Have some feedback or suggestions? Feel free to share in comments below.

Comments (3)

Vasan Subramanian

Regarding the storage: in addition to SSDs, you should probably consider IOPS and the effect of local disks too, if your frequency of reads/writes is high.

Persistent Storage

Normally, one would use persistent storage. These drives are network-attached, and not connected to the physical hardware of your server. On AWS, this is called EBS and on Google Compute Engine, this is called Persistent Disks. If you terminate the server or the server fails, the drive remains and data is not lost.

But since it is network-attached, the performance suffers. For example, a 100GB disk on EBS gives only up to 300 IOPS on AWS, with similar numbers on Google Compute Engine. That's the price you pay for more reliable storage.

In AWS, you can use something called Provisioned IOPS to increase performance, but the cost is prohibitive. In GCE, the only way to get better IOPS is to increase the disk size; so even if you need only 100GB, you'll have to provision 1000GB to get a decent 3000 IOPS.

Local Storage

In contrast, local storage gives 10 to 20 times the performance (I have gotten up to 10,000 IOPS). These are SSDs attached to the hardware where your VM runs, so accessing them doesn't go over the network. That's why they're fast. That's also why they're not persistent. But why do we care? We are going to run MongoDB in a replica set anyway, thus managing the reliability of storage ourselves.

A MongoDB cluster doesn't need persistent storage. Local storage is ideal.

In AWS, this kind of storage is called ephemeral storage, but it's either unavailable or limited in size on most modern instance types. No good. In GCE, it's called Local SSD, but it comes only in increments of 375GB, costing $80 per month. Too expensive for me.

That's why I like Digital Ocean to run my MongoDB cluster. Storage is always local and it's inexpensive!

Marcos Bérgamo

Awesome post!

I just have a few tips to add. When your network is really locked down, with the database and app inside it, adding an extra layer of password protection may be redundant.

Your database is only accessible from your private network; no internet connections are possible. (Personally, I prefer reaching the database instance only by going through another secured instance first.)

I'm not entirely sure about version 3.0 of MongoDB, but in previous versions the authentication mechanism was quite simple and not very strong.

Shreyansh Pandey

And we come to this topic again! :P I swear, I love this!