I have a 3-member replica set with one primary and two secondaries. My current backup strategy is: I have set up a cron job that uses mongodump to back up one of the secondaries and uploads the dump to Amazon S3.
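For reference, that setup fits in a single crontab entry. The hostname, local path and bucket name below are placeholders, and note that `%` must be escaped as `\%` inside a crontab:

```shell
# Nightly dump of a secondary at 02:00, then upload to S3.
# Host, paths and bucket name are hypothetical placeholders.
0 2 * * * mongodump --host secondary1.example.com --out /backups/mongo-$(date +\%F) && aws s3 sync /backups/mongo-$(date +\%F) s3://my-backup-bucket/mongo/$(date +\%F)/
```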
What strategy do you use? Would love to know suggestions/best practices to follow for backups.
mongodump is not a recommended approach if your data size is huge, say >100 GB. It takes a long time to back up, and even longer to restore. The best way is to take a file system snapshot. Refer to this guide - docs.mongodb.org/v3.0/tutorial/backup-with-filesy…
For a sharded cluster, refer to this guide - docs.mongodb.org/v3.0/tutorial/backup-sharded-clu…
We use mongodb.com/cloud, a service from MongoDB Inc. that provides backup, automation and monitoring.
I personally would do something similar. Next week I will start writing a service for my home automation, part of which will be backing up the databases. Maybe then I will have another solution, but currently I would also use a cron job.
Something I've seen used on PostgreSQL databases, which could also be used on MongoDB with a little bit of work: RabbitMQ is added in front of the database, and any insert/update/delete done on the primary DB is thrown onto a fanout exchange, which fans it out to other databases where the same operation is then applied.
So if you have 3 queues to fan out to, any operation done on the primary database is replicated via RabbitMQ to 3 other databases without a performance penalty. If you need to do maintenance on a database, simply switch to a backup copy of the DB; any updates you're missing will be queued on RabbitMQ and applied as soon as the DB becomes available.
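The key property described above is that an offline consumer's messages accumulate in its queue and are replayed on reconnect. Here is a minimal in-memory sketch of that idea in Python (it stands in for RabbitMQ rather than using it, and all names are illustrative):

```python
from collections import deque

class FanoutExchange:
    """Minimal in-memory stand-in for a RabbitMQ fanout exchange."""
    def __init__(self):
        self.queues = {}            # replica name -> pending operations

    def bind(self, name):
        self.queues[name] = deque()

    def publish(self, op):
        # A fanout exchange copies every message to all bound queues.
        for q in self.queues.values():
            q.append(op)

class Replica:
    def __init__(self, name, exchange):
        self.data = {}
        self.queue = exchange.queues[name]
        self.online = True

    def drain(self):
        # Apply queued operations; while offline they simply accumulate.
        while self.online and self.queue:
            op, key, value = self.queue.popleft()
            if op == "set":
                self.data[key] = value
            elif op == "delete":
                self.data.pop(key, None)

exchange = FanoutExchange()
exchange.bind("replica_a")
exchange.bind("replica_b")
a = Replica("replica_a", exchange)
b = Replica("replica_b", exchange)

exchange.publish(("set", "user:1", "alice"))
a.drain(); b.drain()

b.online = False                    # replica_b is down for maintenance
exchange.publish(("set", "user:2", "bob"))
a.drain(); b.drain()                # b applies nothing while offline

b.online = True
b.drain()                           # backlog is replayed once b is back
print(a.data == b.data)             # True
```

The deques play the role of durable RabbitMQ queues: the maintenance window on `replica_b` costs nothing but a backlog that is drained on reconnect.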
So this is effectively backup via replication.
I use 3 approaches for backup:
Vasan Subramanian
I use periodic mongodump, as well as a copy of the transaction logs so that the DB can be re-created from scratch if required. But that's my special use case, it may not be good for you. In production workloads such as Hashnode itself, you'll need to think of the following:
In my experience, I have had to use backups just a couple of times due to hardware failure, but many a time due to human error, including application bugs.
Availability
MongoDB replication takes care of this. If the primary goes down, MongoDB automatically switches over to one of the replicas. You will need monitoring mechanisms to alert you of failures so that you can quickly bring back another node as a replica.
But you can't rely on replication to mitigate disasters or human errors -- if someone drops a collection by mistake, the collection will disappear in the replicas as well!
Disaster
If you can't afford to lose any data, even on disasters, you should be thinking of having replicas across availability zones or even regions. Typically, availability zones within a region are isolated enough such that a disaster in one zone will not affect the other. But a tsunami can destroy an entire region.
The problem with cross-region replication is that it's going to be slow. If you think you really need it, give sufficient attention to the write concern that you use, and test this out properly before you put this in production.
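One knob worth knowing here: the write concern can be set directly in the connection string via the standard `w` and `wtimeoutMS` URI options. A hypothetical cross-region setup (hostnames and replica-set name are placeholders) might require majority acknowledgement but cap the wait, so a slow remote region cannot block writes indefinitely:

```python
# Hypothetical connection string for a cross-region replica set:
# require acknowledgement from a majority of members, but give up
# waiting after 5 seconds.
hosts = "mongodb://db-us.example.com,db-eu.example.com"
uri = hosts + "/?replicaSet=rs0&w=majority&wtimeoutMS=5000"
print(uri)
```

Testing what happens when `wtimeoutMS` actually expires is exactly the kind of scenario to rehearse before production.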
If you are OK with losing some data, say one day's worth, in the unlikely event of a disaster, then it's fine not to have special handling for this; instead, just rely on the periodic dump as described below.
Human Errors
To recover from human errors, you must have the ability to do Point-in-time-recovery (PITR). The simplest form is a periodic dump as you are doing, but keep the old dumps. This could be mongodump (smaller) or the file-system based snapshot (faster) if your data center provider supports it. And store it on a reliable storage system such as S3 (especially if you are using this mechanism to handle disasters as well).
I have, in the past, kept one dump for every day of the week (ie, the cron-jobs would overwrite the Monday backup every Monday), and one for the 1st of every month. If you have enough storage, you could easily keep every dump rather than do the rotation.
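That rotation scheme boils down to a naming convention: one slot per weekday that gets overwritten weekly, plus one slot per month for dumps taken on the 1st. A small sketch of how the backup name could be derived (the naming format is my own, not from the original setup):

```python
from datetime import date

def backup_name(d: date) -> str:
    """Slot under which the dump for day `d` is stored.

    Dumps taken on the 1st go to a monthly slot; all other dumps go to
    a weekday slot, so e.g. the Monday backup is overwritten every Monday.
    """
    if d.day == 1:
        return f"monthly-{d:%Y-%m}"
    return f"weekly-{d:%A}"             # e.g. "weekly-Monday"

print(backup_name(date(2023, 5, 1)))    # monthly-2023-05
print(backup_name(date(2023, 5, 8)))    # a Monday -> weekly-Monday
```

Uploading each dump to a key with this name on S3 gives you the rotation for free: writing to an existing key simply overwrites last week's copy.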
If your data is big and it takes a lot of time and/or resources to create that dump, you need to think of incremental backups. You could use the oplog yourself, or indirectly via tools such as Tayra. I have used a similar technique with PostgreSQL's Write-Ahead Log (WAL) to implement PITR, but not with MongoDB.
But remember that PITR using oplog / WAL is quite complex to set up. It gets even more complex if you have to use S3. The restore is not quite straightforward either. Remember also that you'll probably be using PITR the most often, so the simpler it is, the better.
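To make the PITR idea concrete: the invariant is that a full dump taken at some checkpoint, plus the oplog entries after that checkpoint, is enough to rebuild the state at any later instant. This is a purely conceptual sketch (in-memory data, illustrative op format, not a real MongoDB API):

```python
def restore(dump, dump_ts, oplog, target_ts):
    """Rebuild state at target_ts from a full dump plus oplog entries.

    dump    -- snapshot of the data taken at timestamp dump_ts
    oplog   -- list of (ts, (op, key, value)) entries, ordered by ts
    """
    state = dict(dump)
    for ts, (op, key, value) in oplog:
        if dump_ts < ts <= target_ts:   # replay only the incremental tail
            if op == "set":
                state[key] = value
            else:                        # "delete"
                state.pop(key, None)
    return state

oplog = [
    (1, ("set", "a", 1)),
    (2, ("set", "b", 2)),               # full dump taken here, ts=2
    (3, ("set", "a", 99)),
    (4, ("delete", "b", None)),         # the human error we want to undo
]
dump = {"a": 1, "b": 2}

print(restore(dump, 2, oplog, 3))       # {'a': 99, 'b': 2}
print(restore(dump, 2, oplog, 4))       # {'a': 99}
```

Restoring to ts=3 instead of ts=4 is the whole point of PITR: it recovers the state from just before the accidental delete. The real-world complexity mentioned above comes from capturing, shipping and ordering those oplog entries reliably, not from the replay logic itself.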
I suggest you start with mongodump or a file-system based snapshot on one of the secondaries, as you are doing, until it starts affecting performance. You could think of incremental backups at that point.
Take a look at Backup vs. Replication for an even more detailed discussion.