Comment by Shreyansh Pandey on "Does anyone here have any experience with MariaDB?"

I'll edit this to provide you a little more context.

I have been using a two-node cluster for a highly-available MariaDB data tier. The configuration of the system is:

16 GB RAM
Xeon with 8 vCPUs

It's simple M/S/S replication, with a redundant slave for any immediate data access and failover. So, that's three machines, actually, but yeah...

Some of the features I find crazy:

A plethora of storage options - you can use a LOT of storage engines with MariaDB, and they all work flawlessly. For a small project it doesn't matter, but for big production apps it surely does.
Kill a query by its ID - You can kill a query by its ID without dropping the connection; this helps deal with a stuck or never-ending query.
Clustering improves performance - Unlike with MySQL, you see an instant performance boost when you cluster a couple of machines together.
Multi-source replication - A slave can replicate from a couple of different masters; a use case for this might be JOIN statements and the like.
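
To make the "kill a query by its ID" point concrete, here is a minimal sketch. It only builds the statements, it doesn't talk to a server. The rows are hypothetical and shaped like SHOW PROCESSLIST output, and the 300-second threshold is just an assumption for the example. KILL QUERY followed by the thread ID stops the running statement and leaves the connection open.

```python
from typing import Iterable, List, Mapping


def kill_statements(processlist: Iterable[Mapping], max_seconds: int) -> List[str]:
    """Build KILL QUERY statements for queries running longer than max_seconds."""
    statements = []
    for row in processlist:
        # Idle connections report Command = "Sleep"; only running queries count.
        if row["Command"] == "Query" and row["Time"] > max_seconds:
            # KILL QUERY ends the statement but keeps the connection alive.
            statements.append(f"KILL QUERY {int(row['Id'])}")
    return statements


# Hypothetical rows, shaped like SHOW PROCESSLIST output.
rows = [
    {"Id": 11, "Command": "Sleep", "Time": 900, "Info": None},
    {"Id": 12, "Command": "Query", "Time": 640, "Info": "SELECT ..."},
    {"Id": 13, "Command": "Query", "Time": 2, "Info": "SELECT 1"},
]
print(kill_statements(rows, max_seconds=300))
```

You would then run each returned statement through whatever client you already use.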

And as always, if you need any help with setting it up or with anything, I am always there. :)

@robertsx, I have updated it; have a look.

@labsvisua, I'm a different Robert than the starter ;)

I couldn't be more sorry. I thought I mentioned the wrong one. I am so sorry!

@labsvisua, I have several questions, hope you don't mind being bombarded with questions ... 1: have you made use of any of the NoSQL functionality options that were added to MariaDB - what was your experience? 2: does your cluster do synchronous or asynchronous writes to your slave machines? 3: are you using Galera Cluster to cluster MariaDB or something else? 4: in your Master-Slave-Slave setup, have you dealt with disaster recovery? 5: if you take down the master and a slave takes over as the new master, how long does the old master take to catch up with the slave again if it was down for, let's say, an hour?

@JanVladimirMostert: Who doesn't love questions! ;) Let's go in order.

1) We didn't need any NoSQL functionality from MariaDB. One of our engineers definitely came up with that idea, but we had to reject it for two reasons: i) using something for a functionality it isn't native to usually ends in low performance; ii) MariaDB was a tier-II database, i.e., a persistent store. The data in a tier-II is constant over a period of time. Since MariaDB gives excellent read speeds, we leveraged it for that.

2) Since the internal data network was high-throughput, and the internal latency was in the single-digit milliseconds, we preferred synchronous. This gave us atomic data writes with no loss at all. I'd say use async replication only if you have sync replication between two masters, or any architecture where you have at least one copy of the latest data.

3) Yeah, Galera cluster was the clustering option of choice; this was due to a number of reasons. For starters, it worked out-of-the-box. Secondly, we had synchronous data replication. On a multicore machine, we had the option to leverage multithreading. And particularly because it took less than 2 minutes to spin up a node. No need to note the log offsets, and whatnot.

4) Sadly, yes. We did deal with disaster recovery. On a Sunday afternoon, I was checking the error logs of the application and that's when I noticed something strange: there were 6k responses with 50x errors. I looked at the error logs, and lo and behold: Maria wasn't writing anything to the database. Luckily for us, it had only been 2-3 minutes, so we quickly fired up a new master, replicating the data from the old ones, and everything worked out just fine.

5) This highly depends on how much data there is. We had R/W close to 10/300 (/second), so it would take us somewhere between 5 and 10 minutes to provision, copy, and deploy a new node. We could afford that because we had a Redis metacache layer in between. So, if the master went down, all of the data would go into Redis' cache, and since it's a cache, we were able to recover quickly.
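
Roughly, the idea of parking writes in a cache while the master is down can be sketched like this. It is not our actual setup: a deque stands in for a Redis list, and BufferedWriter and FakeMaster are hypothetical names made up for the example.

```python
from collections import deque


class FakeMaster:
    """Hypothetical stand-in for the database master."""

    def __init__(self):
        self.up = True
        self.rows = []

    def insert(self, record):
        if not self.up:
            raise ConnectionError("master is down")
        self.rows.append(record)


class BufferedWriter:
    """Queue writes while the master is down and replay them once it is back."""

    def __init__(self, write_to_db):
        self.write_to_db = write_to_db  # callable; raises ConnectionError when down
        self.pending = deque()          # stand-in for a Redis list

    def write(self, record):
        # Keep ordering: if anything is already queued, queue behind it.
        if self.pending:
            self.pending.append(record)
            return False
        try:
            self.write_to_db(record)
            return True
        except ConnectionError:
            self.pending.append(record)
            return False

    def replay(self):
        """Flush queued writes in order; stop at the first failure."""
        flushed = 0
        while self.pending:
            try:
                self.write_to_db(self.pending[0])
            except ConnectionError:
                break
            self.pending.popleft()
            flushed += 1
        return flushed
```

Once the new master is up, calling replay() drains the queue in the order the writes arrived, and a record only leaves the queue after the database has accepted it.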

I hope this answers it! If you have any more questions, feel free to ask them! :)

The NoSQL option would probably be great for dumping schemaless data to the DB where you don't need to do a lot of reads? How would performance compare in general if you instead just write it to a TEXT column and save keywords in a separate table?
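
To show what I mean by the TEXT column plus keywords table, here is a minimal sketch. SQLite (from the Python standard library) stands in for MariaDB so it runs anywhere, and the table and function names are just made up for the example. It says nothing about how the two approaches compare on performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite is only a stand-in for MariaDB here
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE keywords (
        document_id INTEGER NOT NULL REFERENCES documents(id),
        keyword TEXT NOT NULL
    );
    CREATE INDEX idx_keywords_keyword ON keywords (keyword);
""")


def store(body, keywords):
    """Save the raw blob as text and index it under each keyword."""
    cur = conn.execute("INSERT INTO documents (body) VALUES (?)", (body,))
    conn.executemany(
        "INSERT INTO keywords (document_id, keyword) VALUES (?, ?)",
        [(cur.lastrowid, k.lower()) for k in keywords],
    )
    return cur.lastrowid


def find(keyword):
    """Return the bodies of all documents tagged with the keyword."""
    rows = conn.execute(
        "SELECT d.body FROM documents d "
        "JOIN keywords k ON k.document_id = d.id "
        "WHERE k.keyword = ? ORDER BY d.id",
        (keyword.lower(),),
    )
    return [body for (body,) in rows]
```

Reads go through the indexed keywords table, so the TEXT column itself never has to be scanned.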

Which storage engine do you use, is InnoDB still the preferred engine on MariaDB? On MySQL it's about the only useful one unless you go NDB for the sake of clustering.

The rest of your answers are great, thanks for the insightful feedback :-)

For dumping logs, I'd recommend Aerospike. A friend of mine used it and had some really great feedback! :)

For dumping schemaless data, I'd recommend using a schemaless database. The reason for this is simple: if your data grows, you'll need sharding. Now, sharding is not the same as replication. So, Mongo or something like that should be better. But yeah, if you don't have much data, I guess using the COLUMN JSON format, you can get to a good place with MariaDB. I haven't used it this way, so I can't really comment; this is just some theory.

Yeah, choosing storage engines is a nightmare. Right now, we're on InnoDB, and it's running really smoothly. So, yeah! I wouldn't be able to comment on any other storage engines as I haven't really researched them well enough.

Anytime! ;)

Appreciate the awesome explanation. Thanks @labsvisua. :)

@robertsx, anytime! ;)
