I'm developing a cloud-based accounting system using Django (Python). Many people have advised me to use multiple databases for this kind of application, but Django does not support cross-database relations. I'm wondering how big companies like Google and Facebook manage their data across multiple databases. I'm using MySQL as the database. Thanks in advance.
Gergely Polonkai
What's your use-case? Lots of reading or lots of writing or both? And by lots, what do you consider big?
Running 20k active users can be achieved easily on a single DB server. Note that you can easily increase the connection limit to 500 or 1,000, and with enough RAM even to 10k, which means your DB would be handling 10k concurrent transactions (assuming your OS can handle that many threads; 500-1,000 is more typical).
In read-heavy situations, adding caching lets you stay on a single DB server, and clustering the cache servers then gives you almost unlimited read scaling. A master-slave setup lets you scale reads out to the slaves as well.
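Since the question is about Django, here is a minimal sketch of how reads could be sent to slaves with a database router. The aliases `default`, `replica1` and `replica2` are assumptions and would have to match the keys in your `DATABASES` setting; the four method names are the ones Django's router interface looks for.

```python
import random

# Hypothetical aliases; they must match the keys in settings.DATABASES.
PRIMARY = "default"
REPLICAS = ["replica1", "replica2"]


class PrimaryReplicaRouter:
    """Send writes to the master and spread reads across the slaves."""

    def db_for_read(self, model, **hints):
        # Slaves can lag behind the master, so these reads may be slightly stale.
        return random.choice(REPLICAS)

    def db_for_write(self, model, **hints):
        return PRIMARY

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so relations between them are fine.
        known = {PRIMARY, *REPLICAS}
        if obj1._state.db in known and obj2._state.db in known:
            return True
        return None  # no opinion; let other routers or the default decide

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Run migrations on the master only; replication carries the schema over.
        return db == PRIMARY
```

You would enable it with something like `DATABASE_ROUTERS = ["myproject.routers.PrimaryReplicaRouter"]`, where the module path is made up for this example. Code that must read its own writes right away can still pin a query to the master with `.using("default")`.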
If you're doing row-locking updates and a lot of users are updating the same data, your scalability will be bottlenecked by users waiting for each other's updates, so no amount of tricks will make it scale unless you remove the need for users to wait on each other's updates.
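One way to avoid that kind of waiting in an accounting system (a sketch, not the only option) is to insert a new ledger row per transaction instead of updating a single shared balance row, and to compute the balance when it is needed. The table and function names below are invented for the example, and SQLite is used only so that it runs anywhere; the same idea carries over to MySQL, where inserts of distinct rows generally do not queue up the way updates to one hot row do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger_entry ("
    " id INTEGER PRIMARY KEY,"
    " account_id INTEGER NOT NULL,"
    " amount_cents INTEGER NOT NULL)"
)


def post_entry(conn, account_id, amount_cents):
    """Record a movement as a new row; no existing row is updated or locked."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO ledger_entry (account_id, amount_cents) VALUES (?, ?)",
            (account_id, amount_cents),
        )


def balance(conn, account_id):
    """Derive the balance from the entries instead of storing it in one row."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM ledger_entry"
        " WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return row[0]


post_entry(conn, 1, 10_000)
post_entry(conn, 1, -2_500)
print(balance(conn, 1))  # 7500
```

The trade-off is that reading a balance now costs an aggregate query, which is where the caching mentioned above can help.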
Unless you're guaranteed a million concurrent users from day 1, MySQL scales well enough that you can worry about scaling later.
If you're flexible with DBs, also have a look at MariaDB. It's a drop-in replacement for MySQL that performs much better (at least it did; I haven't been following it for a while).
Also have a look at the book High Performance MySQL. Sharding is literally the last option on the table (no pun intended), not the first thing you should be worried about.
You can try vitess.io or MySQL Fabric. Both look like open-source solutions.
I would advise that if you know upfront you will have to handle high load and do things like db-sharding, you should not be using Python and Django.
Python is slow and Django has a lot of limitations that make it unsuitable for this kind of scenario.
For the database, learn to write raw SQL. It will make working with the database more pleasant. And consider using a compiled language for the server application.
Think about what you actually need from an ORM. I think the most useful thing a database API can provide is automatic conversion of a database row into a struct. Depending on what language you use, there will be a library that does this.
Look at Vitess. They have awesome documentation, with a section dedicated to planning for scale.
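To make the row-to-struct idea concrete, here is a small sketch that uses only the Python standard library (Python because that is what the question uses; the same pattern exists in compiled languages). The `Invoice` type, the `invoice` table and `fetch_invoices` are invented for the example, and SQLite stands in for MySQL so the snippet runs as-is.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Invoice:
    id: int
    customer: str
    total_cents: int


def fetch_invoices(conn, customer):
    """Run raw SQL and map each row onto an Invoice, field by field."""
    cur = conn.execute(
        "SELECT id, customer, total_cents FROM invoice"
        " WHERE customer = ? ORDER BY id",
        (customer,),
    )
    # The column order in the SELECT matches the field order of Invoice.
    return [Invoice(*row) for row in cur]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO invoice (customer, total_cents) VALUES (?, ?)",
    [("acme", 1200), ("acme", 800), ("globex", 500)],
)

invoices = fetch_invoices(conn, "acme")
print(invoices[0].total_cents)  # 1200
```

The query stays plain SQL, and the rest of the application works with typed objects instead of bare tuples.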
If your engineering team knows how to operate Vitess, all the developers need to know is that. If your application is already complex enough, do a code review marathon and try to find anything that is against these “rules”.
(Excerpt from the site)