I was wondering what are the limitations of a SQL database, any ideas on this? Does NO-SQL solve them and what trade-offs do they make in turn.
it's an introduction to noSQL by martin fowler :) It helped me a lot to understand the differences :)
I'm not an expert but I've been reading about this topic a lot lately (as i'm completing a bunch of free courses from Mongo DB University atm). As Shailesh mentioned basically it comes down to whether you have more static data that has the same schema or if you have flexible data that will be written a lot, have multiple schemas, or have some frequently accessed data
Obviously there is a huge deal of variation but you want to use RDBMS for something like immigration tracking for your country. You want to make sure that each person has a passport number, entry date, visa expiration date, list of dependents etc. You want to make sure that every person in your database has each field, and you want to be able to easily join an immigration database with a visa extension database so that you can quickly track people who have been granted or denied a certain visa status. Also you want to be able to get analytics data faster, and that is helped by unified schema that the RDBMS forces on you.
Alternatively, you might want to use a NoSQL database (like the document database of MongoDB) to store data that will be accessed and changed frequently/ and/or is not essential to be complete or in the same schema. This would work well for something like a blog, as not all blog posts will have comments, hashtags, or pictures, but others do. You also will make frequent edits to comments, tags, and posts, and maybe upload pictures later, and it is faster to write to the json object of the blog post than it is to create new columns across the table that holds the posts in order to add a new field (such as a contributors field) to blog posts.
This is oversimplified but I hope it gets the basic ideas across. I welcome all comments, criticism and expansions upon this. Thanks!
Shailesh
architect
Great question! The answer has many factors I'll try to summarise here.
One of the limitations of a RDBMS (relational database) is the rigidity which comes from organising data into tables and relations. Not all data fits naturally into a tabular structure -- think of trees and graph structures in particular. Relational databases typically work around this limitation by modelling such data in normalised form with parent-child records. Essentially there are techniques which allow us to fit non-tabular data into a tabular, record-oriented format.
A consequence of this is that the schema (or structure) of all records in a table must be the same. If you've encountered tables with many empty columns for optional fields, it's a result of this rigidity. Tables with many empty columns waste a lot of space. Typically, we work around this by breaking up a record into sub-parts, creating a sub-table which holds the sub-fields and then do cross table joins when querying to create complete records. In a document oriented database (like MongoDB or ElasticSearch for example) each document is stored independent of the other and so there is no wastage of space for optional fields.
A second result is that schema changes are heavyweight. If you have even one record which needs a new field, you must add it to every record in the table. In a document oriented database, adding a new field to one document doesn't affect any of the other documents (it may however, increase the size of your index if you will use that field to search for documents). This makes schema changes lightweight and relatively easy to carry out.
The downside of NoSQL (not-only SQL) databases is that they are not as efficient for analytical queries. If you have tabular data and want to be able to query it effectively, an RDBMS will be pretty hard to beat. Second, since schema is not rigid across all records, you can end up with wildly divergent documents within the same logical group of records. Finally, RDBMS technology is a lot more mature and well-proven than some NoSQL databases. MongoDB in particular has had a rough ride over the last decade in proving its ability to scale up to handle heavy workloads. Cassandra, OTOH, has been built to scale from day one, but has much more limited query capabilities.
With RDBMSes like PostgreSQL now offering JSON as a first-class data-type with indexing support, the distinction between an RDBMS and a NoSQL DB is becoming more fuzzy. You could have a table with two columns, a key and a value, with the value storing a JSON-based document record which can be queried using SQL!
If you're interested in using an RDBMS for a NoSQL solution, JSON+Postgres can be very powerful. Helpful link to querying JSON in Postgres. schinckel.net/2014/05/25/querying-json-in-postgres
(PS: Since SQL is a query language and not a database technology, my answer here assumes that the term SQL database is interchangeable with RDBMS)