Things you should keep in mind while designing database schema for your application
So let's talk about designing a database schema, there are actually two major problems beginners mostly do while designing.
The first and very common problem is not dividing the table into multiple table, instead using one common table to have lots of data. Which actually cause few problems since our table gonna have huge amount of data, that could actually make data read/write operations to pretty slow because MySql generally does linear search if data is not indexed. It also make the management of data little harder because we have made it complete mess xP
The second again very common problem is diving the schema into so many tables which could actually make your queries little faster but not in every case because we're actually having lots of table which mean have to apply lots of joins while retrieving data, that could actually make your queries slower and little bit complex too. Other problem is having management issue because of so many tables and little bit of wastage of storage too.
There is nothing right and wrong about design it's all about good design so we'll discuss few tips and tricks to design a good database schema.
Let's decide how many table your database should roughly have!
There is very simple approach to roughly decide, how many tables you should have is to total number of unique entity your application gonna have. Number of tables could be more than that too while normalizing the database schema, we'll discuss further in the post how to do that too, and number of tables could be less than the unique number of entities too, so you guys are thinking how, right?
so let's take a simple example to understand this scenario, suppose you have entities like employee, supervisor or manager. Here we have two entities but there is no need to make the separate tables for both the entities because both are the employee to a company with just different privileges so we can just have a self relation to represent both the entities with just one table.
So the thumb rule for roughly deciding number of tables is
# of table = # of unique entities your application have
if you aren't familiar with the basics of MySql then I would recommend to have a look at this Basics of MySql
When to divide the table into multiple tables and why?
let's discuss the theoretical approach first, wait theoretical approach could little bit tedious to understand so let's discuss the practical approach instead which is easy to understand too.
- Whenever you feel that you're gonna have
redundant
values in any of your column that directly indicates that your design is not good enough and you should separate that column from the current table so that it won't have redundant values.
why: because this leads to few problems like updating a common value having multiple records, suppose you have records something like this
USER
╔══════╦══════╦════════════╗
║ id ║ name ║ department ║
╠══════╬══════╬════════════╣
║ 1 ║ Alex ║ Eng ║
║ 2 ║ Bob ║ Design ║
║ 3 ║ Ada ║ Eng ║
╚══════╩══════╩════════════╝
so here you can see we have two records having same department Eng
but what if you want to change the name of department from Eng
to Engineering
so you have to update both the rows, query will look something like this:
UPDATE USER SET department='Engineering' WHERE department='Eng';
this query will update both the record which means if we have N records then query will take around O(N) to update all the records but does it really need to run on O(N)? no. We can solve this problem by separation of table and will run in O(1) which will be fast enough, right! let's see how to do that
solution: so we are going to make a separate table for department
which will have one to many
relationship with the user
.
USER DEPARTMENT
╔══════╦══════╦═══════════════╗ ╔═════╦═══════╗
║ id ║ name ║ department_id ║ ║ id ║ name ║
╠══════╬══════╬═══════════════╣ ╠═════╬═══════╣
║ 1 ║ Alex ║ 1 ║ ║ 1 ║ Eng ║
║ 2 ║ Bob ║ 2 ║ ║ 2 ║ Design║
║ 3 ║ Ada ║ 1 ║ ╚═════╩═══════╝
╚══════╩══════╩═══════════════╝
here you can see, we have made department
as separate entity have linked with the user
with one to many
relationship, so now if we want to change the name of department from Eng
to Engineering
then query will look something like this:
UPDATE DEPARTMENT SET name='Engineering' WHERE id=1;
here this query will just take O(1) operation because it is just going to update one record only.
- Whenever you feel that you're gonna have
null
in some any of your column then again it directly indicates that your database schema is not good enough because any column should not containnull
values whatever may happen, but based on your use case if you are gettingnull
values in some your rows then you should separate that column to a new table.
why: there are actually two main reasons to avoid null values. First is you can't index
that column if it contains null
values, so this is the major concern because it makes force your queries for full search.
USER
╔══════╦══════╦═══════════════╗
║ id ║ name ║ certificate ║
╠══════╬══════╬═══════════════╣
║ 1 ║ Alex ║ Ruby Adv ║
║ 2 ║ Bob ║ NULL ║
║ 3 ║ Ada ║ React Basics ║
║ 4 ║ Sheva║ Ruby Adv ║
╚══════╩══════╩═══════════════╝
here our certificate
column contain null
values so you can't index this column, so problem occurs if you want all the record where certificate is Ruby Adv
SELECT * FROM USER WHERE certificate='Ruby Adv';
so this query actually gonna take O(# of records in table)
because it'll do complete search.
solution: so we are going to make a separate table for certificates
having one to many
relationship with user
USER CERTIFICATES
╔══════╦══════╗ ╔═════════╦══════════════╗
║ id ║ name ║ ║ user_id ║ name ║
╠══════╬══════╣ ╠═════════╬══════════════╣
║ 1 ║ Alex ║ ║ 1 ║ Ruby Adv ║
║ 2 ║ Bob ║ ║ 4 ║ React Basics ║
║ 3 ║ Ada ║ ║ 3 ║ Ruby Adv ║
║ 4 ║ Sheva║ ╚═════════╩══════════════╝
╚══════╩══════╝
here you can see we have made the certificates
as a separate table so now there is no null
values so can easily index CERTIFICATES.name
column and query will use that index for binary search
, so let's have a look at query
SELECT USER.name FROM USER AS U, CERTIFICATES AS C WHERE C.name='Ruby Adv' AND C.user_id=U.id;
so now this query will use Binary Search
to search the records instead complete search and will work in O(log(# of record in table))
.
why: Another reason is wasting of storage because if your column contains null value which means it stores nothing but if you have defined your column as CHAR
or INT
or something, it'll use the space you have defined for each record like if you have define CHAR[100]
then for every row, it'll gonna use this amount of memory whether it contains some value or not, even in case of VARCHAR
which actually takes the storage based on the length of value stored in it. It is also gonna take at least 1 Byte
of memory to just store the length because it is kinda mandatory, VARCHAR
always use some extra memory to store the length of string stored in it.
solution: Separate that column and make a new table out of it so it'll just contain the values for the rows that are actually have to.
- Whenever you feel that you're gonna have multiple values in any of your single column, which means you want to store multiple id's or string in a single column whatever reason may be but this is not a good sign for a good design. So let me tell you what people generally does when they have to store multiple values in single column they make that column as a
VARCHAR
and stores thecomma separated
strings.
why: This actually leads to a problem where you can't fetch all the records from that column in just one go, which means you have to write multiple queries to solve this which is not that efficient.
USER PRODUCTS
╔══════╦══════╦═════════╗ ╔═════════╦══════════════╗
║ id ║ name ║ products║ ║ id ║ name ║
╠══════╬══════╬═════════╣ ╠═════════╬══════════════╣
║ 1 ║ Alex ║ 1 ║ ║ 1 ║ Mouse ║
║ 2 ║ Bob ║ 1,2 ║ ║ 2 ║ Pen ║
║ 3 ║ Ada ║ 2 ║ ║ 3 ║ Mobile ║
║ 4 ║ Sheva║ 3 ║ ╚═════════╩══════════════╝
╚══════╩══════╩═════════╝
here you can see we have multiple values in products
column which just represents what are the products a user have. Here Bob
have two products 1,2
if you have to fetch all the products Bob
has, it is not possible with one query because products
contain space separated string so you have to split that.
$row = mysqli_query("SELECT products FROM USER WHERE id=2", $con);
$p = array_map('intval', explode(',', $row['product']));
$p = implode("','",$p);
$products = mysqli_query("SELECT name FROM PRODUCTS WHERE id IN(${$P})", $con);
so here we have to break the query in two parts because we have stored comma separated string in the column, let's talk how can we solve that.
solution: So instead of storing multiple values in a single column, we'll create a separate new table for the same so that we can store multiple values as separate rows in this table and linked to user
with one to many
relationship
USER PRODUCTS U_P_ASSOCIATION
╔══════╦══════╗ ╔═════════╦══════════════╗ ╔═════╦═══════╗
║ id ║ name ║ ║ id ║ name ║ ║u_id ║ p_id ║
╠══════╬══════╣ ╠═════════╬══════════════╣ ╠═════╬═══════╣
║ 1 ║ Alex ║ ║ 1 ║ Mouse ║ ║ 1 ║ 1 ║
║ 2 ║ Bob ║ ║ 2 ║ Pen ║ ║ 2 ║ 1 ║
║ 3 ║ Ada ║ ║ 3 ║ Mobile ║ ║ 2 ║ 2 ║
║ 4 ║ Sheva║ ╚═════════╩══════════════╝ ║ 3 ║ 2 ║
╚══════╩══════╝ ║ 4 ║ 3 ║
╚═════╩═══════╝
so here we have created one more table just to store the association
between user
and product
to define one to many
relationship and avoid multiple values in a single column, so now let's try to fetch all the products Bob
have
SELECT P.name FROM USER AS U, PRODUCT AS P, U_P_ASSOCIATION AS A WHERE U.id=2 AND A.u_id=U.id AND P.id=A.p_id;
so now we can fetch all the products Bob
have in just one query, which is easy to maintain and faster than our previous solution too.
- Whenever you have to define some
constants
or you have some finite number of values which is to be used in different tables, you should never use those values in place itself. Always make a separate table to the finite number of records or may be forconstants
then reference those into different tables.
why: if you actually store those constants
in records itself as value then it actually leads to a very similar problem we have seen at first point, redundancy
and because of redundancy management becomes tough like when you have to update those constants, so you have to update it everywhere it is used, which is actually gonna take so much time.
solution: Store them in a separate table and use references so that whenever you have to update the values it'll be super easy, you just need to update it only one place and it'll automatically take care of rest of the things.
How and why to maintain database schema?
You should always keep the entity relationship
diagram of you database schema somewhere, which actually helps you to easily understand the whole database in small amount of time and other members of the team too.
why: let's suppose you have to make changes in your database schema after a long time then it'll very difficult to know the table and attributes of the table, basically if there is no documentation of your database schema then it'll going to be pretty hard to update the schema along with the time. Having a ER diagram
makes the updation process handy ; )
How: there are actually various way to maintain the documentation about your schema like ER diagram, Tabular form etc…
it depends on you whatever makes more sense to you but always keep the documentation of database schema to make the whole updation process smooth!