What is Postgres?
Postgres is a popular sql database that allows end users to insert data into table-like structures and retrieve it using ANSI-SQL.
Recently, I ended up with the use case of historic aggregation of numeric columns.
To explain the problem better, let's say we have a schema for a banking application. The schema contains balance column in the customer table. Now, we want to know sum of all your customer balances day-wise.
For e.g: what is the total balance of all your customers yesterday, today and the upcoming days. This historic aggregation, helps us to measure the growth of the deposited money over a period of time.
The obvious logical step to solve the problem would be having a periodic job to calculate the sum and insert it into a new table. But, setting up the cron job and maintaining a separate data pipeline is too much of a headache.
Luckily, postgres has an extension to schedule cron jobs called
With the help of
pg_cron, we can bring down the job of writing the code and deploying it as a service to writing few SQL queries.
Now that we have simplified the job, let's create a new table with the following columns to store our historic aggregations:
- calculated_at - timestamp of the calculated metrics
- metric_name - I've added this metric_name column, so that in future, if i want to add more metrics, I can use this table again
- metric_value - aggregated value.
CREATE TABLE historic_numeric_aggregation ( calculated_at TIMESTAMP WITH time zone, metric_name TEXT, metric_value NUMERIC )
The next step after creating the table is to populate the table with the calculated metrics value everyday. To do that, I'm creating a postgres function, which inserts the sum of balances of the customer to the historic_numeric_aggregation table.
CREATE OR replace FUNCTION aggregate_customer_balance_sum() returns void AS $$ BEGIN INSERT INTO historic_numeric_aggregation ( calculated_at, metric_name, metric_value ) VALUES ( CURRENT_TIMESTAMP, 'customer_sum_balance', ( SELECT sum(balance) FROM customers) ); END $$ language 'plpgsql';
So far we have created all the building blocks to calculate the metrics.
Let's schedule the aggregate_customer_balance_sum function to run every day morning at 10'O clock.
SELECT cron.schedule('customer_sum_aggregation', '0 10 * * *', 'select * from aggregate_customer_balance_sum()');
Voila, now we have a cron job to do the historic metrics aggregation.
Note: In this post, I've not explained how to enable the
pg_cron extenstion. Because, every flavour of postgres has its own way of installing and enabling it. I've followed AWS tutorial to enable pg_cron on our RDS cluster.
I'm part of Streak YC W22. We are building a neo-bank for teenagers. We are looking for curious person like you to strengthen our engineering team. As a bonus, you can learn from us and we can learn from you. If you are interested, please send your resume to
email@example.com or you can directly reach out to me on Twitter.