At our company we're building a platform that will need to send notifications to users.
Examples of such notifications are:
We need to process these notifications and decide if:
Some of the notifications should be sent in "almost" real time, but at the same time we would like to aggregate them so that we do not spam our users.
For example, if a single question gets 10 likes at the same time, we don't want to send 10 emails, 10 push notifications, and 10 notifications in the Web UI. We would like to aggregate them somehow and at the same time be as "real-time" as possible.
What architecture, tools, and strategy would you use to design such a system? It should be as fast and efficient as possible, and support real-time push and aggregation of notifications.
Thanks a lot!
I accomplish this with a mix of Firebase, JS, PHP, MySQL and a cronjob.
Some notifications are passive - someone adding someone as a friend for example. These kinds of notifications I just push to the user via Firebase and JS. When someone adds someone as a friend, an entry gets written to the friendee's notifications path and Firebase being a realtime system, if the friendee is logged in, they'll see it right away. Notifications can be dismissed (deleted) of course.
If you wanted to make sure the user sees these kinds of notifications, you could also push the entry to the db (below) and whatever happens first, happens first. If the cronjob triggers, the notification gets rolled up and sent. If someone dismisses the notification in the website, delete the entry from the db as it no longer needs to be sent via the cronjob.
Other notifications are done via email - a new article is posted to a page someone follows, for example. In this case, the entry gets written to a MySQL db. As multiple articles across multiple pages can be posted in 1 day, nothing happens until the cronjob fires. The cronjob simply loops through the table once a day, checks whether any emails need to be sent, and merges each user's pending emails into 1 - I call this a rollup. So if 3 articles are posted, the user will get 1 email, not 3.
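To make the rollup idea concrete, here is a minimal Python sketch of what the daily job does. My real setup is PHP and MySQL, so treat this as an illustration only: the PendingEmail record, roll_up and dismiss are hypothetical names, and an in-memory list stands in for the MySQL table.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PendingEmail:
    # Hypothetical row shape; the real table layout is not shown in this answer.
    user_email: str  # recipient
    page: str        # page the user follows
    article: str     # title of the newly posted article


def roll_up(pending):
    """Group pending rows by recipient and build one digest per user."""
    by_user = defaultdict(list)
    for row in pending:
        by_user[row.user_email].append(row)

    digests = {}
    for user_email, rows in by_user.items():
        lines = [f"- {r.article} (on {r.page})" for r in rows]
        subject = f"{len(rows)} new article(s) on pages you follow"
        digests[user_email] = (subject, "\n".join(lines))
    return digests


def dismiss(pending, user_email, article):
    """Drop a row the user already saw on the site, so the daily job skips it."""
    return [r for r in pending
            if not (r.user_email == user_email and r.article == article)]
```

The job would call roll_up once a day, send one email per key in the result, and then clear the rows it processed. dismiss covers the "whatever happens first" case: if the user deals with the notification on the site, the row is gone before the job runs.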
Hey there,
I'm a Developer Advocate over at Stream (getstream.io).
To answer on some components of a scalable notification system:
Fanout on read / Fanout on write
Most feed systems either use fanout on read or fanout on write. Fanout on write is the more common choice. Instagram, Twitter, Stream all started out that way. Fanout on read is easier to build, but getting a good 99th percentile load time is really tricky.
This paper is a great introduction to the tradeoffs between those 2 approaches: Yahoo Research Paper
For most apps you will need to use a combination of push and pull at some point. That's what we do at getstream.io.
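If the two terms are new to you, the difference is roughly the following. This is a toy in-memory sketch in Python, not how Stream or anyone else implements it; every name in it (followers, outbox, inbox, publish and so on) is made up for illustration.

```python
from collections import defaultdict

followers = defaultdict(set)  # author -> users following that author
following = defaultdict(set)  # user -> authors that user follows
outbox = defaultdict(list)    # author -> activities the author produced
inbox = defaultdict(list)     # user -> precomputed feed (fanout on write)


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def publish(author, text, ts):
    activity = (ts, author, text)
    outbox[author].append(activity)
    # Fanout on write: copy the activity into every follower's feed now.
    for user in followers[author]:
        inbox[user].append(activity)


def feed_on_write(user):
    # Cheap read: the feed was already materialized at publish time.
    return sorted(inbox[user], reverse=True)


def feed_on_read(user):
    # Costlier read: gather from every followed author at request time.
    merged = [a for author in following[user] for a in outbox[author]]
    return sorted(merged, reverse=True)
```

With fanout on write the cost is paid when publishing (one write per follower), and with fanout on read it is paid on every feed load. Note that in this sketch a user who follows an author after an activity was published only sees it through feed_on_read, which is one reason a real system ends up mixing the two.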
Storage and message brokers
First of all you will need to pick a message broker for your fanout on write. My recommendation is RabbitMQ for mid-size projects. If you have more time available, Kafka is a great option. It scales much better than RabbitMQ. Unfortunately it is still really hard to use and maintain.
For storing the activity feeds I would recommend Cassandra. Many people start out with Redis, but it is very easy to run into limitations, especially if you want to do aggregated feeds or otherwise need to store a lot of things in memory. Redis can get expensive very quickly. Cassandra is what Stream and Instagram use.
If you're building support for fanout on read I recommend one of Postgres, Redis, or ElasticSearch. ElasticSearch can be tuned to do fanout on read very efficiently (see this post: Ranked feeds with ES). Redis is a good option for fanout on read if you use ZUnionStore. Postgres eventually breaks for a fanout on read approach, but you can tweak it to last for quite some time (see a really old HighScalability article about the approach at Fashiolista).
Realtime
Faye is a great open source project. In terms of hosted solutions PubNub and Pusher are awesome options.
Aggregation
This Etsy presentation on feed scaling is a good place to check for aggregated feed design concerns: slideshare.net/danmckinley/etsy-activity-feeds-ar…
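For the "10 likes on one question" case from the question, the simplest form of aggregation is grouping activities by a key and rendering one summary per group. Here is a rough Python sketch. The grouping key (verb, target) and the function names are my own assumptions for illustration, not something taken from the Etsy slides.

```python
def aggregate(activities):
    """Group (actor, verb, target) tuples by (verb, target).

    Returns a dict mapping each group key to its distinct actors,
    in the order they first appeared.
    """
    groups = {}
    for actor, verb, target in activities:
        actors = groups.setdefault((verb, target), [])
        if actor not in actors:
            actors.append(actor)
    return groups


def summarize(verb, target, actors):
    """Render one notification line for a whole group."""
    if len(actors) == 1:
        who = actors[0]
    elif len(actors) == 2:
        who = f"{actors[0]} and {actors[1]}"
    else:
        rest = len(actors) - 2
        who = f"{actors[0]}, {actors[1]} and {rest} other{'s' if rest > 1 else ''}"
    return f"{who} {verb} {target}"
```

In practice you would probably add a time bucket to the key (for example the day) so that old and new likes do not collapse into the same notification forever, and you would flush a group after a short delay to stay close to real time.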
Caching/Locking
If you need to cache some data simply use Redis. Redis also has a great locking implementation if you need to lock before writing to certain feeds. Try to avoid locking at all costs though.
Removing activities/Content checks
Eventually your users will post spam and inappropriate content. This is tricky to deal with as removing activities from all the feeds can take some time. Here's the common solution for this issue:
1. Set a flag on the activity (e.g. inappropriate=true, privacy=me).
2. Filter these activities while reading the feed.
3. In the background, run the delete command.
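The three steps above can be sketched like this in Python. It is an in-memory toy with made-up names (activities, feeds, flag, read_feed, purge_flagged); in a real system the purge would be a background task working against your feed storage.

```python
activities = {}  # activity_id -> dict of activity fields
feeds = {}       # user -> list of activity ids in that user's feed


def flag(activity_id):
    # Step 1: a single cheap write, visible to all readers right away.
    activities[activity_id]["inappropriate"] = True


def read_feed(user):
    # Step 2: hide flagged (or already deleted) activities at read time.
    return [aid for aid in feeds.get(user, [])
            if aid in activities and not activities[aid].get("inappropriate")]


def purge_flagged():
    # Step 3: the slow part, meant to run in the background.
    flagged = {aid for aid, a in activities.items() if a.get("inappropriate")}
    for user in feeds:
        feeds[user] = [aid for aid in feeds[user] if aid not in flagged]
    for aid in flagged:
        del activities[aid]
    return len(flagged)
```

The point of the flag is that users stop seeing the content after one write, even though cleaning it out of every feed may take a while.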
Priority queues
You will want to have a higher priority for follows and direct inserts. Otherwise your users will be waiting for their feed to show :)
Your task queueing system will have to support some level of priorities. This is easy to do with tools like Celery.
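To show what "some level of priorities" means, here is a small Python sketch built on the standard library's heapq. It is a stand-in for illustration and not Celery's API; the TaskQueue class and the HIGH/LOW constants are hypothetical.

```python
import heapq
import itertools

HIGH, LOW = 0, 1  # lower number is served first


class TaskQueue:
    def __init__(self):
        self._heap = []
        # Tie-breaker so tasks of equal priority stay in FIFO order.
        self._counter = itertools.count()

    def enqueue(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def dequeue(self):
        # Pops the task with the lowest priority number, oldest first.
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

A follow or a direct insert would be enqueued with HIGH and a large fanout with LOW, so a user who just followed someone is not stuck behind a fanout to a very large audience.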
Stream API
At Stream, we help devs build scalable newsfeeds by abstracting away the core feed infrastructure. So, one way to build a notification feed similar to Facebook is to get started with us. We have feed types for Notification, and also for Aggregated, to address that.
We have client and framework libraries for Python, Django, Node/JavaScript (which handles client-side real-time for Notification feeds), PHP, Ruby, Rails, Meteor, Java, Go, and more.
If you are looking for a completely open sourced route, also check out Stream-Framework. Our founder, Thierry, is the creator.