@erikaugust
Developer Advocate at Stream
Hey there, I'm a Developer Advocate over at Stream (getstream.io). To address some of the components of a scalable notification system:

Fanout on read / Fanout on write

Most feed systems use either fanout on read or fanout on write. Fanout on write is the more common choice: Instagram, Twitter and Stream all started out that way. Fanout on read is easier to build, but getting a good 99th percentile load time is really tricky. This paper is a great introduction to the tradeoffs between the two approaches: Yahoo Research Paper. For most apps you will eventually need a combination of push (fanout on write) and pull (fanout on read). That's what we do at getstream.io.

Storage and message brokers

First of all, you will need to pick a message broker for your fanout on write. My recommendation is RabbitMQ for mid-size projects. If you have more time available, Kafka is a great option. It scales much better than RabbitMQ, but unfortunately it is still really hard to use and maintain.

For storing the activity feeds I would recommend Cassandra. Many people start out with Redis, but it is very easy to run into its limitations, especially if you want to build aggregated feeds or otherwise need to keep a lot of data in memory. Because Redis keeps everything in memory, it can get expensive very quickly. Cassandra is what Stream and Instagram use.

If you're building support for fanout on read, I recommend one of Postgres, Redis or Elasticsearch. Elasticsearch can be tuned to do fanout on read very efficiently (see the post "Ranked feeds with ES"). Redis is a good option for fanout on read if you use ZUNIONSTORE, which combines the sorted sets of all followed users into a single sorted set. Postgres eventually breaks under a fanout on read approach, but you can tweak it to last for quite some time (see a really old HighScalability article about the approach at Fashiolista).

Realtime

For realtime updates, Faye is a great open source project. In terms of hosted solutions, PubNub and Pusher are awesome options.
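The two strategies described under Fanout on read / Fanout on write can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Stream's implementation: the `FeedStore` class, its method names and the `(-timestamp, activity_id)` entry layout are hypothetical, and plain lists stand in for Cassandra or Redis. Fanout on write copies each new activity into every follower's feed (and backfills on follow), so a read is a slice; fanout on read merges the followed users' sorted activity lists at request time, roughly the job `ZUNIONSTORE` does in Redis.

```python
import bisect
import heapq
from collections import defaultdict
from itertools import islice


class FeedStore:
    """Hypothetical in-memory store. Entries are (-timestamp, activity_id),
    so ascending list order means newest first."""

    def __init__(self):
        self.followers = defaultdict(set)         # author -> users following them
        self.following = defaultdict(set)         # user -> authors they follow
        self.user_activities = defaultdict(list)  # author -> own activities, newest first
        self.timelines = defaultdict(list)        # user -> materialized feed (fanout on write)

    def follow(self, follower, followee):
        if follower in self.followers[followee]:
            return  # already following; avoid duplicate backfill
        self.followers[followee].add(follower)
        self.following[follower].add(followee)
        # Fanout on write must also backfill existing activities on follow,
        # which is one reason follows deserve a high-priority queue.
        for entry in self.user_activities[followee]:
            bisect.insort(self.timelines[follower], entry)

    def post(self, author, timestamp, activity_id):
        entry = (-timestamp, activity_id)
        bisect.insort(self.user_activities[author], entry)
        # Fanout on write: one insert per follower at write time.
        # A real system would enqueue these via a broker instead of looping inline.
        for follower in self.followers[author]:
            bisect.insort(self.timelines[follower], entry)

    def read_fanout_on_write(self, user, limit=10):
        # The feed was precomputed, so a read is just a slice.
        return [aid for _, aid in self.timelines[user][:limit]]

    def read_fanout_on_read(self, user, limit=10):
        # Fanout on read: k-way merge of each followed author's sorted list
        # at request time (similar in spirit to Redis ZUNIONSTORE).
        merged = heapq.merge(*(self.user_activities[a] for a in self.following[user]))
        return [aid for _, aid in islice(merged, limit)]


if __name__ == "__main__":
    store = FeedStore()
    store.follow("bob", "alice")
    store.follow("bob", "carol")
    store.post("alice", 1, "a1")
    store.post("carol", 2, "c1")
    store.post("alice", 3, "a2")
    print(store.read_fanout_on_write("bob"))  # ['a2', 'c1', 'a1']
    print(store.read_fanout_on_read("bob"))   # ['a2', 'c1', 'a1']
```

Both reads return the same feed; what differs is where the work lands. `post` costs one insert per follower, while `read_fanout_on_read` pays for a merge over everyone the reader follows on every request. A hybrid setup could, for example, choose the path per author depending on follower count.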
Aggregation

This Etsy presentation on feed scaling is a good place to look for aggregated feed design concerns: http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture/

Caching/Locking

If you need to cache some data, simply use Redis. Redis also has a great locking implementation if you need to lock before writing to certain feeds. Try to avoid locking at all costs, though.

Removing activities/Content checks

Eventually your users will post spam and inappropriate content. This is tricky to deal with, because removing an activity from all the feeds it was fanned out to can take some time. Here's the common solution for this issue:

1. Set a flag on the activity (e.g. inappropriate=true, privacy=me).
2. Filter these activities out while reading the feed.
3. Run the delete command in the background.

Priority queues

You will want to give follows and direct inserts a higher priority. Otherwise your users will be waiting for their feed to show :) Your task queueing system will have to support some level of prioritization. This is easy to do with tools like Celery.

Stream API

At Stream, we help devs build scalable newsfeeds by abstracting away the core feed infrastructure. So one way to build a notification feed similar to Facebook's is to get started with us. We have Notification and Aggregated feed types to address exactly that. We have client and framework libraries for Python, Django, Node/JavaScript (which handles client-side real-time for Notification feeds), PHP, Ruby, Rails, Meteor, Java, Go, and more.

If you are looking for a completely open source route, also check out Stream-Framework. Our founder, Thierry, is its creator.
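The aggregated feeds mentioned under Aggregation group related activities into one entry, such as "carol and 2 others liked photo:1". Here is a minimal sketch of that idea. The grouping rule (same verb, same target, same calendar day), the `Activity` dataclass and the helper names are hypothetical choices for illustration; a production system would typically make the grouping key configurable.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Activity:
    actor: str
    verb: str
    target: str
    time: datetime


def aggregation_key(activity):
    # Hypothetical rule: same verb on the same target on the same calendar day.
    return (activity.verb, activity.target, activity.time.date())


def aggregate(activities):
    """Group activities into aggregated entries, ordered by each group's newest activity."""
    groups = {}
    for act in sorted(activities, key=lambda a: a.time, reverse=True):
        groups.setdefault(aggregation_key(act), []).append(act)
    # Dicts keep insertion order, so the group holding the newest activity comes first.
    return [
        {
            "verb": verb,
            "target": target,
            "day": day,
            "actors": [a.actor for a in acts],
        }
        for (verb, target, day), acts in groups.items()
    ]


def summarize(group):
    # Render a notification line such as "carol and 2 others liked photo:1".
    first, others = group["actors"][0], len(group["actors"]) - 1
    if others == 0:
        return f"{first} {group['verb']} {group['target']}"
    noun = "other" if others == 1 else "others"
    return f"{first} and {others} {noun} {group['verb']} {group['target']}"
```

Storing the groups rather than recomputing them on every read is what makes aggregated feeds memory-hungry, which is one reason the answer above warns about doing this in Redis.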
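The three steps under Removing activities/Content checks can be sketched with in-memory structures. The `ModeratedFeeds` class, its method names and the `deque` used as a job queue are hypothetical stand-ins for a real feed store and background task queue. Only the `inappropriate` flag is handled here; a `privacy=me` flag would add a check against the viewer in the same read-time filter.

```python
from collections import deque


class ModeratedFeeds:
    """Hypothetical in-memory feed store illustrating flag, filter, delete."""

    def __init__(self):
        self.activities = {}        # activity_id -> activity data (single source of truth)
        self.feeds = {}             # feed_id -> list of activity ids, in insertion order
        self.delete_jobs = deque()  # stand-in for a background task queue

    def add(self, activity_id, data, feed_ids):
        self.activities[activity_id] = dict(data)
        for feed_id in feed_ids:
            self.feeds.setdefault(feed_id, []).append(activity_id)

    def flag(self, activity_id):
        # Step 1: a single write on the activity, however many feeds hold it.
        self.activities[activity_id]["inappropriate"] = True
        # Step 3 is deferred: queue the slow removal from every feed.
        self.delete_jobs.append(activity_id)

    def read(self, feed_id):
        # Step 2: hide flagged activities at read time until cleanup has run.
        return [
            aid
            for aid in self.feeds.get(feed_id, [])
            if not self.activities[aid].get("inappropriate", False)
        ]

    def run_delete_jobs(self):
        # Background worker: physically remove flagged ids from all feeds.
        while self.delete_jobs:
            activity_id = self.delete_jobs.popleft()
            for ids in self.feeds.values():
                while activity_id in ids:
                    ids.remove(activity_id)
```

The point of the design is that users stop seeing the content immediately after `flag`, even though the fanned-out copies are only cleaned up later by the worker.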
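The prioritization described under Priority queues can be illustrated with the standard library's `heapq`. This is not Celery's API: the `PriorityTaskQueue` class and the numbers in `PRIORITY` are hypothetical, and a real deployment would rely on the priority support of its task queue and broker instead.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower number runs first.
PRIORITY = {"follow": 0, "direct_insert": 0, "fanout": 5}


class PriorityTaskQueue:
    def __init__(self):
        self._heap = []
        # Monotonic tie-breaker: keeps FIFO order within a priority level.
        self._counter = itertools.count()

    def submit(self, kind, payload):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._counter), kind, payload))

    def pop(self):
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)
```

The counter also stops `heapq` from ever comparing two payload dicts, which are not orderable and would raise a `TypeError`. With this queue, a follow submitted after a large fanout still runs before the fanout batches.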