@tschellenbach
First question I'm answering on Hashnode. I'm the author of Stream-Framework and run getstream.io for a living. I dream about feeds :) First of all, I would recommend studying some of the great articles about feeds. I've attached my favorites at the bottom of this answer.

Naming conventions

Start by reading this spec: http://activitystrea.ms/specs/atom/1.0/
There is also an updated W3C spec coming up that makes some small changes.

Functionality

A lot of the design decisions depend on what functionality you need:
- Notification/Realtime (listen to changes in realtime)
- Aggregation (Ben and 3 other people like your picture)
- Ranked feeds (sorting on more than just recency)
- Personalized feeds (unique sorting logic based on the user, often done using machine learning and analytics)

Fanout on read / Fanout on write

Most feed systems use either fanout on read or fanout on write. With fanout on write, each new activity is copied into the feed of every follower when it is created, so reading a feed is a single lookup. With fanout on read, each activity is stored once and the feeds of everyone a user follows are merged at read time. Fanout on write is the more common choice; Instagram, Twitter and Stream all started out that way. Fanout on read is easier to build, but getting a good 99th percentile load time is really tricky, because users who follow many accounts require a large merge on every request. This paper is a great introduction to the trade-offs between the two approaches: Yahoo Research Paper. For most apps you will need to use a combination of push (fanout on write) and pull (fanout on read) at some point. That's what we do at getstream.io.

Storage and message brokers

First of all, you will need to pick a message broker for your fanout on write. My recommendation is RabbitMQ for mid-size projects. If you have more time available, Kafka is a great option: it scales much better than RabbitMQ, but unfortunately it is still really hard to use and maintain.

For storing the activity feeds I would recommend Cassandra. Many people start out with Redis, but it is very easy to run into its limitations, especially if you want to do aggregated feeds or otherwise need to store a lot of data in memory. Redis can get expensive very quickly. Cassandra is what Stream and Instagram use.
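To make the trade-off between the two fanout strategies concrete, here is a minimal in-memory sketch. It is an illustration under stated assumptions, not how Stream-Framework or getstream.io implement it: the class names `Activity`, `FanoutOnWrite` and `FanoutOnRead`, and the `max_length` feed cap, are hypothetical; plain dicts stand in for Cassandra or Redis; and the write-side loop stands in for the tasks a message broker would deliver. The `Activity` fields borrow the actor/verb/object/target naming from the Activity Streams spec.

```python
from __future__ import annotations

import heapq
import itertools
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Activity:
    # Field names follow the Activity Streams actor/verb/object/target vocabulary.
    actor: str
    verb: str
    object: str
    published: float  # used as the sort key (recency)
    target: str | None = None


def newest_first(activities):
    return sorted(activities, key=lambda a: a.published, reverse=True)


class FanoutOnWrite:
    """Push model (hypothetical): copy each activity into every follower's feed on write."""

    def __init__(self, max_length=1000):
        self.followers = defaultdict(set)  # author -> users following them
        self.feeds = defaultdict(list)     # user -> precomputed feed, newest first
        self.max_length = max_length       # assumed cap on the stored feed size

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def add_activity(self, activity):
        # Expensive write: one insert per follower. In production each insert
        # would be a task on a message broker such as RabbitMQ or Kafka.
        for user in self.followers[activity.actor] | {activity.actor}:
            feed = newest_first(self.feeds[user] + [activity])
            self.feeds[user] = feed[: self.max_length]

    def read_feed(self, user, limit=25):
        # Cheap read: the feed is already materialized and sorted.
        return self.feeds[user][:limit]


class FanoutOnRead:
    """Pull model (hypothetical): store each activity once, merge on read."""

    def __init__(self):
        self.following = defaultdict(set)   # user -> authors they follow
        self.by_author = defaultdict(list)  # author -> own activities, newest first

    def follow(self, follower, author):
        self.following[follower].add(author)

    def add_activity(self, activity):
        # Cheap write: a single insert into the author's own list.
        own = self.by_author[activity.actor] + [activity]
        self.by_author[activity.actor] = newest_first(own)

    def read_feed(self, user, limit=25):
        # Expensive read: a k-way merge over every followed author on every
        # request, which is what makes tail latency hard to control.
        sources = [self.by_author[a] for a in self.following[user] | {user}]
        merged = heapq.merge(*sources, key=lambda a: a.published, reverse=True)
        return list(itertools.islice(merged, limit))


if __name__ == "__main__":
    for store in (FanoutOnWrite(), FanoutOnRead()):
        store.follow("ben", "thierry")
        store.add_activity(Activity("thierry", "post", "picture:1", published=1.0))
        store.add_activity(Activity("thierry", "like", "picture:2", published=2.0))
        print([a.object for a in store.read_feed("ben")])  # ['picture:2', 'picture:1']
```

Both stores return the same feed; the difference is where the work happens. Note that in the push version, following someone does not copy their earlier activities into your feed; a real system would enqueue that backfill as a separate task. The read-side merge here plays the role that Redis `ZUNIONSTORE` or an Elasticsearch query would play in a fanout on read setup.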
If you're building support for fanout on read, I recommend Postgres, Redis or Elasticsearch. Elasticsearch can be tuned to do fanout on read very efficiently (see this post: Ranked feeds with ES). Redis is a good option for fanout on read if you use ZUNIONSTORE, which merges several sorted sets (for example, one per followed user, scored by timestamp) into a single sorted set. Postgres eventually breaks under a fanout on read approach, but you can tweak it to last for quite some time (see a really old HighScalability article about our approach at Fashiolista).

Realtime

Faye is a great open source project. In terms of hosted solutions, PubNub and Pusher are awesome options.

Caching/Locking

If you need to cache some data, simply use Redis. Redis also has a great locking implementation if you need to lock before writing to certain feeds. Try to avoid locking at all costs, though.

Removing activities/Content checks

Eventually your users will post spam and inappropriate content. This is tricky to deal with, as removing an activity from all the feeds it was copied into can take some time. Here's the common solution for this issue:
1. Set a flag on the activity (i.e. inappropriate=true, privacy=me).
2. Filter these activities while reading the feed.
3. Run the delete command in the background.

Priority queues

You will want to give follows and direct inserts a higher priority; otherwise your users will be waiting for their feed to show :) Your task queueing system will have to support some level of priorities. This is easy to do with tools like Celery.

Design Resources

https://getstream.io/activity-feed-design/
https://getstream.io/based-feed-ui-kit-sketch/
https://getstream.io/blog/13-tips-for-a-highly-engaging-news-feed/

Articles

- Twitter 2013 Redis based
- Etsy feed scaling
- LinkedIn ranked feeds
- Facebook history
- Activity stream specification
- FriendFeed approach
- Yahoo Research Paper
- Twitter’s approach
- Cassandra at Instagram
- Relevancy at Etsy
- Zite architecture overview
- Ranked feeds with ES
- Riak at Xing - by Dr. Stefan Kaes & Sebastian Röbke
- Riak and Scala at Yammer

My projects

- Stream-Framework
- getstream.io
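The three content-removal steps and the priority-queue advice described above fit together naturally, so here is a minimal in-process sketch of both. It assumes a fanout on write store, and everything in it is hypothetical: `TaskQueue` is a tiny `heapq`-based stand-in for a Celery/RabbitMQ priority setup, `FeedStore`, `HIGH` and `LOW` are illustrative names, and only the `inappropriate` and `privacy` flags come from the answer itself.

```python
import heapq
import itertools
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical priority levels for this sketch: lower numbers run first.
HIGH, LOW = 0, 10


@dataclass
class Activity:
    id: int
    actor: str
    verb: str
    object: str
    inappropriate: bool = False  # flag set in step 1
    privacy: str = "public"      # "me" means only the actor can see it


class TaskQueue:
    """In-process priority queue standing in for a Celery/RabbitMQ setup."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority

    def enqueue(self, priority, func, *args):
        heapq.heappush(self._heap, (priority, next(self._counter), func, args))

    def run_all(self):
        while self._heap:
            _, _, func, args = heapq.heappop(self._heap)
            func(*args)


class FeedStore:
    """Fanout-on-write feeds that hide removed activities before purging them."""

    def __init__(self, queue):
        self.queue = queue
        self.activities = {}               # activity id -> canonical Activity
        self.feeds = defaultdict(list)     # user -> activity ids, newest first
        self.followers = defaultdict(set)  # author -> followers

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def add_activity(self, activity):
        self.activities[activity.id] = activity
        for user in self.followers[activity.actor] | {activity.actor}:
            # Direct inserts run at high priority so feeds show up quickly.
            self.queue.enqueue(HIGH, self._insert, user, activity.id)

    def _insert(self, user, activity_id):
        # Assumes activities arrive in time order, so newest goes first.
        self.feeds[user].insert(0, activity_id)

    @staticmethod
    def _visible(activity, viewer):
        if activity.inappropriate:
            return False
        return activity.privacy != "me" or activity.actor == viewer

    def remove_activity(self, activity_id, **flags):
        # Step 1: set the flag(s); this hides the activity on the next read.
        activity = self.activities[activity_id]
        for name, value in flags.items():
            setattr(activity, name, value)
        # Step 3: schedule the slow cleanup of every feed at low priority.
        self.queue.enqueue(LOW, self._purge, activity_id)

    def _purge(self, activity_id):
        activity = self.activities[activity_id]
        for user, ids in self.feeds.items():
            if activity_id in ids and not self._visible(activity, user):
                ids.remove(activity_id)

    def read_feed(self, viewer, limit=25):
        # Step 2: filter flagged activities while reading.
        stored = [self.activities[i] for i in self.feeds[viewer]]
        return [a for a in stored if self._visible(a, viewer)][:limit]


if __name__ == "__main__":
    queue = TaskQueue()
    store = FeedStore(queue)
    store.follow("ben", "spammer")
    store.add_activity(Activity(1, "spammer", "post", "spam:1"))
    queue.run_all()
    store.remove_activity(1, inappropriate=True)
    print(store.read_feed("ben"), store.feeds["ben"])  # [] [1]: hidden, not yet purged
    queue.run_all()
    print(store.feeds["ben"])  # []: the background purge has run
```

The point of the flag is that hiding is instant and cheap, while the expensive per-feed cleanup can wait behind the high-priority inserts. With real Celery you would express the same idea through task priorities or separate queues rather than an in-process heap.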