I have built a few of these; here's what has worked for me:
Tools: HBase, Hadoop MapReduce, Elasticsearch
Methods:

-Chronological feed: For timeline-based events I have found HBase to be a great solution. HBase ( https://hbase.apache.org/ ) is a column store built for random access to billions of rows by millions of columns. It has a single primary index, the key you assign to each row, but it offers some really useful prefix filters and scanner operations. It's also fast enough to be a cache for hot data, so most of your active records will be served through an LRU cache. One really useful feature for social-network models is that it supports atomic counters, which are very fast (think "like" counts). Two examples:
Get all content from a user in reverse-chronological order (newest first):
**Key Structure** = [USER_UUID][JAVA_LONG_MAX - TIMESTAMP][CONTENT_TYPE][CONTENT_UUID]
**Scanner** = PrefixFilter([USER_UUID][first N digits of the reversed timestamp])
Using this scanner you can get your records back from a table containing hundreds of millions of entries in under 100 ms. Ideally the cells of this table would reference content IDs in a separate content table, which is essentially one huge hashed lookup.
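A minimal sketch of the reversed-timestamp trick behind that key structure. The method names and the `Long.MAX_VALUE - timestamp` encoding are my assumptions (epoch-millis timestamps overflow an int, so a long is needed); the point is that the newest content sorts first under HBase's lexicographic byte ordering, so a prefix scan on the user UUID streams the feed newest-first:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class FeedRowKey {

    // Build a key shaped like [USER_UUID][LONG_MAX - TIMESTAMP][CONTENT_TYPE][CONTENT_UUID].
    // Subtracting the timestamp from Long.MAX_VALUE "reverses" time, so newer
    // rows get lexicographically smaller keys and sort first in a scan.
    static byte[] rowKey(UUID userUuid, long timestampMillis, byte contentType, UUID contentUuid) {
        ByteBuffer buf = ByteBuffer.allocate(16 + 8 + 1 + 16);
        buf.putLong(userUuid.getMostSignificantBits());
        buf.putLong(userUuid.getLeastSignificantBits());
        buf.putLong(Long.MAX_VALUE - timestampMillis); // reversed timestamp
        buf.put(contentType);
        buf.putLong(contentUuid.getMostSignificantBits());
        buf.putLong(contentUuid.getLeastSignificantBits());
        return buf.array();
    }

    // Unsigned lexicographic compare, the same ordering HBase applies to row keys.
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        UUID user = UUID.randomUUID();
        byte post = 1;
        byte[] older = rowKey(user, 1_000L, post, UUID.randomUUID());
        byte[] newer = rowKey(user, 2_000L, post, UUID.randomUUID());
        // The newer item's key sorts BEFORE the older one's.
        System.out.println(compare(newer, older) < 0); // prints "true"
    }
}
```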
Get precomputed stats for a user (followers / posts / likes) in real time: in a separate table, use the user's UUID as the row key and a schema like:
[ROW: USER_UUID] ::
[ColFamily:Followers]:[Col: one per follower UUID]
[ColFamily:Following]:[Col: one per followed UUID]
[ColFamily:Stats]:[Col:following] = AtomicCounter
[ColFamily:Stats]:[Col:posts] = AtomicCounter
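To make the row layout concrete, here is an in-memory analogue of it in plain Java, not actual HBase client code: one row per user, wide Followers/Following families holding one column per UUID, and a Stats family of atomic counters (in HBase the counter updates would be atomic increment calls on the table instead):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model of the stats row; field names mirror the schema above.
public class UserStatsRow {
    final Set<String> followers = ConcurrentHashMap.newKeySet(); // ColFamily:Followers
    final Set<String> following = ConcurrentHashMap.newKeySet(); // ColFamily:Following
    final AtomicLong followingCount = new AtomicLong();          // Stats:following
    final AtomicLong postCount = new AtomicLong();               // Stats:posts

    void follow(String targetUuid) {
        if (following.add(targetUuid)) {
            followingCount.incrementAndGet(); // atomic, like HBase's counter
        }
    }

    void recordPost() {
        postCount.incrementAndGet();
    }
}
```

The counters mean a profile page never has to count columns at read time; it just reads two cells.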
Analysis: Using HBase also gives you Hadoop MapReduce abilities against the same datasets, although check with your ops people on the best way to configure a cluster to serve online traffic while running MapReduce; sometimes you have to split your cluster in two to support this, but that's a different post. Basically, any time there is some deep question, or you forgot to add a counter to an event, you can use MapReduce to process all of the events and recalculate things. Another really useful case is when you need to build a ranked feed. This is where I typically use Elasticsearch.
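The "reprocess all the events and recalculate" step is, at heart, a group-and-count. A toy version of the reduce step in plain Java streams — the real job would be a Hadoop MapReduce job scanning the events table, and the event fields here are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RecountLikes {
    // A raw event as it might be logged: (userUuid, eventType).
    record Event(String userUuid, String type) {}

    // The "reduce" step: regroup raw events and recount likes per user,
    // e.g. to backfill a counter you forgot to increment at write time.
    static Map<String, Long> likeCountsByUser(List<Event> events) {
        return events.stream()
                .filter(e -> e.type().equals("like"))
                .collect(Collectors.groupingBy(Event::userUuid, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("user-1", "like"),
                new Event("user-1", "post"),
                new Event("user-1", "like"),
                new Event("user-2", "like"));
        System.out.println(likeCountsByUser(events));
    }
}
```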
-Ranked Feeds: With social-network models, most of the interesting data is usually in the leading 24 hrs of all feeds. Using MapReduce to analyze all of your data, you can build detailed Elasticsearch indexes of the most interesting content across all feeds, so long as you optimize for recency or otherwise make some large reduction of the searchable space down to just what you need. Done correctly, this can also work for individual users' ranked feeds.
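One hedged sketch of what "optimize for recency" can look like at query time, using the Elasticsearch function_score query with a Gaussian decay on the document timestamp (the `body` and `created_at` field names are assumptions, not from the original post):

```json
{
  "query": {
    "function_score": {
      "query": { "match": { "body": "some search terms" } },
      "functions": [
        {
          "gauss": {
            "created_at": { "origin": "now", "scale": "24h", "decay": 0.5 }
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}
```

With `scale: 24h` and `decay: 0.5`, a document a day old scores half of an identical fresh one, so the leading day of content dominates the ranking.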
Using this approach doesn't totally solve the supernode problem, but it will get you far enough for most smaller networks.