Architecture : How would you go about building an activity feed like Facebook?

View original thread
Mario Giambanco's photo

If I were to build this, I'd probably consider SQL over a JSON store, but it can be done in either. The problem isn't the storage mechanism or the language IMO, the problem is the structure.

First - understand a post is a post and user / page / company / group is a content creator - all 4 can be treated the same, the cavat being presenting a different layout for each type of content creator.

That said, I'll call a content creator a user for the sake of keeping the code short and clean.

user (content creator)
-- (unique) id
-- name
-- email
-- etc...

user_following
-- (unique) id
-- user_id

user_content
-- (unique) id
-- user_id
-- post_title
-- post_body
-- post_created_date
-- etc...

user_feed
-- (unique) id
-- creator_id
-- content_id

Now - content creator creates a post - this goes into user_content

Visitor wants to follow a content creator and get their posts in their feed - the follow button on the content creators "page" inserts an entry into the user_following table with the user_id of the visitor

When the content creator creates new content, it loops through it's user_following table and inserts a reference to the user_content entry into the visitors user_feed table - so roughly, for every follower a content creator has, upon a new post being created, it'll also create a reference for each of those followers in their user_feed table

When the visitor visits the site, it queries their user_feed table, pulls the references to the content_id and the creator_id and queries for the actual content. In SQL, this could be done with a join; in JSON you'll need to store all the data (and update, delete it) in the user_feed table

Additional tables will be needed for comments, likes, analytics, etc... but that should be the jist of it.

Past that, how the system actually functions comes into play. If a content creator had 100 posts and I suddenly unfollow them - do their past posts remain in my feed if I scroll back far enough and the change is only going forward or will past posts be removed from my feed? A simple query in SQL but in JSON, you'll need to store a lot more references to remove the old posts from my feed.

As for caching - a combination of SQL and JSON could be used - this is a whole different conversation IMO - SQL should be able to scale this nicely - with JSON, you may end up with a very big table if you have enough users / content. Caching of the actual code / html / pages shouldn't be an issue and any language should be able to accommodate. You could keep a "recent" activity feed in something like Firebase allowing the user to see real time, new posts, in their feed and past say, 20 or 40 posts, query the DB for the remainder. Again, it's a matter of how the system fill function after the initial design functionality will be built.