@leebyron
Engineer @ Facebook
I've worked at Facebook as a Data Scientist, Product Designer, Engineering Manager, and most recently an Engineer. I've built lots of things related to mobile, across web and apps. Most recently I work on tools and libraries to help product developers build better things faster, including React, GraphQL and Immutable.js.
Here are a few questions you might answer to guide yourself through building a Facebook-like feed:

How many followers will the most popular content sources have?

Twitter used to call this the "Bieber problem": when Justin Bieber tweeted, a fan-out operation occurred, placing the new tweet into the feed of every one of his followers, which was a massive job that locked up whole machines. In Twitter's earlier years, a couple of quick @justinbieber tweets in a row could "fail whale" the whole service.

The "Bieber problem" illustrates a critical decision in the architecture of a feed service: do you build a user's feed at write time or at read time? Twitter's architecture placed tweets into feeds as soon as they were written into the service. By contrast, Facebook's architecture waits for a user to read their feed before actually looking up the latest posts from each followed account.

In this example, Bieber is a "source" of stories. What are your sources and how popular will they be? Will you have broad topics where one story may be distributed to many people? If you choose a write-time architecture and have one million users following a popular source, how long will it take for the one-millionth user to finally receive the post in their feed? This is a very important question, especially as you start to design a sharding strategy.

Chronological feed or ranked feed?

Should the story at the very top of your activity feed be the very latest thing that happened, or the most interesting thing that happened? This is more of a product decision than an architectural one, but it has serious implications for your feed architecture. Twitter and Instagram are good examples of chronological feeds (although Twitter has recently introduced partially ranked feeds). "Write-time" post distribution makes it difficult to do anything other than a chronological feed. Facebook is of course a ranked feed.
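To make the write-time versus read-time decision concrete, here is a minimal in-memory sketch. It is not how Twitter or Facebook actually implement their feeds; `FeedService`, `Story`, and every method name are hypothetical, and the sketch assumes stories are posted in chronological order. The point is only that `post` does one feed write per follower, while the read-time path does its work when the feed is loaded.

```typescript
// Hypothetical in-memory model; all names here are illustrative assumptions.
type Story = { id: string; sourceId: string; postedAt: number };

class FeedService {
  private followers = new Map<string, string[]>(); // sourceId -> follower ids
  private following = new Map<string, string[]>(); // userId -> source ids
  private storiesBySource = new Map<string, Story[]>();
  private materializedFeeds = new Map<string, Story[]>(); // built at write time

  follow(userId: string, sourceId: string): void {
    const fans = this.followers.get(sourceId) || [];
    if (fans.indexOf(userId) === -1) fans.push(userId);
    this.followers.set(sourceId, fans);
    const sources = this.following.get(userId) || [];
    if (sources.indexOf(sourceId) === -1) sources.push(sourceId);
    this.following.set(userId, sources);
  }

  // Write-time fan-out: one feed write per follower of the source.
  // Returns the number of feed writes, which is the cost of the "Bieber problem".
  post(story: Story): number {
    const posted = this.storiesBySource.get(story.sourceId) || [];
    posted.push(story);
    this.storiesBySource.set(story.sourceId, posted);

    const fans = this.followers.get(story.sourceId) || [];
    fans.forEach((userId) => {
      const feed = this.materializedFeeds.get(userId) || [];
      feed.unshift(story); // newest first, assuming posts arrive in time order
      this.materializedFeeds.set(userId, feed);
    });
    return fans.length;
  }

  // Reading a write-time feed is a single lookup.
  readWriteTimeFeed(userId: string): Story[] {
    return this.materializedFeeds.get(userId) || [];
  }

  // Read-time assembly: gather stories from every followed source, then sort.
  readReadTimeFeed(userId: string): Story[] {
    const stories: Story[] = [];
    (this.following.get(userId) || []).forEach((sourceId) => {
      (this.storiesBySource.get(sourceId) || []).forEach((s) => stories.push(s));
    });
    return stories.sort((a, b) => b.postedAt - a.postedAt);
  }
}
```

With one million followers, `post` in this sketch would perform one million feed writes for a single story, whereas the read-time path pays a cost proportional to the number of sources a single reader follows.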
Facebook's ranked feed works like this: when you load your News Feed, all of the stories posted since the last time you looked are given an "interestingness" value (how long ago a story was posted is a factor), and that value becomes the sort order for those stories. You can think of this kind of feed like layers of sedimentary rock. As soon as a new collection of stories is loaded, they're frozen in place so the feed looks familiar; each new viewing brings more stories atop the last set. A few rare viewings will appear largely ranked and not chronological; many frequent viewings will appear more chronological and less ranked.

Can stories change position in the feed?

When a tweet is retweeted, you may see it in your feed at the time it was retweeted. If multiple people retweet it, it may appear where the first follower retweeted it but not appear a second time. If a story on Facebook gets a particularly relevant comment, it may resurface. Consider whether your stories may change position in the feed for any reason, and how your feed architecture could enable this. Also think about what would happen to a client (mobile app, website) that was already showing the same story lower in the feed. You probably want to avoid showing duplicates!

Can stories be aggregated together?

If you have stories from subscribed topics, will you ever have an aggregation story like "5 new articles were posted to 'Politics'"? If multiple people "share" or "retweet" a story, do you show all of those actions together? If stories are replies to one another (Twitter @mentions are a good example), do you show them in the feed in conversational order, or leave them in reverse-chronological order? This may also change the position of an existing story in the feed if a later event causes it to be "aggregated" like this.

Can stories be later revoked from a feed?

Can the author of the post remove it? Perhaps the author of the post blocks the viewer or vice versa? Or the viewer unfollows the source after the story was already put into a feed?
Consider how these kinds of actions would impact a cached feed.

Hopefully those questions are useful in helping you get more concrete about the kinds of problems you might have to solve in building a feed.

I definitely suggest being able to access a cached, or at least partially cached, list of stories to improve speed. One of the benefits of a "write-time" feed is that you can build a cached feed with very fast access. One of Instagram's best early features was its high performance. Despite being loaded on a crappy iPhone 3GS on slow 3G networks, the Instagram feed always loaded lightning fast. There were zero SQL queries involved in loading a feed.

Facebook's ranked feed requires more work to cache, but because each previously loaded set of stories is saved, it can be very fast to load a second time. Part of the secret of producing a ranked feed quickly is parallelization, and prioritizing getting the first couple of stories ready so a user can start reading while the server is preparing the rest. Facebook has built custom infrastructure to rank feeds at the speed it does for as many users as it has, but the rough idea is to create something like a "max-heap" data structure so the stories most likely to be interesting are quickly available.

Also remember YAGNI ("you aren't gonna need it"). These are all great things to think about to figure out what problems you'll need to solve first, but if you're building something brand new, you're better off going for whatever will be simple and fast, and tackling these problems as they become relevant for your product.
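The "sedimentary rock" behavior of a ranked feed can be sketched roughly as follows. This is an illustration under stated assumptions, not Facebook's infrastructure: `LayeredFeed`, `Candidate`, and the precomputed `score` field are hypothetical, and a plain sort stands in for the max-heap mentioned above. Each viewing ranks only the stories that arrived since the previous viewing and freezes them as a new layer on top of the older ones, which also keeps a story from appearing twice.

```typescript
// Hypothetical types; `score` stands in for a precomputed "interestingness" value.
type Candidate = { id: string; postedAt: number; score: number };

class LayeredFeed {
  private layers: Candidate[][] = []; // newest layer first, each frozen once built
  private lastViewedAt = 0;
  private seen: { [id: string]: boolean } = {};

  // Called each time the viewer loads the feed; returns story ids top to bottom.
  view(allStories: Candidate[], now: number): string[] {
    // Only stories posted since the previous viewing are candidates for the new layer.
    const fresh = allStories
      .filter((s) => s.postedAt > this.lastViewedAt && s.postedAt <= now && !this.seen[s.id])
      .sort((a, b) => b.score - a.score); // ranked within the layer

    fresh.forEach((s) => {
      this.seen[s.id] = true; // guards against duplicates across layers
    });
    if (fresh.length > 0) this.layers.unshift(fresh);
    this.lastViewedAt = now;

    // Older layers keep their frozen order beneath the new one.
    const ids: string[] = [];
    this.layers.forEach((layer) => layer.forEach((s) => ids.push(s.id)));
    return ids;
  }
}
```

Calling `view` rarely produces a few large layers that look mostly ranked; calling it often produces many thin layers, so the overall feed looks closer to chronological.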
@vivalapanda sure - caching the results of REST endpoints typically means just using the browser's existing cache (or a rough equivalent if not using a browser). Since GraphQL composes all the data needed into a single response, simply caching that network response is not a very flexible cache strategy. There is a wide range of cache strategies that you could take (the same goes for REST), and the Relay store represents what a very sophisticated GraphQL client cache looks like. Relay: https://facebook.github.io/relay/
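One common client-side strategy is to normalize each response into a flat store keyed by object id, so that two different queries touching the same object share one cached record. The sketch below illustrates that general idea only; it is not Relay's actual store or API. The `normalize` function, the `Store` type, the `__ref` marker, and the assumption that every cacheable object carries a string `id` field are all hypothetical.

```typescript
// Flat cache: one record per object id (hypothetical shape, not Relay's).
type Store = { [id: string]: { [field: string]: unknown } };

// Recursively replace every object that has a string `id` with a reference,
// merging its fields into the store so later responses can add to it.
function normalize(value: unknown, store: Store): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => normalize(item, store));
  }
  if (value !== null && typeof value === "object") {
    const obj = value as { [field: string]: unknown };
    const flat: { [field: string]: unknown } = {};
    Object.keys(obj).forEach((key) => {
      flat[key] = normalize(obj[key], store);
    });
    const id = obj["id"];
    if (typeof id === "string") {
      store[id] = { ...(store[id] || {}), ...flat }; // merge with what is cached
      return { __ref: id };
    }
    return flat;
  }
  return value; // scalars are stored as-is
}
```

For example, normalizing a response containing a viewer and their friends, then a second response for one of those friends, leaves a single record for that friend holding the fields from both responses.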
The most practical use case for GraphQL, and the one it was originally designed to support, is a client (iOS or Android app, or web app) requesting data from a server. Choosing GraphQL or REST (or other data services like Thrift) depends on the constraints of your application and services.

REST is often the right choice when you need your data access to align tightly with HTTP, for example if you rely heavily on HTTP-level caching, or if you need to represent data spread across many different domain services which use URLs to reference other data.

GraphQL is often the right choice when network speed and latency are the primary concern and data isn't spread across different domains but is instead centralized in one product. In this case a single network request can load many different kinds of resources at once, and selectively include only what that client needs. Particularly on mobile network connections, this can be a dramatic performance improvement. However, caching data may require more sophisticated techniques.

Thrift, or something like it, is often the right choice when network speed is not a concern, but memory and CPU pressure on the server is, and caching is inappropriate. This is often a good choice for service-to-service communication, which often occurs within or between data centers.
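As a rough illustration of the "single network request" point, the sketch below packs one query that asks for a viewer, their feed, and each story's author into a single HTTP request, selecting only the fields the client needs. The schema (`viewer`, `feed`, `title`, `author`, `name`) and the `buildGraphQLRequest` helper are hypothetical; the sketch only assumes the widely used convention of sending a JSON body with `query` and `variables` over POST.

```typescript
// Hypothetical schema: every field name in this query is an illustrative assumption.
const FEED_QUERY = `
  query Feed($count: Int!) {
    viewer {
      name
      feed(first: $count) {
        title
        author { name }
      }
    }
  }
`;

type GraphQLRequest = {
  method: string;
  headers: { [name: string]: string };
  body: string;
};

// Build one request carrying the whole query; the server answers with one response
// shaped like the query, instead of separate calls for viewer, feed, and authors.
function buildGraphQLRequest(
  query: string,
  variables: { [name: string]: unknown }
): GraphQLRequest {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables }),
  };
}
```

The returned object could be passed as the options of a `fetch` call to whatever endpoint your server exposes.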
GraphQL itself does not perform access control for you, for the same reason that GraphQL does not fetch your data for you: it leaves this up to you for maximum flexibility. Most GraphQL servers have a concept of "context", which lets you provide an extra value to every field resolver during a query, and this is designed for exactly this case: the context is a perfect place to provide the "logged-in user" or "ACL roles" or whatever you like to call your authentication data. Then, in your field resolvers, you have access to this authentication data and can decide, for example, to return null if the authenticated user shouldn't be able to access the value.
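A minimal sketch of that pattern, written as plain functions rather than against any particular GraphQL server library: the `Context`, `User`, and `Post` types, the `draftNotes` field, and the "admin" role are hypothetical, and the `(parent, args, context)` argument order is assumed because it is a common resolver convention.

```typescript
// Hypothetical authentication data that the server places on the context.
type User = { id: string; roles: string[] };
type Context = { viewer: User | null }; // null when nobody is logged in
type Post = { id: string; title: string; authorId: string; draftNotes: string };

const postResolvers = {
  // Public field: no access check needed.
  title: (post: Post, _args: Record<string, unknown>, _ctx: Context): string =>
    post.title,

  // Restricted field: only the author or an admin may read it.
  // Anyone else gets null instead of an error.
  draftNotes: (
    post: Post,
    _args: Record<string, unknown>,
    ctx: Context
  ): string | null => {
    const viewer = ctx.viewer;
    const allowed =
      viewer !== null &&
      (viewer.id === post.authorId || viewer.roles.indexOf("admin") !== -1);
    return allowed ? post.draftNotes : null;
  },
};
```

Because the check lives in the field resolver, it applies no matter which query path reaches the post.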
What client-side routing system do you use inside the components? It seems there isn't one compatible with both React and React Native. How do you reuse functionality between the two?

I highly recommend https://github.com/reactjs/react-router for building things on the web, where things are URL-based. React Native has its own router that's designed to work very well with the OS's native navigation. We're often not reusing functionality between the two because they're so different, and each platform deserves to feel native. Instead we're reusing the React components themselves, which make up the app and UI logic of various screens.

What do you think of the "React Native Web" approach, which considers the web a React Native platform?

This is a really interesting idea for future exploration, and a lot of people on the team are experimenting with it. One serious missing piece is how layout is performed: on the web we don't have APIs to measure how text will lay out, whereas we do have those on iOS and Android native views. Starting to fill in those blanks over time by working closely with browser vendors and standards committees will help us explore this.
Relay is an implementation of the Flux concepts. Relay requires you to have a GraphQL service. Our iOS and Android apps are almost entirely powered by GraphQL (but not Relay, as they're not JavaScript). Parts of those apps, and in some cases whole apps (Ads Manager), are built with React Native and use Relay with GraphQL.