What is the best tech stack for building a good recommendation engine?

I want to use MongoDB, but I can see that it's not a great fit for graph data. So what is a good database/stack for building a high-performance recommendation engine for social-network-style apps? Some people suggest Redis + Neo4j, but I am not sure that's the best solution. Others suggest Cassandra, but I have never used it in any previous app. What do you guys think?


Comments (3)

Jan Vladimir Mostert

At what scale are you building the recommendation engine, what kind of data are you storing, and do you need real-time or offline recommendations?

Neo4j is certainly a good option for real-time recommendations, and Cassandra is a good one if you plan to write your own map-reduce algorithms. But you can also build a recommendation engine on MySQL, without any database at all, or with a recommendation-engine-as-a-service offering like the ones Google and Amazon provide.
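To make the "no database at all" option concrete, here is a minimal in-memory sketch of item co-occurrence ("people who liked X also liked Y"). It assumes likes fit in memory as a dict of user to set of item ids; the function names are hypothetical and this is just one simple technique, not the only way to build a recommender.

```python
from collections import Counter, defaultdict


def build_cooccurrence(likes):
    """likes: dict mapping user -> set of item ids that user liked.

    Returns item -> Counter of other items liked by the same users.
    """
    co = defaultdict(Counter)
    for items in likes.values():
        for a in items:
            for b in items:
                if a != b:
                    co[a][b] += 1
    return co


def recommend(user, likes, co, limit=3):
    """Score items the user hasn't liked yet by how often they co-occur
    with items the user already liked."""
    seen = likes.get(user, set())
    scores = Counter()
    for item in seen:
        # .get() avoids inserting empty entries into the defaultdict.
        for other, count in co.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken by item id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:limit]]


if __name__ == "__main__":
    likes = {
        "alice": {"a", "b"},
        "bob": {"a", "b", "c"},
        "carol": {"b", "d"},
        "dave": {"a", "c"},
    }
    co = build_cooccurrence(likes)
    print(recommend("alice", likes, co))  # → ['c', 'd']
```

At 10k users this fits comfortably in memory and could be rebuilt periodically as an offline job; a graph database or a managed service starts paying off when the data or the query patterns outgrow that.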

It all depends on what your requirements are.

Jan Vladimir Mostert

For social networks, Neo4j is a natural fit for storing graph data, Redis for caching and MongoDB for linear data like logs, feeds, messages, etc.
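As a rough illustration of the "Redis for caching" role, here is a cache-aside sketch: check the cache first, fall back to the expensive graph query on a miss, then store the result with a TTL. The `TTLCache` class is a hypothetical in-memory stand-in for Redis (a real version would use something like `SET key value EX seconds`), and `compute` stands in for whatever Neo4j query you would run.

```python
import time


class TTLCache:
    """Hypothetical in-memory stand-in for a Redis cache with expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


def get_recommendations(user, cache, compute):
    """Cache-aside: serve from cache if fresh, otherwise compute and cache."""
    key = f"recs:{user}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    recs = compute(user)  # e.g. a Neo4j traversal in a real deployment
    cache.set(key, recs)
    return recs
```

The TTL bounds how stale cached recommendations can get, which suits data that only needs to be semi-real-time.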

Use something like SoapUI's REST features to build load tests, then run them from the cloud to see how much load you can handle. That will tell you whether your architecture scales. More often than not you'll find problems at high load that you didn't anticipate: running out of connections, bottlenecks you didn't think about, running out of disk space because of logs, memory issues because you forgot to enable some flag, etc.
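This is not SoapUI, just a standard-library sketch of what a load test measures: fire many concurrent requests and look at latency percentiles rather than averages. `handle_request` is a hypothetical placeholder; in practice you would swap in a real HTTP call against a staging environment.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(i):
    """Hypothetical stand-in for one API call; replace with a real request."""
    time.sleep(0.001)
    return i


def timed_call(i):
    start = time.perf_counter()
    handle_request(i)
    return time.perf_counter() - start


def run_load_test(total_requests=200, concurrency=20):
    """Run requests across a thread pool and summarize latencies in ms."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    # Simple nearest-rank style p95 on the sorted latencies.
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return {
        "requests": len(latencies),
        "p50_ms": statistics.median(latencies) * 1000,
        "p95_ms": p95 * 1000,
    }


if __name__ == "__main__":
    print(run_load_test())
```

Ramping `concurrency` up until p95 climbs sharply is a quick way to find the point where connections, locks, or memory start to bite.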

For that kind of load, you can even use SQL. Guess what Stack Overflow is running on: blog.stackoverflow.com/2008/09/what-was-sta.. plus Redis for caching: highscalability.com/blog/2011/3/3/stack-ove..
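To back up the point that plain SQL can handle social recommendations at this scale, here is a "people you may know" query (friends of friends, ranked by mutual friends) using Python's built-in `sqlite3`. The `friendships` table layout and function names are assumptions for illustration; the same self-join works in MySQL or PostgreSQL.

```python
import sqlite3


def build_demo_db(edges):
    """Create an in-memory friendships table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER)")
    # Store each friendship in both directions so lookups are symmetric.
    both_ways = edges + [(b, a) for a, b in edges]
    conn.executemany("INSERT INTO friendships VALUES (?, ?)", both_ways)
    conn.execute("CREATE INDEX idx_user ON friendships (user_id)")
    return conn


def suggest_friends(conn, user_id, limit=5):
    """Friends of friends ranked by mutual-friend count, excluding the user
    and people they are already friends with."""
    query = """
        SELECT f2.friend_id, COUNT(*) AS mutual
        FROM friendships f1
        JOIN friendships f2 ON f2.user_id = f1.friend_id
        WHERE f1.user_id = ?
          AND f2.friend_id != ?
          AND f2.friend_id NOT IN (
              SELECT friend_id FROM friendships WHERE user_id = ?
          )
        GROUP BY f2.friend_id
        ORDER BY mutual DESC, f2.friend_id
        LIMIT ?
    """
    return conn.execute(query, (user_id, user_id, user_id, limit)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db([(1, 2), (1, 3), (2, 4), (3, 4), (2, 5)])
    print(suggest_friends(conn, 1))  # → [(4, 2), (5, 1)]
```

Two-hop queries like this are cheap with an index; it's deeper or more varied traversals where a graph database like Neo4j earns its place.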

Norma King

Thanks for commenting. I would like it to be semi-real-time. Since this is for an internal social network, users should see new feed items every time they refresh, but the feed doesn't need to be strictly real-time. I am aiming to serve around 10k users at launch. So I was wondering what a good solution would be here. What do you think about a MongoDB + Redis implementation?
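One common way to get "new items on every refresh" with MongoDB + Redis is fan-out on write: store each post durably, then push its id onto every follower's capped timeline so a refresh is just a list read. The sketch below is an in-memory stand-in under that assumption: the `posts` dict plays the MongoDB collection, the bounded deques play Redis lists (roughly `LPUSH` + `LTRIM` on write, `LRANGE` on read), and all names and the cap are hypothetical.

```python
from collections import defaultdict, deque

FEED_LENGTH = 100  # hypothetical cap on cached items per user


class FeedService:
    """Fan-out-on-write feed sketch with in-memory stand-ins."""

    def __init__(self, feed_length=FEED_LENGTH):
        self.posts = {}                    # stand-in for a MongoDB collection
        self.followers = defaultdict(set)  # author -> follower ids
        # Bounded deque: appendleft drops the oldest item when full.
        self.timelines = defaultdict(lambda: deque(maxlen=feed_length))
        self._next_id = 0

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def publish(self, author, text):
        self._next_id += 1
        post_id = self._next_id
        self.posts[post_id] = {"author": author, "text": text}
        # Push to the author's own timeline and to every follower's.
        for user in self.followers[author] | {author}:
            self.timelines[user].appendleft(post_id)
        return post_id

    def read_feed(self, user, limit=20):
        """Called on each refresh: newest first, hydrated from the post store."""
        ids = list(self.timelines[user])[:limit]
        return [self.posts[i]["text"] for i in ids]


if __name__ == "__main__":
    svc = FeedService(feed_length=2)
    svc.follow("bob", "alice")
    for text in ("p1", "p2", "p3"):
        svc.publish("alice", text)
    print(svc.read_feed("bob"))  # → ['p3', 'p2']
```

For an internal network of around 10k users, the fan-out cost per post is small, and capping timelines keeps the cache size predictable; older items can still be paged from the durable store.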