I want to build a real-time aggregator for flight ticket prices by scraping multiple websites. The task needs parallel processing, and Golang is said to be very good at that. I know Node.js and Python, but I don't know Golang. Is learning Golang and implementing this in it the better choice? Or can you suggest good Node.js or Python libraries that can do the work? In short, I'd like a comparison of these languages for implementing aggregators.
Also, it would be great if you could suggest a better alternative to scraping the websites.
EDIT: added the detail that I want to aggregate flight ticket prices in real time.
Sandeep Panda
co-founder, Hashnode
I think Node.js combined with MongoDB would be a great choice. You can use MongoDB's aggregation framework and a Node module like async to process things in parallel. I haven't used Python or Go for aggregation, so someone else can offer an alternative approach in those languages. Also, if you are using the Mongoose library in Node.js, here is a tutorial that teaches aggregation with the Node.js/Mongoose combination.
Thanks for inviting me, Amulya.
I have never used Go or Python, so I can't comment on how Node.js stacks up against the other two. But you can certainly use Node.js to achieve this.
Firstly, you need to decide whether you really want to scrape other websites or would rather use some kind of API. The problem with scraping is that a webpage's layout can change at any time, and sites may block your IP if you make excessive requests. So if you are building something for production use, I would go with an API-based approach. If this is just for fun or internal use, scraping won't be a problem. By the way, Cleartrip already has an API for checking flight prices; you may want to use that.
Anyway, you will need to do a lot of processing. As Jose suggested, the async module is a neat tool for parallelizing tasks. If you have a set of predefined websites to crawl, you can process each one in parallel, collect the results once they are all done, and send the data to the client (see async.parallel()). If you decide to use an API instead, it should be pretty straightforward, IMO.

Hope this helps. Let me know if you have any questions.
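The fan-out-then-collect flow described above can be sketched with built-in Promises (async.parallel plays the same role in callback style). `fetchFare` here is a hypothetical stand-in for a real scraper or API call, and the fares are simulated.

```javascript
// Stand-in for a real scraper/API call; returns a promise per site.
function fetchFare(site) {
  const fares = { siteA: 4200, siteB: 3900, siteC: 4600 };
  return Promise.resolve({ site, price: fares[site] });
}

async function collectFares(sites) {
  // allSettled keeps one slow or blocked site from failing the whole
  // batch, which matters when scrapers break as page layouts change.
  const settled = await Promise.allSettled(sites.map(fetchFare));
  return settled
    .filter((s) => s.status === 'fulfilled')
    .map((s) => s.value);
}

collectFares(['siteA', 'siteB', 'siteC']).then((fares) => {
  console.log(fares); // fare objects from every site that responded
});
```

With async.parallel you would pass an array of task functions and get the collected results in the final callback; the overall structure is the same.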