In my previous post I discussed how to create and run Selenium scraping jobs with Node.js. In this post I'll discuss how to run multiple concurrent jobs on a single server.
The Scraping Server
Let's talk about a single scraping server. I'm using an m3.large Amazon instance type. That translates to 2 CPUs, 7.5 GB of RAM, and an SSD drive.
As you probably know, Chrome eats up a lot of memory. I dedicated about 1 GB of RAM to each Selenium node so that Chrome has enough memory to launch and scrape. The ideal setup I found on this instance type was six Selenium nodes, each running one scraping job. That allows six scraping jobs to run concurrently and leaves about 1.5 GB of the 7.5 GB for everything else on the machine.
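To make the "six jobs at a time" idea concrete, here is a minimal sketch of a concurrency limiter in Node.js. It is not the code I run in production: `runWithLimit` is a hypothetical helper, and each entry in `jobs` is assumed to be a function that returns a promise (for example, one that drives a single Selenium session to completion).

```javascript
// Hypothetical helper (not from any library): run async job functions
// with at most `limit` of them in flight at once.
async function runWithLimit(jobs, limit) {
  const results = new Array(jobs.length);
  let next = 0; // index of the next job nobody has claimed yet

  // Each worker stands in for one Selenium node: it claims the next
  // unclaimed job, waits for it to finish, then claims another.
  async function worker() {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await jobs[index]();
    }
  }

  // Start one worker per available slot (six on the server described above).
  const workers = [];
  for (let i = 0; i < Math.min(limit, jobs.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results; // same order as the input jobs
}
```

Calling `runWithLimit(jobs, 6)` would keep all six nodes busy without ever asking the grid for a seventh session. A failed job rejects the whole run in this sketch, so real code would likely wrap each job in its own try/catch.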
Using Selenium Grid with a hub and six nodes was ideal for an m3.large instance type on Amazon's platform. This allowed six scraping jobs to run concurrently. Next up, I'll discuss how I created a job queue to process tens of thousands of jobs.