Hi,
I am building a scraper-based Node server. How do I get the server to run a function that scrapes a website at particular times of day, ideally morning, afternoon and evening?
My idea is to use setTimeout, but I am hoping there is a standard way of doing this, or a library that handles it.
For something like this, I don't think setTimeout is the most practical solution, especially if you have to crawl multiple websites. If I needed to run something like this, I would use cron to trigger a Node script three times a day (every 8 hours; for example, the schedule 0 */8 * * * fires at 00:00, 08:00 and 16:00). That script would fetch the list of sites to crawl from the database and hand the crawling off to a queue service like beanstalkd.
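To make the cron-plus-queue idea a bit more concrete, here is a minimal sketch of what the cron-triggered script could look like. The names enqueueCrawlJobs, loadSites, demoQueue and the file path are all made up for illustration; the in-memory queue only stands in for a real beanstalkd client, and the site list for a real database query.

```javascript
// Entry point meant to be started by cron, e.g. with a crontab line like
//   0 */8 * * * node /path/to/enqueue.js   (hypothetical path)
// loadSites and queue are stand-ins (assumptions): swap in a real
// database query and a real beanstalkd client.

// Read the sites to crawl and put one job per site on the queue.
async function enqueueCrawlJobs(loadSites, queue) {
  const sites = await loadSites();
  for (const url of sites) {
    await queue.put(JSON.stringify({ url, queuedAt: Date.now() }));
  }
  return sites.length;
}

// In-memory stand-ins so the sketch runs on its own.
const demoQueue = {
  jobs: [],
  async put(payload) {
    this.jobs.push(payload);
  },
};
const demoLoadSites = async () => ['https://example.com', 'https://example.org'];

enqueueCrawlJobs(demoLoadSites, demoQueue)
  .then((count) => console.log(`queued ${count} crawl jobs`));
```

The script only enqueues work and exits, so the crawling itself can be done by separate worker processes that read from the queue at their own pace.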
@Madibalive curious whether you are going to open a repo on GitHub for this? It would be cool to see a working example with cheeriojs, x-rayjs, nightmarejs, phantomjs, etc. used along with one of the Node packages mentioned by @Madibalive.
Thanks for the answer. Cron was the word I needed to search for; I am using node-schedule.
Madiba Razak
Solving problems since 2015
Roopak A N
Full Stack Engineer
These two packages look interesting. Judging by their stars, followers and downloads, both look promising and popular.
One important note from their documentation: these schedulers run inside the Node process, so the Node script has to be running for the scheduled jobs to fire.
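To show why the process has to stay alive, here is a rough sketch of what an in-process scheduler boils down to, written with plain setTimeout and no library. This is not how either package is implemented; nextRunTime, scheduleDaily and the hours 8, 13 and 19 are just names and values I picked for illustration.

```javascript
// Hours of the day (local time, 24h clock) at which the scrape should run.
// 8, 13 and 19 are illustrative values, not taken from the thread.
const RUN_HOURS = [8, 13, 19];

// Return the first scheduled time (one of `hours`, on the hour)
// that is strictly after `now`.
function nextRunTime(now, hours) {
  const sorted = [...hours].sort((a, b) => a - b);
  for (const h of sorted) {
    const candidate = new Date(now.getFullYear(), now.getMonth(), now.getDate(), h, 0, 0, 0);
    if (candidate > now) return candidate;
  }
  // All of today's slots have passed, so use the first slot tomorrow.
  return new Date(now.getFullYear(), now.getMonth(), now.getDate() + 1, sorted[0], 0, 0, 0);
}

// Run `task` at each scheduled hour, re-arming the timer after every run.
function scheduleDaily(task, hours = RUN_HOURS) {
  const now = new Date();
  const delay = nextRunTime(now, hours) - now;
  setTimeout(async () => {
    try {
      await task();
    } catch (err) {
      console.error('scrape failed:', err);
    } finally {
      scheduleDaily(task, hours); // schedule the following run
    }
  }, delay);
}
```

You would start it with something like scheduleDaily(scrapeSites), where scrapeSites is your own async function. The pending timer lives only in memory, so if the process stops or restarts, nothing fires until it is started again. That is the main difference from system cron, which launches the script itself.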