If you build a web crawler, how do you check whether a URL has been crawled before?

Ahmed Ashraf
·Dec 1, 2016

I'm building a web crawler for some of our customers. They have a news website and publish about 1,000 new articles per day, so I think I could use Redis to store the URLs in a set, with a key schema like "domain:1":

redis.sadd("domain:1", url_string)
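
To make the check itself concrete, here is a minimal sketch of how that one call could answer "was this URL crawled before?". It relies on SADD returning the number of members actually added (1 for a new URL, 0 for one already in the set). The names `InMemorySetStore` and `mark_if_new` are hypothetical and only for illustration; the in-memory class is a stand-in so the snippet runs without a Redis server, and dropping the URL fragment is an assumption about what counts as "the same page".

```python
from urllib.parse import urldefrag


class InMemorySetStore:
    """Hypothetical stand-in exposing only sadd(), so the sketch runs without a server."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, member):
        members = self._sets.setdefault(key, set())
        if member in members:
            return 0  # already present, nothing added
        members.add(member)
        return 1  # newly added


def mark_if_new(store, domain_id, url):
    """Return True the first time a URL is seen for this domain, else False."""
    key = f"domain:{domain_id}"
    # Assumption: URLs that differ only by their #fragment are the same page.
    clean_url, _fragment = urldefrag(url)
    # SADD reports how many members were added, so a single call both
    # records the URL and tells us whether it was already there.
    return store.sadd(key, clean_url) == 1
```

With a real client (for example redis-py's `redis.Redis()`, whose `sadd` also returns the count of added members), the same function should work by passing the client as `store`; I have only tried the idea against the stand-in above.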

It works well enough for me right now, but I suspect it will become hard to manage about a month from now.

So, is there a better solution for this? Any hints?