If you build a web crawler, how do you check whether a URL has been crawled before?
I'm building a web crawler for one of our customers. They run a news website and publish about 1,000 new articles per day, so I figured I could use Redis to store the URLs in a set, with a key schema like "domain:1":
redis.sadd("domain:1", url_string)
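To make the dedup check concrete, here is a minimal sketch of that approach. `SADD` returns 1 when the member was newly added and 0 when it was already in the set, so one round trip both records the URL and tells you whether it was seen before. The function name `is_new_url` and the `FakeRedis` stand-in (included only so the snippet runs without a Redis server) are my own illustrations, not part of redis-py:

```python
def is_new_url(r, domain_id: int, url: str) -> bool:
    """Record url in the per-domain set and report whether it is new.

    SADD returns 1 if the member was added, 0 if it already existed,
    so a single call both checks and records the URL. With a real
    client you would pass e.g. r = redis.Redis(host="localhost").
    """
    return r.sadd(f"domain:{domain_id}", url) == 1


class FakeRedis:
    """Tiny in-memory stand-in for redis.Redis, for local testing only."""

    def __init__(self):
        self._sets: dict[str, set[str]] = {}

    def sadd(self, key: str, member: str) -> int:
        s = self._sets.setdefault(key, set())
        if member in s:
            return 0
        s.add(member)
        return 1
```

If you only want to test membership without inserting, `SISMEMBER` does that, but the `SADD` return value saves a round trip when you intend to record the URL anyway.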
It works well enough for me now, but I expect the set will get unwieldy after a month or so of growth.

Is there a better solution for this? Any hints?