Python has requests for fetching that data from remote servers, and BeautifulSoup (the beautifulsoup4 package) for parsing it into Python objects you can search (if it's HTML).
That'd be a starting point: request one page from a server. If that works, take the response text and turn it into Python objects, then look up the data you want inside them. Then repeat for multiple pages.
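Here's a rough sketch of those steps. It assumes the data you want is the href of every link on the page, which is just a made-up target, so swap in your own lookup. `fetch_html`, `extract_links` and `scrape` are illustrative names, not part of either library.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Step 1: request one page and return its body as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def extract_links(html: str) -> list[str]:
    """Steps 2-3: parse the HTML and look up the data you want (here: link targets)."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib-backed parser, no extra install
    return [a["href"] for a in soup.find_all("a", href=True)]


def scrape(urls: list[str]) -> dict[str, list[str]]:
    """Step 4: repeat fetch-and-parse for each page, one after another."""
    return {url: extract_links(fetch_html(url)) for url in urls}


# Usage (hypothetical URLs):
# results = scrape(["https://example.com/page1", "https://example.com/page2"])
```

Keeping fetching and parsing in separate functions means you can test the parsing on a saved HTML string without hitting the server every time.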
Eventually you may want to make it faster by running multiple requests at the same time; multiprocessing.Pool can do that. But get it working first, before optimizing.
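Once the single-page version works, a sketch of the parallel step could look like this. Note one swap: it uses `multiprocessing.pool.ThreadPool`, which has the same `map` interface as `multiprocessing.Pool` but runs on threads. Downloading is mostly waiting on the network, so threads usually suffice and you avoid pickling and the `if __name__ == "__main__":` guard that process pools need. `fetch_all` is a hypothetical helper name.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def fetch_all(urls: Iterable[str], fetch: Callable[[str], T], workers: int = 8) -> list[T]:
    """Run `fetch` on every URL using a pool of worker threads.

    `fetch` is whatever single-page function you already have working.
    pool.map blocks until all calls finish and returns results in input order.
    """
    with ThreadPool(processes=workers) as pool:
        return pool.map(fetch, urls)


# Usage (hypothetical): pass your existing one-page function, e.g.
# pages = fetch_all(["https://example.com/page1", "https://example.com/page2"], fetch_html)
```

To use real processes instead, replace `ThreadPool` with `multiprocessing.Pool`; then `fetch` must be a top-level function and the call should sit under `if __name__ == "__main__":`.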