In this tutorial, we'll set up a simple Python web scraper that scrapes data from an Amazon product page!
Getting the product page HTML
The first step is to get the HTML of a product page.
Let's use this beautiful backpack as a test page.
To get the HTML I'm going to use scraperbox.com. This way I can avoid the Amazon robot-check captchas.
import urllib.parse
import urllib.request
import ssl
# Skip HTTPS certificate verification (handy for a quick local test, but insecure).
ssl._create_default_https_context = ssl._create_unverified_context
# Urlencode the URL
url = urllib.parse.quote_plus("amazon.com/LAMAZE-L27901-Lamaze-Peek-A-Boo…")
# Create the query URL.
query = "https://api.scraperbox.com/scrape"
query += "?token=%s" % "YOUR_API_TOKEN"
query += "&url=%s" % url
query += "&javascript_enabled=true"
# Call the API.
request = urllib.request.Request(query)
raw_response = urllib.request.urlopen(request).read()
html = raw_response.decode("utf-8")
print(html)
This program calls the ScraperBox API. That API will spin up a real Chrome browser and return the HTML of an Amazon detail page.
Be sure to replace YOUR_API_TOKEN with your ScraperBox API token!
Running this program prints the product page HTML. Nice!
python3 amazon_scraper.py
<!doctype>
<HTML>
// The amazon HTML
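As a variation on the query-building step, here's a minimal sketch that lets urllib.parse.urlencode do the escaping instead of concatenating strings by hand. The build_query helper and the API_ENDPOINT constant are names I made up for illustration, the https:// scheme on the endpoint is my assumption, and example.com is just a placeholder for the product page URL.

```python
import urllib.parse

# Assumed endpoint: the host and path from the tutorial, with an explicit scheme.
API_ENDPOINT = "https://api.scraperbox.com/scrape"


def build_query(token, target_url, javascript_enabled=True):
    """Return the full API URL with every parameter percent-encoded."""
    params = {
        "token": token,
        "url": target_url,
        "javascript_enabled": "true" if javascript_enabled else "false",
    }
    # urlencode applies quote_plus to each value, so the target URL is escaped once.
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


# example.com is a placeholder, not the real product page.
query = build_query("YOUR_API_TOKEN", "https://example.com/some page?id=1")
print(query)
```

The nice thing about this approach is that adding another parameter is just one more entry in the dictionary, and there's no risk of forgetting to encode a value.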
Extracting data from the HTML
The next step is to extract data from the Amazon page.
To do this I'm going to use the Beautiful Soup Python package.
At the top of the Python file, I import the package.
from bs4 import BeautifulSoup
Then, I create the soup object.
soup = BeautifulSoup(html, 'html.parser')
Next, we should find the element that we want to extract from the HTML.
I open up the backpack page. Next, I right-click on the page title and select Inspect.
This opens up the Chrome dev tools, and we can see that the title element has the #title id.
We can use this id to get the title.
# Other code removed for simplicity.
soup = BeautifulSoup(html, 'html.parser')
title = soup.select_one('#title')
print("Title = %s" % title.text.strip())
And when running the program it shows the correct title 🎉
python3 amazon_scraper.py
Title = Lamaze Peek-A-Boo Forest, Fun Interactive Baby Book with Inspiring Rhymes and Stories
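One thing to keep in mind: select_one returns None when nothing matches, for example if the response turns out to be a robot-check page instead of the product page. In that case title.text would raise an AttributeError. Here's a small sketch of a more defensive version. The extract_title helper and the two tiny HTML snippets are made up for illustration; they're not real Amazon markup.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the stripped text of the #title element, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one("#title")
    if element is None:
        # No element with id="title" in this document.
        return None
    return element.text.strip()


# Tiny stand-in documents; a real product page is far larger.
product_html = '<h1 id="title"><span> Example Backpack </span></h1>'
blocked_html = "<html><body><p>Robot check</p></body></html>"

print("Title = %s" % extract_title(product_html))
print("Title = %s" % extract_title(blocked_html))
```

The first call prints the product name, and the second prints None instead of crashing.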
Conclusion
We've set up a basic scraper application that gets the title of an Amazon product.
You could easily expand this scraper to get way more details about the product!
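For example, here's a rough sketch of how that could look: a dictionary that maps field names to CSS selectors, and a loop that pulls each one out of the soup. Only #title comes from this tutorial. The #price and #rating selectors, the extract_fields helper, and the sample HTML are hypothetical, so you'd have to inspect the real page to find the right selectors.

```python
from bs4 import BeautifulSoup

# Only "#title" comes from the tutorial; the other selectors are hypothetical.
FIELD_SELECTORS = {
    "title": "#title",
    "price": "#price",
    "rating": "#rating",
}


def extract_fields(html, selectors):
    """Map each field name to the stripped text of its element, or None if missing."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for name, selector in selectors.items():
        element = soup.select_one(selector)
        fields[name] = element.text.strip() if element is not None else None
    return fields


# Made-up markup for illustration; it has no element with id="rating".
sample_html = """
<div>
  <h1 id="title"> Example Backpack </h1>
  <span id="price">$19.99</span>
</div>
"""

details = extract_fields(sample_html, FIELD_SELECTORS)
print(details)
```

Fields that aren't found come back as None, so one missing element doesn't break the whole scrape.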
Happy coding!