
Basant Sharma

May 24, 2017

Web Scraping from nse.com

I am trying to scrape data from nseindia.com. I am interested in the tabular data. I got the XPath of the element using Chrome's Inspect Element, but what I get is this:

[<Element div at 0x107e6b730>]

What is the problem here?

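from lxml import html
import requests

# Download the page
page = requests.get('nseindia.com/live_market/dynaContent/live_analysi…')

# Parse the raw response bytes into an lxml element tree
tree = html.fromstring(page.content)

# XPath copied from Chrome's Inspect Element
buyers = tree.xpath('//*[@id="tab7Content"]/div[2]')
print(buyers)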

#html #python #xpath #lxml

Responses(1)

Mev-Rael

Executive Product Leader & Mentor for High-End Influencers and Brands @ mevrael.com

Use Node.js for tasks like this in 2017. Being able to work with the document and extract any data from it the same way as in the browser console, using document.getElementById() or whatever else you need, is amazing. You also get errors and warnings you can understand, the same as in the browser. Well, it is JavaScript, so that excludes the occasional baffling JavaScript error message.
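For comparison, here is a minimal Python sketch of the same id-based lookup with lxml, run against a made-up HTML fragment. SAMPLE and text_by_id are hypothetical names, not from the thread. lxml.html elements provide get_element_by_id(), and text_content() plays roughly the role of textContent:

```python
from lxml import html

# Hypothetical HTML fragment standing in for a downloaded page.
SAMPLE = """
<html><body>
  <div id="tab7Content">
    <div>Header</div>
    <div>Buyers: 1200</div>
  </div>
</body></html>
"""


def text_by_id(document, element_id):
    """Return the text of the element with the given id, similar to
    document.getElementById(id).textContent in a browser."""
    tree = html.fromstring(document)
    # get_element_by_id raises KeyError if no element has this id.
    element = tree.get_element_by_id(element_id)
    # text_content() concatenates all descendant text, whitespace included.
    return element.text_content()


print(text_by_id(SAMPLE, "tab7Content").split())
```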

You can then run that Node script from Python or any other language, or run it as a separate microservice that writes the data to a database, Redis, or wherever you need it on its own.
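Here is a minimal sketch of the "run it from Python" step, assuming the Node script prints its results as JSON on stdout. The helper name run_json_scraper and the command ["node", "scrape.js"] are hypothetical. The demo uses a tiny Python command as a stand-in so it runs without Node installed:

```python
import json
import subprocess
import sys
from typing import Any, Sequence


def run_json_scraper(command: Sequence[str], timeout: float = 60.0) -> Any:
    """Run an external scraper process and parse the JSON it prints to stdout.

    `command` would be something like ["node", "scrape.js"] (hypothetical).
    """
    result = subprocess.run(
        list(command),
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode the output as str
        timeout=timeout,      # give up if the scraper hangs
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Stand-in for ["node", "scrape.js"]: a Python process that prints JSON.
    demo = [sys.executable, "-c", "import json; print(json.dumps({'buyers': [1, 2]}))"]
    print(run_json_scraper(demo))
```

Passing the command as a list (no shell) avoids quoting problems, and check=True surfaces scraper failures as Python exceptions instead of empty output.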

P.S. I suppose that output means you are printing an element object, which is expected: the XPath returned the div you wanted, but as an element, not the text or content you need. Extract that with buyer.textContent or whatever the Python equivalent is (in lxml, xpath() returns a list, so that would be buyers[0].text_content()). I would still prefer Node.js with the DOM spec.
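A short sketch of what the P.S. describes, using lxml on a made-up HTML fragment shaped loosely like the question's XPath target. The markup, SAMPLE, and the row values are assumptions, not the real NSE page:

```python
from lxml import html

# Hypothetical markup; the real page layout may differ.
SAMPLE = """
<html><body>
  <div id="tab7Content">
    <div>Heading</div>
    <div>
      <table>
        <tr><th>Symbol</th><th>Qty</th></tr>
        <tr><td>ABC</td><td>100</td></tr>
        <tr><td>XYZ</td><td>250</td></tr>
      </table>
    </div>
  </div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# xpath() returns a list of matching elements, which is why printing it
# shows something like [<Element div at 0x...>].
buyers = tree.xpath('//*[@id="tab7Content"]/div[2]')
print(buyers)

# For tabular data, take the first match and collect the stripped
# text of every header/data cell, row by row.
rows = [
    [cell.text_content().strip() for cell in tr.xpath("./th|./td")]
    for tr in buyers[0].xpath(".//tr")
]
print(rows)
```

Each inner list is one table row, ready to hand to csv.writer or a DataFrame.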
