Might be a stupid question, but once the browser engine reads these files, do they all get turned into plain HTML elements? If not, what kind of discrepancies are there between them? I understand there are differences between browsers due to their rendering engines, but I'm only interested in the raw file output, not the engines. I'm trying to capture the DOM as JSON and I'm wondering whether there might be issues depending on the language used to create the webpage.
https://html2canvas.hertzen.com is a project that may interest you :) if you have a frontend based on React or another JS frontend rendering engine :) I still recommend @janVladimirMostert's approach.
All of them render HTML on the server side; the browser only sees the HTML. Browsers have different rendering engines, so that same HTML might look different in different browsers.
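Since the browser only receives HTML regardless of the server language, capturing that markup as JSON could be sketched roughly as below. This is only an illustration using Python's standard html.parser; the names TreeBuilder, VOID_TAGS and html_to_json are made up for this example, and the result reflects the markup as served, not a live DOM after scripts have modified it.

```python
import json
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Builds a nested dict tree from HTML markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "attrs": {}, "children": []}
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray closes.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1]["children"].append({"text": text})


def html_to_json(markup: str) -> str:
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return json.dumps(builder.root)
```

Calling html_to_json on the body of any response gives the same kind of tree whether the page was produced by PHP, ASP, JSP or a static file, because the input is just the HTML text.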
Your server can also write JSON directly to the browser and add a Content-Type header to instruct the browser to read it as JSON, or write a zip file directly to the browser and instruct the browser to treat it as a zip file and download it, etc.
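As a rough sketch of the JSON case, here is a tiny server built on Python's standard http.server that writes JSON and labels it with a Content-Type header. The handler name JsonHandler, the helper fetch_once and the payload are assumptions for illustration only; any server-side language would do the equivalent.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class JsonHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"message": "hello"}).encode("utf-8")
        self.send_response(200)
        # This header is what tells the client to treat the bytes as JSON.
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start the server on a free local port, request it once, shut down."""
    server = HTTPServer(("127.0.0.1", 0), JsonHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", "/")
        resp = conn.getresponse()
        content_type = resp.getheader("Content-Type")
        data = json.loads(resp.read())
        conn.close()
        return content_type, data
    finally:
        server.shutdown()
        server.server_close()
```

Swapping the header value and the body is all it takes to serve something else, for example a zip file with a Content-Disposition header to trigger a download.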
To get a better understanding, run this in your command line:
telnet towel.blinkenlights.nl
The browser does exactly this: it opens a socket connection to your server on port 80 if it's http and port 443 if it's https, and your server writes data back to the browser in a format it can understand, just like this telnet command opens a connection and the server responds in a format that the terminal can understand. Whatever you write your website in (PHP, ASP, JSP, etc.), it's all the same: the browser connects to a port on the server, and the server responds by writing HTML into the open connection.
Also try this:
Type:
telnet google.com 80
and hit enter. Once you get feedback, type
GET
and hit enter. You'll now see telnet connecting to google.com on port 80 and returning data in exactly the way the browser will see it (the 302 in this case means a temporary redirect, a permanent one would be 301, and the Location header tells the browser where to redirect to).
telnet google.com 80
Trying 216.58.223.46...
Connected to google.com.
Escape character is '^]'.
GET
HTTP/1.0 302 Found
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Location: google.co.za
Content-Length: 261
Date: Sun, 14 Aug 2016 07:38:18 GMT

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="google.co.za">here</A>.
</BODY></HTML>
Connection closed by foreign host.
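The same telnet experiment can be sketched in a few lines of Python: open a socket, write a GET request, and read whatever the server writes back. The function names raw_get and split_response are made up for this illustration, and the request is a minimal HTTP/1.0 one, which is less than what a real browser sends.

```python
import socket


def raw_get(host: str, port: int = 80, path: str = "/") -> bytes:
    """Open a TCP connection, send a bare GET, return the raw reply."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def split_response(raw: bytes):
    """Split a raw HTTP reply into status line, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body
```

For example, split_response(raw_get("google.com")) should give back a status line, a dictionary of headers such as content-type and location, and the HTML body, which is all the browser ever gets to work with.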