I'm curious how Pocket (getpocket) gets the content from every website, even ones with strict rules like The New York Times. If they're using scraping, what's the logic behind it? When you save something and then open it, they manage to get the title and body separately, whatever the website.
P.S. A solution in Node.js would be the most helpful.
Thanks
Ryosuke
Designer / Developer / Influencer
I'm not sure how Pocket handles it, since they use a Chrome extension, but sites like Tumblr offer "bookmarklets" that do this.
You basically have the user save a bookmark that contains JavaScript, and that script scrapes the page it's running on for the content it needs (headings, body text, etc.).
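As a rough illustration of what such a bookmarklet's script might do once it runs on a page, here is a minimal sketch. It assumes a deliberately simple heuristic (the first <h1> as the title, falling back to document.title, and the text of every <p> as the body); extractArticle is a hypothetical name, and this is not how Pocket or Tumblr actually do it. Taking the document as a parameter keeps it testable outside a browser.

```javascript
// Hypothetical sketch of a bookmarklet payload: pull a title and body out of a DOM document.
// Assumption: "first <h1>, else document.title" and "all <p> text" are just a simple heuristic.
function extractArticle(doc) {
  // Prefer the page's main heading; fall back to the <title> text if there is none (or it's empty).
  const h1 = doc.querySelector('h1');
  const title = (h1 && h1.textContent.trim()) || doc.title;

  // Collect the text of every paragraph, dropping empty ones.
  const paragraphs = Array.from(doc.querySelectorAll('p'))
    .map((p) => p.textContent.trim())
    .filter((text) => text.length > 0);

  return { title, body: paragraphs.join('\n\n') };
}
```

In a browser console you could try extractArticle(document) on an article page to see what the heuristic picks up.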
Here's the Tumblr bookmarklet for reference:
javascript:var d=document,w=window,e=w.getSelection,k=d.getSelection,x=d.selection,s=(e?e():(k)?k():(x?x.createRange().text:0)),f='tumblr.com/widgets/share/tool',l=d.location,e=encodeURIComponent,p='?url='+e(l.href)+'&title='+e(d.title)+'&selection='+e(s)+'&shareSource=bookmarklet',u=f+p,sw=0,sd;try{sd=d.createElement('div');sd.style.height='100px';sd.style.width='100px';sd.style.overflow='scroll';d.body.appendChild(sd);sw=sd.offsetWidth-sd.clientWidth;d.body.removeChild(sd);}catch(z){};try{if(!/^(.*\.)?tumblr[^.]*$/.test(l.host))throw(0);tstbklt();}catch(z){a=function(){if(!w.open(u,'_blank','toolbar=0,resizable=0,status=1,scrollbars=1,width='+(540+sw)+',height=600'))l.href=u;};setTimeout(a,10);}void(0)
It grabs the page URL (document.location.href) and title (document.title), reads the user's text selection through whichever selection API the browser supports, measures the scrollbar width so the popup is sized correctly, and then opens a popup share form pre-filled with that info (navigating the current tab instead if the popup is blocked).
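The core of that bookmarklet is just assembling a query string from the page's URL, title and selection, encoding each value with encodeURIComponent. Pulled out into a plain function it can be tested in Node; buildShareUrl is a hypothetical name, and the endpoint default is copied as-is from the bookmarklet.

```javascript
// Hypothetical helper mirroring the query string the Tumblr bookmarklet builds.
// The default endpoint is copied verbatim from the bookmarklet above.
function buildShareUrl(pageUrl, title, selection, endpoint = 'tumblr.com/widgets/share/tool') {
  const e = encodeURIComponent;
  return (
    endpoint +
    '?url=' + e(pageUrl) +
    '&title=' + e(title) +
    // A missing selection is normalized to an empty string here (a small deviation from the original).
    '&selection=' + e(String(selection || '')) +
    '&shareSource=bookmarklet'
  );
}
```

In the browser you would call it as buildShareUrl(location.href, document.title, String(window.getSelection())).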
Make sure you URL-encode the script before adding it as a bookmark (for example, "=" becomes "%3D").
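One way to handle that encoding step is to write the script as normal, readable JavaScript and generate the bookmark URL from it. A minimal sketch, assuming encodeURIComponent does the encoding (it turns "=" into "%3D" along with every other reserved character); toBookmarklet is a hypothetical name.

```javascript
// Hypothetical helper: turn readable JS source into a one-line bookmarklet URL.
function toBookmarklet(source) {
  // Wrap in an IIFE so variables stay out of the page's global scope, and use
  // `void` so the bookmark's result never replaces the current page.
  // The newline before the closing brace stops a trailing // comment from swallowing it.
  const wrapped = 'void (function () {' + source + '\n})();';
  // Percent-encode everything; the browser decodes it again before running it.
  return 'javascript:' + encodeURIComponent(wrapped);
}
```

Running console.log(toBookmarklet('alert(document.title);')) in Node prints a string you can paste into a new bookmark's URL field.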
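This should translate fairly directly to a Chrome extension, since an extension's content scripts also run against the page's DOM. Node is JavaScript too, but it has no browser DOM, so there you'd have to fetch the page's HTML and parse it yourself.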
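A minimal Node.js sketch of a server-side variant: it downloads the page's HTML and pulls out a title and body, since there is no browser document to query. It assumes Node 18 or later (for the built-in fetch) and uses a deliberately naive regex heuristic (first <h1> or <title> as the title, <p> text as the body); textOf, extractFromHtml and fetchArticle are hypothetical names. For real pages a proper HTML parser is far more robust; Mozilla's Readability library (the code behind Firefox Reader View) running on jsdom is a common choice in Node. Also note that a server-side request carries none of the user's cookies, so paywalled or bot-protected sites like The New York Times may serve it something different from what the user sees. That is one advantage of running the extraction in the user's browser via a bookmarklet or extension.

```javascript
// Naive server-side sketch for Node 18+ (global fetch).
// Assumption: regexes are good enough for a demo; use a real HTML parser in practice.

// Strip tags, decode a handful of common entities, and collapse whitespace.
function textOf(fragment) {
  return fragment
    .replace(/<[^>]*>/g, '')
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&') // last, so "&amp;lt;" becomes "&lt;" rather than "<"
    .replace(/\s+/g, ' ')
    .trim();
}

// Title: first <h1>, else <title>. Body: text of every non-empty <p>.
function extractFromHtml(html) {
  const h1 = html.match(/<h1\b[^>]*>([\s\S]*?)<\/h1>/i);
  const titleTag = html.match(/<title\b[^>]*>([\s\S]*?)<\/title>/i);
  const title = (h1 && textOf(h1[1])) || (titleTag && textOf(titleTag[1])) || '';
  const paragraphs = [...html.matchAll(/<p\b[^>]*>([\s\S]*?)<\/p>/gi)]
    .map((m) => textOf(m[1]))
    .filter((text) => text.length > 0);
  return { title, body: paragraphs.join('\n\n') };
}

// Fetch a page and extract it. Sites that block bots or require a login
// may return an error status or a different page here.
async function fetchArticle(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return extractFromHtml(await res.text());
}
```

Usage (needs network access): fetchArticle('https://example.com/').then(console.log, console.error). Keeping extractFromHtml separate from the fetch makes it easy to test, and to swap for a real parser later without touching the networking code.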