Sanskriti Harmukh in vultr.hashnode.dev · 6d ago · 3 min read
Deploying Paperless-ngx Open-Source Document Management System on Ubuntu 24.04
Paperless-ngx is an open-source document management system that converts scans and PDFs into a fully searchable archive using Tesseract OCR, with tags, custom fields, and automated processing rules. T…
Web Equipe in webequipe.hashnode.dev · 6d ago · 6 min read
Text-Based PDFs vs Scanned PDFs: Why Search Works Differently for Each
PDF search sounds simple at first. You upload a PDF, extract the content, and make it searchable. But once you start working with real documents, you quickly realize that not every PDF behaves the sam…
Fox in deepfox.hashnode.dev · Jun 19 · 6 min read
The hard part of national ID OCR isn't the OCR
You wire up OCR for your KYC flow, point it at a national ID card, and get back a clean { name, idNumber, dateOfBirth }. Ship it. Then you onboard your second country — and it falls apart. Fields you…
mlai digital in mlaidigital.hashnode.dev · Jun 18 · 10 min read
Using LLMs as OCR? Read This First | MLAI Digital
Introduction: When “AI Can Read Anything” Goes Wrong. The use case of AI document extraction is among the most popular and discussed use cases of modern AI systems. As large language models profess to…
Manikanta SSB in ssbb7.hashnode.dev · Jun 9 · 7 min read
I Spent Weeks Fighting OCR Before Realizing I Was Solving the Wrong Problem
Heads-up: The figure captions in this article are clickable. Click on any figure caption to view the associated outputs, visualizations, and intermediate results discussed in that section. Hey guys, i…
Ritusmoi Kaushik in ritusmoikaushik.hashnode.dev · May 27 · 8 min read
Parsing Indian GST Invoices With Regex, Not a Model
A GST invoice PDF looks structured to a human and is chaos to a parser. The total sits bottom-right on one vendor's layout and mid-page on another's. The seller GSTIN and the buyer GSTIN are the same…
Chirag Vijay in chirag4862.hashnode.dev · Apr 29 · 8 min read
I Fine-Tuned YOLO to Understand Document Structure — Here's How It Works
There's a class of problem in document AI that sounds deceptively simple: look at a page, figure out what's on it. Not read the text. Not classify the document. Just answer: where is the table? where…
Aing earmeng in earmeng-aing.hashnode.dev · Apr 27 · 4 min read
How I leverage AI to reduce customer support inefficiency
Customer support teams often receive the same screenshot-based issues over and over. “Why did my payment fail?” “Is this transaction successful?” “Why is my app showing this error?” In our support flow,…
Sagar Sahu in texttopdfnet.hashnode.dev · Apr 18 · 4 min read
How to Extract Text from PDFs in JavaScript (Digital vs Scanned PDFs Explained)
You try to extract text from a PDF file using JavaScript. Sometimes it works fine. Sometimes the output is empty or broken. This confuses many developers. The thing is that not all PDF files behave th…
Suneet Singh Puri in spunfromsun.hashnode.dev · Apr 9 · 5 min read
The Architecture of Empathy
When a loved one is fighting cancer, it takes a toll on all those involved, including the caregivers. Two years of medical records. Dozens of prescriptions. Countless lab results--the administrative b…