How to Stop PDF Parsers from Hallucinating Tables out of Thin Air
PDF extraction is usually blind.
If you've ever tried to write a script to scrape a PDF, you know exactly what I mean. You run the PDF through a generic text extractor, and instead of a clean table, y
ginexys.hashnode.dev · 5 min read