PDFs are a nightmare. Here's the normalization pipeline we wish we'd had from day one.
1d ago · 7 min read · If you've never built software that operates on PDFs at scale, here's the punchline: a "PDF" is closer to a programming language than to a file format. Two files can be byte-identical and contain diff

