Great breakdown. PDF RAG is easy to demo but hard to make trustworthy.
Chunking, metadata, citations, access control, and index hygiene matter a lot once the PDFs are internal or customer-facing.
At LangProtect, we look at this as both a retrieval problem and a data-security problem: what gets indexed, what gets retrieved, and what sensitive context reaches the user.
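To make the "what gets retrieved" and "what reaches the user" parts concrete, here is a minimal sketch of a post-retrieval filter. It is an illustration only, not a description of any particular product: the `Chunk` fields, the group-based ACL, and the email-only redaction rule are all assumptions, and a real pipeline would likely enforce access at the index or vector-store level as well.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    """Hypothetical retrieved chunk; field names are assumptions for this sketch."""
    text: str
    source: str   # PDF file name, kept so answers can cite it
    page: int
    allowed_groups: frozenset = field(default_factory=frozenset)


# Deliberately simple pattern; real redaction would cover far more than emails.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def filter_for_user(chunks, user_groups):
    """Keep only chunks whose ACL shares at least one group with the user."""
    groups = set(user_groups)
    return [c for c in chunks if c.allowed_groups & groups]


def redact(text):
    """Mask email addresses before the text is placed in the prompt."""
    return EMAIL.sub("[redacted email]", text)


def build_context(chunks, user_groups):
    """Apply access control first, then redaction, and prefix a citation tag."""
    visible = filter_for_user(chunks, user_groups)
    return [f"[{c.source} p.{c.page}] {redact(c.text)}" for c in visible]
```

Filtering before redaction means a chunk the user is not allowed to see never reaches the prompt at all, and a chunk with no groups assigned is dropped by default rather than exposed.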