tarini sunil in ai-content-utilties.hashnode.dev

What Your Documents Whisper When Nobody's Looking
4d ago · 5 min read
Most people think important information is easy to spot. Look at enough documents and you will find the important topics because they appear again and again. More mentions = more importance. Right? No…
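
The naive heuristic the post questions ("more mentions = more importance") fits in a few lines of Python. This is a hypothetical illustration of that baseline only, not the method the post proposes. `rank_by_mentions`, the regex tokenizer, and the sample documents are all made up for the example.

```python
import re
from collections import Counter


def rank_by_mentions(documents: list[str], top_n: int = 3) -> list[tuple[str, int]]:
    """Naive baseline (hypothetical): treat the most-mentioned words as most important."""
    counts: Counter = Counter()
    for doc in documents:
        # Crude tokenizer: lowercase the text and keep runs of letters only.
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    # Frequency stands in for importance; no stop words, no context.
    return counts.most_common(top_n)


docs = [
    "Quarterly report: revenue up, revenue targets met.",
    "Revenue summary for the board.",
    "Note: the data breach was contained.",
]
print(rank_by_mentions(docs))
```

On these sample documents, "revenue" (3 mentions) and the filler word "the" (2) rank at the top, while the single mention of "breach" never surfaces. That is the kind of blind spot a pure frequency count has.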

Why Most RAG Pipelines Destroy Document Structure
May 18 · 4 min read
Most developers building RAG systems make the same mistake. They split documents by token count, call it "chunking," and move on. The result: a retrieval system that finds the right words in the wrong…
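
For contrast, here is a minimal sketch of the token-count chunking described above, next to a structure-aware alternative that splits on headings. It makes two assumptions: whitespace-separated words stand in for real model tokens, and lines starting with `#` count as headings. Both function names are hypothetical, and this is not the post's own implementation.

```python
def chunk_by_tokens(text: str, max_tokens: int) -> list[str]:
    """Fixed-size chunking: split purely by (whitespace) token count."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def chunk_by_headings(text: str) -> list[dict]:
    """Structure-aware chunking (hypothetical): start a new chunk at each '#' heading."""
    chunks: list[dict] = []
    current = {"heading": None, "body": []}
    for line in text.splitlines():
        if line.startswith("#"):
            # Close the previous chunk before opening a new section.
            if current["heading"] is not None or current["body"]:
                chunks.append(current)
            current = {"heading": line.lstrip("#").strip(), "body": []}
        elif line.strip():
            current["body"].append(line.strip())
    if current["heading"] is not None or current["body"]:
        chunks.append(current)
    return chunks


doc = """# Refund policy
Refunds are issued within 30 days.
# Shipping
Orders ship in 2 business days."""

print(chunk_by_tokens(doc, 8))
print(chunk_by_headings(doc))
```

With `max_tokens=8`, the second token chunk is `"days. # Shipping Orders ship in 2 business"`. It mixes the tail of the refund section with the shipping section. The heading-based version keeps each section, and its heading, together.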

Why Document Parsing Is Harder Than It Looks
May 8 · 4 min read
Most document parsers flatten everything into plain text. But real-world documents are messy: inconsistent headings, broken bullet lists, repeated sections, tables, missing structure. I wanted to…

From One Shot to a Pipeline: Evolving DOCX → JSON (V1 → V2)
Apr 27 · 7 min read
Why change what works? A common first version of “turn this Word file into JSON” is simple: read the text, send all of it to the LLM once, parse JSON back. It ships fast and works on small docs. In…

From Word to JSON: A First-Pass DOCX Pipeline with an LLM
Apr 23 · 5 min read
v1 experiment: extract text from Word, ask a model to structure it, then validate the result. Nothing fancy yet, and that is the point. Why this exists: Word documents are easy for people to write and…
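
The v1 flow (send the text once, parse JSON back, validate) can be sketched with the model call stubbed out. Several pieces are assumptions. The text is taken as already extracted from the .docx file. `call_model` is a hypothetical placeholder for whatever LLM client is used. The expected shape (a `title` string plus a `sections` list of `heading`/`text` objects) is invented for illustration. Validation uses only the standard library rather than a schema library.

```python
import json


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a canned JSON reply."""
    return '{"title": "Onboarding Guide", "sections": [{"heading": "Day 1", "text": "Get your laptop."}]}'


def validate(doc: object) -> list[str]:
    """Return a list of problems; an empty list means the (assumed) shape looks right."""
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    errors = []
    if not isinstance(doc.get("title"), str):
        errors.append("'title' must be a string")
    sections = doc.get("sections")
    if not isinstance(sections, list):
        errors.append("'sections' must be a list")
    else:
        for i, section in enumerate(sections):
            if not isinstance(section, dict) or not all(
                isinstance(section.get(key), str) for key in ("heading", "text")
            ):
                errors.append(f"sections[{i}] needs string 'heading' and 'text'")
    return errors


def docx_text_to_json(text: str) -> dict:
    """One-shot v1 flow: send all text once, parse JSON back, then validate."""
    reply = call_model("Structure this document as JSON:\n" + text)
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    problems = validate(parsed)
    if problems:
        raise ValueError("; ".join(problems))
    return parsed


print(docx_text_to_json("Onboarding Guide\nDay 1\nGet your laptop."))
```

Returning a list of problems instead of stopping at the first one makes it easier to log everything the model got wrong in a single run.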