Discussion on "LLM app dev using AWS Bedrock and Langchain"

Suyash Dubey · 2024-08-14T07:21:21.001Z

When trying to solve a question-answering task over a larger document corpus with the help of LLMs, we need to master the following challenges: how to manage large document(s) that exceed the token limit; how to find the document(s) relevant to the q...

Character splitting working better than semantic chunking for this dataset matches a pattern I have also observed with PDFs that have dense table-of-contents structures: the semantic splitter often breaks at section boundaries rather than content boundaries. One thing worth adding: when your document corpus grows past a few hundred PDFs, a hybrid retriever that combines sparse BM25 retrieval with dense retrieval over the Titan embeddings can meaningfully improve recall without tuning the chunk size.
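To make the hybrid idea concrete, here is a minimal sketch of one common way to merge the two result lists: reciprocal rank fusion (RRF). This is my assumption about how the fusion could be done, not something stated in the article. The chunk IDs, the function name, and the constant k=60 are all hypothetical, and the two input rankings stand in for whatever your BM25 retriever and your Titan-embedding vector store actually return.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    rankings: list of lists, each ordered best-first (e.g. one list from
              BM25, one from the dense retriever).
    weights:  optional per-retriever weights; defaults to equal weighting.
    k:        smoothing constant (assumed value) that damps the influence
              of top ranks.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need exactly one weight per ranking")

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns more score the higher it ranks in each list.
            scores[chunk_id] += weight / (k + rank)

    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical retriever outputs for one query (placeholder IDs).
bm25_ranking = ["chunk-7", "chunk-2", "chunk-9"]
dense_ranking = ["chunk-2", "chunk-4", "chunk-7"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Chunks that appear in both lists rise to the top, which is where the recall gain tends to come from: BM25 catches exact keyword matches that the embeddings miss, and the dense side catches paraphrases. Because the fusion works on ranks rather than raw scores, the BM25 and cosine-similarity scales never need to be normalized against each other.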

