Moving from Fixed-Size Chunks to Semantic Integrity
Semantic chunking is good for accuracy, but you need to balance it against processing time (especially on a limited machine).
For general articles, a lead + body split actually works well.
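As a rough illustration of the lead + body idea, here is a minimal sketch. It assumes that paragraphs are separated by blank lines and that the first paragraph is the lead; the function name `lead_body_chunks` and the `max_chars` budget are hypothetical and not taken from the original post.

```python
from typing import List, Tuple


def lead_body_chunks(article: str, max_chars: int = 1000) -> List[Tuple[str, str]]:
    """Split an article into one 'lead' chunk and one or more 'body' chunks.

    Assumptions (not from the original post): paragraphs are separated by
    blank lines, and the first paragraph is treated as the lead.
    """
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    if not paragraphs:
        return []

    chunks: List[Tuple[str, str]] = [("lead", paragraphs[0])]
    current = ""
    for para in paragraphs[1:]:
        # Close the current body chunk if adding this paragraph would exceed the budget.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(("body", current))
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(("body", current))
    return chunks


if __name__ == "__main__":
    sample = "Lead paragraph.\n\nFirst body paragraph.\n\nSecond body paragraph."
    for kind, text in lead_body_chunks(sample, max_chars=40):
        print(kind, "|", text)
```

Paragraphs are never split in the middle, so a single very long paragraph can still exceed `max_chars`. That keeps each chunk semantically whole at the cost of uneven chunk sizes, and it avoids the extra embedding passes that a fully semantic splitter would need.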
I was working on my multi-language news bank with pgvector + e
maylau.hashnode.dev · 2 min read