Well-documented article. Insightful, I must say.
Thanks!
Great documentation! I wonder if there's room to tune the threshold by which we decide whether a float is 0 or 1. Obviously, the threshold tuning should be done on a sample of the data. What do you think?
Thanks! It would be an improvement to tune the cutoff point for deciding whether a number becomes 0 or 1. Currently, we simply convert positive numbers to 1 and negative numbers to 0.
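For illustration, a minimal sketch of this sign-based binarization with an optional cutoff (the function name and sample values are mine, not from the article):

```python
import numpy as np

def binarize(vec: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    # Map each float to 1 if it exceeds the threshold, else 0.
    # The article's scheme uses threshold = 0.0 (positive -> 1, negative -> 0);
    # the suggestion above is to tune this cutoff on a sample of the data.
    return (vec > threshold).astype(np.uint8)

emb = np.array([0.12, -0.53, 0.07, -0.01])
print(binarize(emb))       # sign-based: [1 0 1 0]
print(binarize(emb, 0.1))  # tuned cutoff: [1 0 0 0]
```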
Can I ask how you arrived at the claim that this "can reduce memory usage by up to 32 times"? Thank you! I've been stuck at that step for a while, not knowing why.
Is it because of the example below (20 / 0.6 = 33.33...)?
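A plausible reading, rather than the 20/0.6 figures: each fp32 value occupies 32 bits while a binarized value occupies 1 bit, which gives the 32x bound directly:

```python
# One fp32 dimension costs 32 bits; one binarized dimension costs 1 bit.
fp32_bits_per_dim = 32
binary_bits_per_dim = 1
print(fp32_bits_per_dim / binary_bits_per_dim)  # 32.0

# Concretely, for a 1536-dim embedding (an illustrative size):
dim = 1536
fp32_bytes = dim * 4     # 6144 bytes per vector
binary_bytes = dim // 8  # 192 bytes per vector
```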
May I ask how the index shortening is performed? Did you use PCA or some other dimensionality-reduction method?
No, that is actually a new feature of the OpenAI embedding models: you can selectively drop certain dimensions, and the embeddings will still work well.
Under the hood is Matryoshka Representation Learning aniketrege.github.io/blog/2024/mrl
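A sketch of what that dimension dropping looks like in practice: keep a prefix of the vector and re-normalize. The function name and the 1536/256 sizes are illustrative, not from the article:

```python
import numpy as np

def shorten(vec: np.ndarray, dim: int) -> np.ndarray:
    # Matryoshka-trained embeddings front-load information, so a prefix
    # of the vector is still a usable embedding after re-normalization.
    prefix = vec[:dim]
    return prefix / np.linalg.norm(prefix)

# Illustrative sizes: a 1536-dim embedding shortened to 256 dims.
full = np.random.default_rng(0).standard_normal(1536)
short = shorten(full, 256)
print(short.shape)  # (256,)
```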
Cool article. I understand how you transform the fp32 vector into a bit vector, but how do you do the nearest-neighbor search on the set of bit vectors to get the initial 200 you describe in your article? Do you use brute force, Annoy, or something else?
We still use HNSW to build an index for the bit vectors; it takes 0.6 GB to store the indexes. In the experiment we use our own pgvecto.rs, a vector search extension for Postgres.
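For context, the distance between bit vectors is typically Hamming distance (XOR, then a popcount), which is what makes the binary index cheap to traverse. A minimal sketch; the HNSW graph itself is omitted and `hamming` is an illustrative helper:

```python
import numpy as np

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    # a and b are packed bit vectors (output of np.packbits).
    # XOR flags the differing bits; summing the unpacked bits counts them.
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

a = np.packbits([1, 0, 1, 0])
b = np.packbits([1, 1, 1, 1])
print(hamming(a, b))  # 2
```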
I understand the memory reduction from binary vectors. However, if you use the normal vectors for kNN re-ranking, you still need the complete vectors for all items, right? That sounds like you need even more memory. Can you elaborate on that?
Thanks for the question. You still need to store the full-precision vector data, but you can build the index with binary vectors, so the memory usage of the index is reduced. The data and the index are two different things that need storage.
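A sketch of that two-stage setup, assuming a coarse binary candidate search followed by fp32 re-ranking; the function name, brute-force stage (the article uses HNSW), and toy sizes are mine:

```python
import numpy as np

def search(query, bits, full, top_k=10, candidates=200):
    # Stage 1: coarse candidate search over the binary index.
    # (Brute-force Hamming distance here to keep the sketch short.)
    q_bits = np.packbits(query > 0)
    dists = np.unpackbits(np.bitwise_xor(bits, q_bits), axis=1).sum(axis=1)
    cand = np.argsort(dists)[:candidates]
    # Stage 2: re-rank the candidates with the stored full-precision data.
    scores = full[cand] @ query
    return cand[np.argsort(-scores)[:top_k]]

# Toy data: 1000 vectors of 64 dims, plus their packed binary index.
rng = np.random.default_rng(1)
full = rng.standard_normal((1000, 64))
bits = np.packbits(full > 0, axis=1)

query = full[42]
result = search(query, bits, full)  # row 42 itself should rank first
```

The point is that `bits` (the index) is what HNSW traverses, while `full` (the data) is only consulted for the small candidate set.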
Thank you, I get it now. Ce Gao