Are all LLMs really 1.58 bits? Inference at 4x the speed or more?

Apr 12, 2024

Out of curiosity: have you heard of vector symbolic architectures (also known as hyperdimensional computing)? They are not LLMs, but in some ways they seem to have very similar underlying dynamics, and they tend to use 1-bit or ternary representations. That said, I think their biggest draw is the easy-to-understand mathematical framework for how arbitrarily complex knowledge structures can be built and meaningfully manipulated in a high-dimensional vector space. If you decide to give them a closer look, it would be interesting to hear your thoughts on whether VSAs and LLMs might be doing something fundamentally similar. :-)
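For readers who haven't seen VSAs before, here is a minimal sketch of the core idea. It assumes one common variant: random bipolar (±1) hypervectors, "binding" by element-wise multiplication, and "bundling" by an element-wise majority vote. The role and filler names (COUNTRY, PARIS, and so on) are hypothetical illustrations, not taken from the Quanta article or hd-computing.com.

```python
import random

D = 10_000  # dimensionality; VSAs typically use thousands of dimensions


def random_hv(rng):
    """Random bipolar hypervector: each component is -1 or +1 (1 bit each)."""
    return [rng.choice((-1, 1)) for _ in range(D)]


def bind(a, b):
    """Binding: element-wise multiplication. The result is dissimilar to both
    inputs, and for bipolar vectors binding is its own inverse."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vs):
    """Bundling (superposition): element-wise majority vote via the sign of the sum.
    Ties (only possible with an even number of inputs) are broken toward +1."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*vs)]


def similarity(a, b):
    """Normalized dot product in [-1, 1]; close to 0 for unrelated random vectors."""
    return sum(x * y for x, y in zip(a, b)) / D


rng = random.Random(0)
# Hypothetical symbols for a tiny record: {country: France, capital: Paris, currency: Euro}
COUNTRY, CAPITAL, CURRENCY = random_hv(rng), random_hv(rng), random_hv(rng)
FRANCE, PARIS, EURO = random_hv(rng), random_hv(rng), random_hv(rng)

# The whole structured record becomes a single vector of the same dimension
record = bundle(bind(COUNTRY, FRANCE), bind(CAPITAL, PARIS), bind(CURRENCY, EURO))

# Query "what is the capital?": unbind with the CAPITAL role, then compare to known fillers
query = bind(record, CAPITAL)
for name, hv in [("FRANCE", FRANCE), ("PARIS", PARIS), ("EURO", EURO)]:
    print(name, round(similarity(query, hv), 2))
```

With three bundled pairs, PARIS should score roughly 0.5 while the other fillers stay near 0, so the answer can be read off by picking the most similar known vector. This kind of noisy-but-recoverable retrieval is what makes very low-precision components workable in VSAs.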

RJ Honicky

Author

Apr 12, 2024

Sounds interesting. Do you have a link to get me started?

Dima G

Apr 13, 2024

RJ Honicky Sure! I first read about it in Quanta Magazine last year: quantamagazine org a-new-approach-to-computation-reimagines-artificial-intelligence-20230413

There is actually a website dedicated to the topic (hd-computing com), and the first paper mentioned in the "Course: Computing with High-Dimensional Vectors" section gives a thorough explanation of the basic principles.

I'm not so much a machine learning specialist/researcher as just someone who likes to read about interesting science, but this felt like the first intuitively understandable (even if incomplete) model of how/why the seemingly nebulous transformations of neural activations can result in meaningful manipulation of knowledge.

(Sorry about not providing direct links; I'm a new user, so they're not allowed.)

RJ Honicky

Author

Apr 14, 2024

Dima G This is interesting. I don't understand the details, but from my quick read of the article, it looks like they are using formal logic to constrain the vectors during training, creating what seems to be a more structured latent space.

To answer your question more directly, I think they are actually using a neural network to train their "hyper-dimensional vectors," which are similar to the vectors in the latent space of a neural network, including LLMs. In case you're not familiar, the latent space of a neural network is the high-dimensional vector representation of knowledge that flows through the network. Although I say "high," this is actually very low compared to the number of dimensions in the space of possible word combinations, so the representation is extremely compressed. In some sense, this is the job of the neural network: to compress information into this latent space and then manipulate it.
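To put rough numbers on how "low" the latent dimension is, here is a back-of-the-envelope calculation. The vocabulary size, window length, and hidden width are hypothetical round numbers chosen for illustration, not figures from any particular model, and the comparison assumes one dimension per distinct token sequence (a one-hot encoding of sequences).

```python
import math

# Hypothetical, round-number sizes chosen only for illustration
VOCAB_SIZE = 32_000  # number of distinct tokens
SEQ_LEN = 8          # a short window of 8 tokens
HIDDEN_DIM = 4_096   # width of one latent (hidden-state) vector

# One basis direction per possible sequence needs VOCAB_SIZE ** SEQ_LEN dimensions;
# work in log10 to keep the number readable
combo_dims_log10 = SEQ_LEN * math.log10(VOCAB_SIZE)
hidden_dims_log10 = math.log10(HIDDEN_DIM)

print(f"possible {SEQ_LEN}-token sequences: ~10^{combo_dims_log10:.1f}")
print(f"latent vector dimensions: {HIDDEN_DIM} (~10^{hidden_dims_log10:.1f})")
```

Even for an 8-token window, the space of word combinations is around 10^36 dimensions versus a few thousand in the latent vector, which is the sense in which the latent space is heavily compressed.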

Based on what I read, their idea seems to be that they can control and interpret the latent space using formal logic, which seems very useful. It would come at the cost of storing less information per parameter, since the logical constraints reduce the entropy the representation can carry. That seems like a good tradeoff.

I'm not sure I understood the concept very well, but this is definitely an interesting thing to think about!

Thanks for the pointer!

Satoshi Takahashi

Apr 18, 2024

Great article! The comparison with the post-quantization model should certainly be done more thoroughly.

RJ Honicky

Author

Apr 18, 2024

Thank you! I hope the authors do that in a fleshed-out version of the paper!

Are All Large Language Models Really in 1.58 Bits?