guillaume.nyc

What is CLIP and why does it matter?
Paper: Learning Transferable Visual Models From Natural Language Supervision, Radford et al. (2021). Introduction: As we discussed in previous posts, contrastive learning isn't new. For example, FAIR's MoCo was published in 2019, along with many other ...
May 12, 2025 · 3 min read
How did LLMs gain Vision?
Introduction: LLMs were created as text-only chatbots as an artifact of their training paradigm: they learn to predict the next token (roughly a word). Photos, videos, and other richer media don't have words and were therefore naturally excluded from the train...
May 5, 2025 · 3 min read
The state of Self-Supervision for Vision
Introduction: To perform vision tasks effectively, it's important to have a strong, general-purpose vision backbone. This allows you to handle many vision tasks, such as:
- Image-to-image similarity: for comparison or retrieval.
- Vision adapters: Add a...
May 1, 2025 · 7 min read
Don't use raw embeddings
Introduction: With the rise of Transformers, embeddings are now widely used:
- As representations of images or texts that can be used by other models or in a zero-shot manner
- As a basic building block for vector search in LLM RAG and image search
- H...
Apr 15, 2025 · 3 min read
Focus: Shapley value
Game theory is a fascinating topic codifying and quantifying all sorts of interactions between stakeholders in a game. The most popular setup is the prisoner's dilemma, but there is much more to it. Today, we will cover the Shapley value, as I recently...
Apr 15, 2025 · 2 min read