Feb 10 · 3 min read · In my journey to dive deeper into multimodal AI systems, I decided to fine-tune BLIP-2, a powerful vision-language model, on the Flickr8k dataset to generate image captions. What made this even more exciting was integrating LoRA (Low-Rank Adaptatio...
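The excerpt above mentions fine-tuning with LoRA. As a toy illustration of the core idea (a hypothetical, stdlib-only sketch, not the article's actual code): instead of updating a full weight matrix `W`, LoRA trains two small matrices `A` (r × d_in) and `B` (d_out × r) and adds their product, scaled by `alpha / r`, to the frozen weights.

```python
# Hypothetical toy sketch of LoRA's core update rule; real fine-tuning
# would use a library such as PEFT on top of the BLIP-2 weights.

def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """Compute y = (W + (alpha / r) * B @ A) @ x, with x a plain vector."""
    scale = alpha / r
    delta = matmul(B, A)  # low-rank update, shape d_out x d_in
    W_eff = [[w + scale * d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
    return matmul(W_eff, [[v] for v in x])  # x as a column vector
```

Because only `A` and `B` are trained, the number of trainable parameters drops from `d_out * d_in` to `r * (d_out + d_in)`, which is what makes LoRA fine-tuning of large models tractable.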
Feb 3 · 21 min read · It's Been a While Sorry for being away for so long. I've been head-down strengthening my software engineering foundation—consuming rather than producing. After a while, I started feeling this pull to balance things out by building again. I just finis...
Oct 27, 2025 · 3 min read · 📝 Quick Summary: JoyCaption is an open-source Visual Language Model (VLM) designed for image captioning. It aims to provide a free, uncensored alternative to existing models like ChatGPT, enabling the training and fine-tuning of diffusion models on ...
Sep 29, 2025 · 2 min read · We’ve come a long way from traditional LLMs (Large Language Models) that could only understand textual data. But now, the AI world is rapidly evolving — and VLMs (Vision-Language Models) are leading the charge. 🔥 Unlike LLMs,...
Sep 25, 2025 · 7 min read · 🌍 The Problem: Brittle UI Test Automation Automated UI testing has long been a cornerstone of software quality. Frameworks like Selenium, Cypress, and Playwright have powered countless regression suites, CI/CD pipelines, and release processes. But d...
Feb 22, 2025 · 13 min read · In this blog post, we explore the intricacies of fine-tuning the Qwen2.5-7B-VL-Instruct model—a state-of-the-art multi-modal transformer designed for both text and image understanding. We will delve into the model’s architecture and its applications,...
Jan 11, 2025 · 6 min read · 🌟 Introduction Visual Language Models (VLMs) are revolutionizing the way machines understand and interact with the world. By combining the power of large language models (LLMs) with vision encoders, VLMs enable natural language interaction with visu...
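The excerpt above describes the standard VLM recipe: a vision encoder turns an image into embeddings, a projection maps them into the LLM's token-embedding space, and the LLM consumes them alongside text tokens. A hypothetical, stdlib-only toy sketch of that data flow (the component names and dimensions here are illustrative, not from any real model):

```python
# Toy sketch of the vision-encoder -> projection -> LLM-input pipeline
# typical of VLMs. All components are stand-ins for illustration only.

def vision_encoder(image_patches):
    """Stand-in encoder: embed each patch as a fixed-size vector (dim 2)."""
    return [[sum(p) / len(p), max(p)] for p in image_patches]

def project(patch_embeddings, weight):
    """Linear projection of patch embeddings into the LLM embedding space."""
    return [[sum(w * e for w, e in zip(row, emb)) for row in weight]
            for emb in patch_embeddings]

def vlm_inputs(image_patches, text_token_embeddings, weight):
    """Concatenate projected image embeddings with text-token embeddings,
    forming the sequence the language model would attend over."""
    return project(vision_encoder(image_patches), weight) + text_token_embeddings
```

The key design point the article teases: because the projection lands image features in the same embedding space as text tokens, the LLM can attend over both with no change to its architecture.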
Dec 8, 2024 · 7 min read · Introduction The evolution of vision-language models has been nothing short of remarkable. From their early stages of independently handling images and text to their current ability to seamlessly integrate the two, these models have reached new heigh...
Aug 2, 2024 · 6 min read · What is a Vision Language Model? Vision Language Models (VLMs) are changing the game for multimodal content creation and interaction by bridging the gap between visual and textual cognition. Ongoing research in VLMs advances multimodal architecture to ...