Feb 10 ยท 3 min read ยท In my journey to dive deeper into multimodal AI systems, I decided to fine-tune BLIP-2, a powerful vision-language model trained on the Flickr8k dataset to generate image captions. What made this more exciting was integrating LoRA (Low-Rank Adaptatio...
Join discussion
Feb 3 ยท 21 min read ยท It's Been a While Sorry for being away for so long. I've been head-down strengthening my software engineering foundationโconsuming rather than producing. After a while, I started feeling this pull to balance things out by building again. I just finis...
Join discussion
Oct 27, 2025 ยท 3 min read ยท ๐ Quick Summary: JoyCaption is an open-source Visual Language Model (VLM) designed for image captioning. It aims to provide a free, uncensored alternative to existing models like ChatGPT, enabling the training and fine-tuning of diffusion models on ...
Join discussionSep 29, 2025 ยท 2 min read ยท Weโve come a long way from traditional LLMs (Large Language Models) that could only understand textual data. But now, the AI world is rapidly evolving โ and ๐ฉ๐๐ ๐ (๐ฉ๐ถ๐๐ถ๐ผ๐ป-๐๐ฎ๐ป๐ด๐๐ฎ๐ด๐ฒ ๐ ๐ผ๐ฑ๐ฒ๐น๐) are leading the charge. ๐ฅ Unlike LLMs,...
Join discussionSep 25, 2025 ยท 7 min read ยท ๐ The Problem: Brittle UI Test Automation Automated UI testing has long been a cornerstone of software quality. Frameworks like Selenium, Cypress, and Playwright have powered countless regression suites, CI/CD pipelines, and release processes. But d...
Join discussion
Feb 22, 2025 ยท 13 min read ยท In this blog post, we explore the intricacies of fine-tuning the Qwen2.5-7B-VL-Instruct modelโa state-of-the-art multi-modal transformer designed for both text and image understanding. We will delve into the modelโs architecture and its applications,...
Join discussion
Jan 11, 2025 ยท 6 min read ยท ๐ Introduction Visual Language Models (VLMs) are revolutionizing the way machines understand and interact with the world. By combining the power of large language models (LLMs) with vision encoders, VLMs enable natural language interaction with visu...
Join discussion
Dec 8, 2024 ยท 7 min read ยท Introduction The evolution of vision-language models has been nothing short of remarkable. From their early stages of independently handling images and text to their current ability to seamlessly integrate the two, these models have reached new heigh...
UAkshi commented
Aug 2, 2024 ยท 6 min read ยท What is Vision Language Model ? Vision Language Models (VLM) are changing the game for multimodal content creation and interaction by bridging the gap between visual and textual cognition. Ongoing research in VLMs advances multimodel architecture to ...
Join discussion