Fine-Tuning BLIP-2 with LoRA
Feb 10 · 3 min read · As part of my effort to dig deeper into multimodal AI systems, I decided to fine-tune BLIP-2, a powerful vision-language model, on the Flickr8k dataset to generate image captions. What made this more exciting was integrating LoRA (Low-Rank Adaptatio...
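LoRA keeps a pretrained weight matrix W frozen and learns only a low-rank update. The adapted layer computes W x + (alpha / r) * B (A x), where A has shape r × d_in and B has shape d_out × r. The sketch below illustrates that idea in plain Python on a single linear layer. It is not the code used in this fine-tuning run. The class name `LoRALinear` and its parameters are hypothetical. Following the original LoRA paper, A gets small random Gaussian values and B starts at zero, so the layer initially behaves exactly like the frozen one. Applying LoRA to BLIP-2 itself would more likely go through a library such as Hugging Face PEFT than a hand-written layer like this.

```python
import random


class LoRALinear:
    """Hypothetical illustration: frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight                # frozen pretrained weights, d_out x d_in
        self.d_out = len(weight)
        self.d_in = len(weight[0])
        self.rank = rank
        self.scaling = alpha / rank         # the update is scaled by alpha / r
        rng = random.Random(seed)
        # A: r x d_in with small random values; B: d_out x r with zeros,
        # so B @ A is zero and the adapted layer starts identical to the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.d_out)]

    @staticmethod
    def _matvec(m, v):
        # Plain matrix-vector product on nested lists.
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def forward(self, x):
        base = self._matvec(self.weight, x)                    # W x (frozen path)
        delta = self._matvec(self.B, self._matvec(self.A, x))  # B (A x) (trainable path)
        return [b + self.scaling * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B would be updated during fine-tuning.
        return self.rank * (self.d_in + self.d_out)

    def full_params(self):
        # Parameters a full fine-tune of this layer would update.
        return self.d_in * self.d_out


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
layer = LoRALinear(W, rank=1, alpha=2)
print(layer.forward([1.0, 2.0, 3.0]))  # [7.0, -1.0], since B is zero at init
```

Because only A and B are trained, each adapted layer needs r * (d_in + d_out) trainable parameters instead of d_in * d_out. For a hypothetical 768 × 768 projection with r = 8, that is 12,288 parameters versus 589,824, or about 2%.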