Fine-Tune Llama 3.2 Vision-Language Model on Custom Datasets

Llama 3.2, a powerful multimodal large language model (LLM) from Meta AI, has recently been released, pushing the boundaries of AI capabilities by enabling machines to understand both visual and textual information. While this pre-trained model is im...

blog.futuresmart.ai · 11 min read

#vision-language-models #llama-32 #llama #finetuning #llm

Responses(1)

lin lin

Dec 23, 2024

Thanks for your excellent blog! I have a question: can I use this fine-tuned model, built with the Unsloth framework, to get embeddings? That is, can I input just an image and get the image-text embedding from the fine-tuned Llama?
