Fine-Tune Llama 3.2 Vision-Language Model on Custom Datasets
Llama 3.2, a powerful multimodal large language model (LLM) from Meta AI, has recently been released, pushing the boundaries of AI capabilities by enabling machines to understand both visual and textual information. While this pre-trained model is im...
blog.futuresmart.ai · 11 min read
lin lin
Thanks for your excellent blog! I have a question: can I use this fine-tuned model, built with the Unsloth framework, to get embeddings? That is, can I input just an image and get the image-text embedding from the fine-tuned Llama?