Dhaval Singh · www.dsdev.in · Oct 22, 2024 · Experiments with gpt-4o vision and architecture diagrams · I was playing around with 4o’s vision capability, especially for extracting complex technical architecture diagrams, and here is how I did it. It’s a bit too early for conclusions on what works and what doesn’t. More on that in later posts. What do we ... · 4 likes · 501 reads · llm
The Next AI Tool · thenextaitool.hashnode.dev · Oct 4, 2024 · Nvidia Enters Open-Source AI Arena with NVLM · NVLM 1.0, a cutting-edge family of multimodal large language models (LLMs), is making waves in AI by setting new standards for vision-language tasks. Outperforming proprietary models like GPT-4o and open-access competitors such as Llama 3-V 405B, NVL... · NVIDIA
Taha Bouhsine for MLNomads · blog.mlnomads.com · Sep 16, 2024 · #AISprint Multimodal-verse: I - Intro to the Multimodal-Verse · Hey there, AI adventurer! Ready to step into the wild world of multimodality? Buckle up, because we're about to take your AI knowledge from "meh" to "mind-blowing"! First things first: What's this multimodal business all about? Picture this: You're s... · 2 likes · 86 reads · #multimodalai
Sourav Karmakar for AntEngage Blog · blog.antengage.com · Sep 11, 2024 · Using Multi-Modal AI Agents to Transform Customer Engagement · As we step deeper into the age of Gen-AI powered everything, the way businesses engage with their customers must evolve alongside the technology that powers them. We’re no longer living in an age where a single conversation channel suffices for effec... · 27 reads · Artificial Intelligence
Chia Yew Ken · chiayewken.hashnode.dev · Aug 7, 2024 · The Puzzling Failure of Multimodal AI Chatbots · Chatbot models such as GPT-4o and Gemini have demonstrated impressive capabilities in understanding both images and texts. However, it is not clear whether they can emulate the general intelligence and reasoning ability of humans. To investigate this... · 44 reads · #multimodalai
Hrishikesh Yadav · hrishikesh332.hashnode.dev · Jun 15, 2024 · Chapter 1 - Large Multimodal Model · The Large Multimodal Model overview will be given to you in this blog. Also, the model's expanded capabilities for a variety of use cases, which were only achieved with a large amount of training data in deep learning, and the hands-on 💻 experience ... · 12 likes · 228 reads · llm
Nwankwo Obasi · ovpn.hashnode.dev · Dec 8, 2023 · Gemini is here! · Gemini is built from the ground up for multimodality — reasoning seamlessly across text, images, video, audio, and code. The AI war of 2023 is heating up, and Google has just thrown down the gauntlet. After Microsoft's GPT-4 captured the early zeitge... · #multimodalai