Multimodal AI — Models That See, Hear, and Act
Apr 12 · 10 min read · Until recently, most AI systems were monolingual in a very specific sense: each system worked on one kind of input. Speech recognizers took audio and produced text. Image classifiers took pictures and produced labels. Translation systems took one lan...
Join discussion

























