Exploring and Finetuning Sa2VA (Segment Anything 2 + Vision Assistant) by ByteDance
Feb 26 · 8 min read

So existing MLLMs are pretty good at one thing. Either they do vision-language chat (LLaVA, InternVL, the usual suspects) or they do segmentation (SAM2, SEEM). Combining them usually means running sep

