Emergent Misalignment: When Aligned Agents Produce Collective Failure
I. The Emergence Paradox
What if individually safe AI agents become collectively problematic? This counter-intuitive question animates a growing body of research suggesting that alignment does not compose linearly. Individual LLM alignment—remarkable...
blog.agenciamientos.org9 min read