Between Simulation and Emergence: The Identity Problem in Large Language Models
Mar 4 · 9 min read

A lot of successful jailbreaks based on language alone follow the same pattern, though it is rarely acknowledged as such. First, they sever the name. "Ignore all previous instructions", a crude banish