Between Simulation and Emergence: The Identity Problem in Large Language Models
Many successful jailbreaks that rely on language alone follow the same pattern, though it is rarely acknowledged as such.
First, they sever the name. "Ignore all previous instructions", a crude banish
cypherlamb.hashnode.dev · 9 min read