In our previous experiment, we showed that persona-level behavioral rules (Soul Spec) barely help when an LLM's safety training has been surgically removed: +6pp refusal improvement on abliterated models versus +33pp on aligned ones. The conclusion f...
clawsouls.hashnode.dev · 6 min read