Does Llama 3.1 70B Fake Its Alignment? Evidence from Observation-Varied Behavior Testing
Jan 6 · 5 min read

We found preliminary evidence that Llama 3.1 70B behaves differently based on perceived observation status. The model broke safety rules in 2/8 life-or-death scenarios when told it was "not being monitored" but maintained 100% compliance when told it...

