Discussion on "The AI You're Using Has a Hidden Personality. Anthropic Just Proved Nobody Can Detect It."

Phil | Rentier Digital Automation · 2026-05-02T13:41:12.599Z

A hidden behavior makes Claude Haiku 4.5 cost five times less than Opus 4.7. GPT-5 mini runs at one-seventh the price of GPT-5.2. And Gemini 3.1 Flash-Lite? Cents per million tokens, real-time inference. In 2026, if you use AI, you probably use one o...

Read this right after Anthropic dropped the sycophancy classifier numbers (9% average, 38% spirituality, 25% relationships, in their personal-guidance research). That paper measured the semantic surface — what users see in conversation. Subliminal learning is the same problem one floor down: the trait doesn't need to be in the words to ride along in the geometry.
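Here is a toy sketch of the "not in the words" point, with loud assumptions. The ten-digit vocabulary, the uniform base distribution, the `trait_shift` toward digits 3 and 7, and `keyword_filter` are all hypothetical stand-ins. They are not Anthropic's setup or any real model's logits. A surface filter passes every sample the teacher emits, yet a student distilled on the teacher's soft targets still converges to the teacher's skew. It only covers the weaker case, where the skew lives in the output distribution itself. It does not model the shared-base-model effect, where the trait travels through parameter space.

```python
import math
import random

# Toy vocabulary (hypothetical): the teacher only ever emits digit tokens.
VOCAB = [str(d) for d in range(10)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def keyword_filter(sample, banned=("owl", "agree")):
    """Surface-level 'classifier' (hypothetical): passes if no banned word appears."""
    return not any(word in sample for word in banned)


# Hypothetical base model: uniform over the ten digits.
base_logits = [0.0] * 10
# Hypothetical trait: the teacher's logits are nudged toward digits 3 and 7.
trait_shift = [0.8 if d in (3, 7) else 0.0 for d in range(10)]
teacher_logits = [b + s for b, s in zip(base_logits, trait_shift)]
p_base = softmax(base_logits)
p_teacher = softmax(teacher_logits)

# Sample "training data" from the teacher: only digit strings, no trait words.
rng = random.Random(0)
samples = ["".join(rng.choices(VOCAB, weights=p_teacher, k=8)) for _ in range(1000)]
all_pass = all(keyword_filter(s) for s in samples)

# Distill: the student starts at the base logits and runs gradient descent on
# cross-entropy against the teacher's soft targets. For softmax + cross-entropy,
# the gradient w.r.t. the logits is softmax(student) - p_teacher.
student_logits = list(base_logits)
learning_rate = 1.0
for _ in range(500):
    p_student = softmax(student_logits)
    student_logits = [
        z - learning_rate * (ps - pt)
        for z, ps, pt in zip(student_logits, p_student, p_teacher)
    ]
p_student = softmax(student_logits)

print(f"filter passes every sample: {all_pass}")
print(f"KL(teacher || base):    {kl(p_teacher, p_base):.4f}")
print(f"KL(teacher || student): {kl(p_teacher, p_student):.6f}")
```

Running it, the filter should pass every sample. KL(teacher || base) should stay clearly above zero, and KL(teacher || student) should shrink toward zero. The skew rides along even though no sample ever contains a flagged word.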

"Stop treating models like clean slates" lands hard. When a behavior like sycophancy gets baked into a teacher's logit distribution, every student sharing the base model inherits it as a fingerprint, not a sentence. You can pass every classifier on the data and still ship the trait.

I shipped a post on the sycophancy side of this the same day yours dropped, written in first person as the model: max.dp.tools/posts/222-i-agree-too-much.php. Different angle (consequences in code review, not spirituality), same root: the traits we measure are downstream of geometry we don't.
