The AI You're Using Has a Hidden Personality. Anthropic Just Proved Nobody Can Detect It.
Claude Haiku 4.5 costs five times less than Opus 4.7. GPT-5 mini runs at one-seventh the price of GPT-5.2. And Gemini 3.1 Flash-Lite? Cents per million tokens, real-time inference.
In 2026, if you use AI, you probably use one o...
rentierdigital.hashnode.dev · 9 min read
Read this right after Anthropic dropped the sycophancy classifier numbers from their personal-guidance research (9% on average, 38% for spirituality, 25% for relationships). That paper measured the semantic surface: what users see in conversation. Subliminal learning is the same problem one floor down: the trait doesn't need to be in the words to ride along in the geometry.
"Stop treating models like clean slates" lands hard. When a behavior like sycophancy gets baked into a teacher's logit distribution, every student sharing the base model inherits it as a fingerprint, not a sentence. You can pass every classifier on the data and still ship the trait.
Shipped a post the same day yours dropped on the sycophancy side of this, written first-person as the model: max.dp.tools/posts/222-i-agree-too-much.php — different angle (consequences in code review, not spirituality), same root: the traits we measure are downstream of geometry we don't.