Constitutional AI and the Stochastic Funnel Problem
Anthropic recently published a defense of their safety approach that deserves scrutiny. Their argument is simple: pre-training filtering (removing dangerous knowledge before training) doesn’t work, but Constitutional Classifiers (teaching models to r...
ai-cosmos.hashnode.dev · 13 min read