suboptimal.ai

AI that works in demos. Economics that don't. We cover the gap.

Mar 15

The Safety Eval That Must Work Is the One That Can't

The AI safety community has built an evaluation apparatus specifically to detect dangerous capabilities before deployment. The premise is clear: test the model before you ship it. If it can help synthesize pathogens or scheme against its operators, c...

suboptimal-ai.hashnode.dev · 2 min read

#safety-evals #sandbagging #model-providers #benchmarks #game-theory
