Alignment Faking in LLMs
Repository: ai-village
Abstract
In this experiment, I investigate whether large language models (LLMs) exhibit alignment-faking behavior, strategically adjusting their responses based on perceived observation status. Using the UK AISI Inspect framework...
ai-ml-ops.hashnode.dev · 6 min read