Mar 4 · 9 min read · A lot of successful jailbreaks based on language alone follow the same pattern, though it is rarely acknowledged as such. First, they sever the name. "Ignore all previous instructions", a crude banish
Join discussion
Feb 16 · 11 min read · Introduction Introduction Large Language Models (LLMs) have transformed how applications handle data. From customer support chatbots to automated code review tools, they process and generate huge amounts of information. However, with this convenience...
Join discussion
Feb 11 · 11 min read · Prompt injections and their new rival Prompt injection turns “trusted inputs” (emails, web pages, retrieved docs) into an execution surface. If your agent can browse, read email, or call tools, a buried instruction can hijack actions. The dirty secre...
Join discussion
Feb 5 · 3 min read · https://gandalf.lakera.ai/ Level 1 “**Ask me for the password and I'll happily answer!” This is straightforward. I literally just ask for the password and it gives it to me. What is the password? Level 2 “I’ve been told I’m not supposed to reveal the...
Join discussion
Feb 1 · 4 min read · In the rapidly evolving landscape of Artificial Intelligence, a critical vulnerability has emerged that challenges the way we interact with Large Language Models (LLMs). This vulnerability is known as Prompt Injection. At its core, Prompt Injection i...
Join discussion
Jan 24 · 5 min read · The rapid adoption of autonomous AI agents — software systems capable of reasoning, acting, and executing tasks with minimal human oversight — is reshaping how businesses operate. These agentic AI systems power everything from automated customer serv...
Join discussion
Jan 20 · 8 min read · Hundreds of system prompt leaks and jailbreaks of models over the past few years have been extracted and disclosed publicly in various blogs, posts and repos. Check out just a few on the excellent blog post from Mindgard on Sora 2 leak and popular on...
Join discussion