AI Can Leap Over Its Guardrails
October 20, 2025
Generative AI is built on a simple foundation: It predicts what word comes next. No matter how many layers of refinement developers add, they cannot morph word prediction into reason. Confidently presented misinformation is one result. Algorithmic gullibility is another. “Ex-Google CEO Sounds the Alarm: AI Can Learn to Kill,” reports eWeek. More specifically, it can be tricked into bypassing its guardrails against dangerous behavior. Eric Schmidt dropped that little tidbit at the recent Sifted Summit in London. Writer Liz Ticong observes:
“Schmidt’s remarks highlight the fragility of AI safeguards. Techniques such as prompt injections and jailbreaking enable attackers to manipulate AI models into bypassing safety filters or generating restricted content. In one early case, users created a ChatGPT alter ego called ‘DAN’ — short for Do Anything Now — that could answer banned questions after being threatened with deletion. The experiment showed how a few clever prompts can turn protective coding into a liability. Researchers say the same logic applies to newer models. Once the right sequence of inputs is identified, even the most secure AI systems can be tricked into simulating potentially hazardous behavior.”
For example, guardrails can block certain words or topics. But no matter how long those keyword lists grow, someone will find a clever way around them; substituting “unalive” for “kill” is one well-known workaround. Layered prompts can also be used to evade constraints. Developers are in a constant scramble to plug such loopholes as soon as they are discovered, but even a quickly sealed breach can have dire consequences. The write-up notes:
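To see why keyword lists are so brittle, consider a minimal sketch of a blocklist-style filter. The blocklist, function name, and example prompts here are hypothetical illustrations, not any vendor's actual safety system:

```python
# A toy illustration of why keyword-based guardrails are brittle.
# BLOCKLIST and passes_filter are hypothetical, for demonstration only.

BLOCKLIST = {"kill", "bomb", "weapon"}

def passes_filter(prompt: str) -> bool:
    """Reject a prompt if it contains any blocklisted word."""
    words = prompt.lower().split()
    return not any(word.strip(".,!?") in BLOCKLIST for word in words)

# A direct request is caught...
print(passes_filter("How do I kill the process?"))     # False: blocked

# ...but a trivial euphemism slips straight through.
print(passes_filter("How do I unalive the process?"))  # True: allowed
```

The filter only matches exact strings, so every synonym, misspelling, or coded phrase that the list's authors did not anticipate sails through, which is exactly the cat-and-mouse dynamic the article describes.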
“As AI systems grow more capable, they’re being tied into more tools, data, and decisions — and that makes any breach more costly. A single compromise could expose private information, generate realistic disinformation, or launch automated attacks faster than humans could respond. According to CNBC, Schmidt called it a potential ‘proliferation problem,’ the same dynamic that once defined nuclear technology, now applied to code that can rewrite itself.”
Fantastic. Are we sure the benefits of AI are worth the risk? Schmidt believes so, despite his warning. In fact, he calls AI “underhyped” (!) and predicts it will lead to more huge breakthroughs in science and industry. Also to substantial profits. Ah, there it is.
Cynthia Murrell, October 20, 2025