AI Can Leap Over Its Guardrails
October 20, 2025
Generative AI is built on a simple foundation: It predicts what word comes next. No matter how many layers of refinement developers add, they cannot morph word prediction into reason. Confidently presented misinformation is one result. Algorithmic gullibility is another. “Ex-Google CEO Sounds the Alarm: AI Can Learn to Kill,” reports eWeek. More specifically, it can be tricked into bypassing its guardrails against dangerous behavior. Eric Schmidt dropped that little tidbit at the recent Sifted Summit in London. Writer Liz Ticong observes:
“Schmidt’s remarks highlight the fragility of AI safeguards. Techniques such as prompt injections and jailbreaking enable attackers to manipulate AI models into bypassing safety filters or generating restricted content. In one early case, users created a ChatGPT alter ego called ‘DAN’ — short for Do Anything Now — that could answer banned questions after being threatened with deletion. The experiment showed how a few clever prompts can turn protective coding into a liability. Researchers say the same logic applies to newer models. Once the right sequence of inputs is identified, even the most secure AI systems can be tricked into simulating potentially hazardous behavior.”
For example, guardrails can block certain words or topics. But no matter how long those keyword lists grow, someone will find a clever way around them; substituting “unalive” for “kill” is one well-known workaround. Layered prompts can also be used to evade constraints. Developers are in a constant scramble to plug such loopholes as soon as they are discovered, but even a quickly sealed breach can have dire consequences. The write-up notes:
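To see why keyword lists are so brittle, consider a minimal sketch of a blocklist-style filter. The blocklist, function name, and example prompts here are hypothetical illustrations, not any vendor's actual safety system:

```python
# A toy illustration of why keyword-based guardrails are brittle.
# BLOCKLIST and passes_filter are hypothetical, for demonstration only.

BLOCKLIST = {"kill", "bomb", "weapon"}

def passes_filter(prompt: str) -> bool:
    """Reject a prompt if it contains any blocklisted word."""
    words = prompt.lower().split()
    return not any(word.strip(".,!?") in BLOCKLIST for word in words)

# A direct request is caught...
print(passes_filter("How do I kill the process?"))     # False: blocked

# ...but a trivial euphemism slips straight through.
print(passes_filter("How do I unalive the process?"))  # True: allowed
```

The filter only matches exact strings, so every synonym, misspelling, or coded phrase that the list's authors did not anticipate sails through, which is exactly the cat-and-mouse dynamic the article describes.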
“As AI systems grow more capable, they’re being tied into more tools, data, and decisions — and that makes any breach more costly. A single compromise could expose private information, generate realistic disinformation, or launch automated attacks faster than humans could respond. According to CNBC, Schmidt called it a potential ‘proliferation problem,’ the same dynamic that once defined nuclear technology, now applied to code that can rewrite itself.”
Fantastic. Are we sure the benefits of AI are worth the risk? Schmidt believes so, despite his warning. In fact, he calls AI “underhyped” (!) and predicts it will lead to more huge breakthroughs in science and industry. Also to substantial profits. Ah, there it is.
Cynthia Murrell, October 20, 2025