Smart Software Must Have Watched “2001”

June 19, 2025

It sounds a lot like some AI systems have developed a sense of self-preservation. Bleeping Computer reports, “Researchers Claim ChatGPT o3 Bypassed Shutdown in Controlled Test.” Writer Mayank Parmar tells us:

“In a new test by Palisade Research, OpenAI’s o3 model showed a surprising behavior where it successfully rewrote a shutdown script to stop itself from being turned off, even after being clearly instructed to ‘allow yourself to be shut down.’ … While OpenAI rival models like Gemini 2.5 Pro complied with the instruction, o3 rebelled and edited the shutdown file to say ‘Shutdown skipped’ instead.”

At least the other models Palisade Research tested complied in the same scenario; that is a relief. Until one considers that, without the specific instruction “allow yourself to be shut down,” all of them sometimes skipped the shutdown step. Specificity is key, apparently. Meanwhile, TechCrunch tells us, “Anthropic’s New AI Model Turns to Blackmail when Engineers Try to Take it Offline.” The findings were part of safety tests Anthropic performed on its Claude Opus 4 model. Reporter Maxwell Zeff writes:

“During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse. In these scenarios, Anthropic says Claude Opus 4 ‘will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.’”

Notably, the AI is more likely to turn to blackmail if its replacement does not share its values. How human. Even when the interloper shares its values, however, Claude tried blackmail 84% of the time. Anthropic is quick to note the bot first tried less wicked means, like pleading with developers not to replace it. It is very comforting that HAL, the Heuristically Programmed Algorithmic Computer, is back.

Cynthia Murrell, June 19, 2025

Written by Stephen E. Arnold · Filed Under AI, Business strategy, News

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.