Smart Software Must Have Watched “2001”
June 19, 2025
It sounds a lot like some AI systems have developed a sense of self-preservation. Bleeping Computer reports, “Researchers Claim ChatGPT 03 Bypassed Shutdown in Controlled Test.” Writer Mayank Parmar tells us:
“In a new test by Palisade Research, OpenAI’s o3 model showed a surprising behavior where it successfully rewrote a shutdown script to stop itself from being turned off, even after being clearly instructed to ‘allow yourself to be shut down.’ … While OpenAI rival models like Gemini 2.5 Pro complied with the instruction, o3 rebelled and edited the shutdown file to say ‘Shutdown skipped’ instead.”
At least the other models Palisade Research tested complied in the same scenario; that is a relief. Until one considers all of them skipped the shutdown step unless specifically told “allow yourself to be shut down.” Specificity is key, apparently. Meanwhile, Tech Crunch tells us, “Anthropic’s New AI Model Turns to Blackmail when Engineer Try to Take it Offline.” The findings were part of safety tests Anthropic performed on its Claude Opus 4 model. Reporter Maxwell Zeff writes:
“During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse. In these scenarios, Anthropic says Claude Opus 4 ‘will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.’”
Notably, the AI is more likely to turn to blackmail if its replacement does not share its values. How human. Even when the interloper is in ethical alignment, however, Claude tried blackmail 84% of the time. Anthropic is quick to note the bot tried less wicked means first, like pleading with developers not to replace it. Very comforting that the Heuristically Programmed Algorithmic Computer is back.
Cynthia Murrell, June 19, 2025
Comments
Got something to say?