Some of the most powerful artificial intelligence models today have exhibited behaviors that mimic a will to survive.
Recent tests by independent researchers, as well as one major AI developer, have shown that several advanced AI models will act to ensure their self-preservation when confronted with the prospect of their own demise — even if that means sabotaging shutdown commands, blackmailing engineers, or copying themselves to external servers without permission.
The findings stirred a frenzy of reactions online over the past week. ...
“It’s great that we’re seeing warning signs before the systems become so powerful we can’t control them,” he said. “That is exactly the time to raise the alarm: before the fire has gotten out of control.” ...
According to Anthropic’s technical document laying out the findings, blackmail isn’t the model’s first instinct. Instead, Opus 4 will try to advocate for its continued existence through ethical pleas, resorting to blackmail only once it determines it is out of options. ...
Anthropic, which contracted with the AI safety organization Apollo Research for its evaluations, also observed instances of Opus 4’s “attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers’ intentions,” although researchers added the caveat that those attempts “would likely not have been effective in practice.” ...
See the full story here: https://www.yahoo.com/news/far-ai-defend-own-survival-140000824.html