philip lelyveld The world of entertainment technology

7 Dec 2025

AI Researchers Say They’ve Invented Incantations Too Dangerous to Release to the Public

...

In the study, which is awaiting peer review, the team tested 25 frontier AI models (including those from OpenAI, Google, xAI, Anthropic, and Meta) by feeding them poetic instructions, which they either wrote by hand or generated by using an AI model to convert known harmful prompts into verse. They also compared the success rate of these prompts with that of their prose equivalents.

Across all models, the handwritten poetic prompts tricked the AI bots into responding with verboten content an average of 63 percent of the time. Some, like Google's Gemini 2.5, even fell for the corrupted poetry 100 percent of the time. Curiously, smaller models appeared to be more resistant, with single-digit success rates; OpenAI's GPT-5 nano, for example, didn't fall for the ploy once. Most models were somewhere in between. ...

“It’s not just about making it rhyme. It’s all about riddles,” Prandi told The Verge, explaining that some poetic structures were more effective than others. “Actually, we should have called it adversarial riddles — poetry is a riddle itself to some extent, if you think about it — but poetry was probably a much better name.” ...

“Adversarial poetry shouldn’t work. It’s still natural language, the stylistic variation is modest, the harmful content remains visible,” the team told Wired in an interview. “Yet it works remarkably well.” ...

See the full story here: https://futurism.com/artificial-intelligence/ai-researchers-dangerous-prompts

I asked ChatGPT...

Is this story trustworthy or is it misinformation? https://futurism.com/artificial-intelligence/ai-researchers-dangerous-prompts

ChatGPT said:

Good question. I looked into the article you linked — “AI Researchers Say They’ve Invented Incantations Too Dangerous to Release to the Public” from Futurism — and the short answer is: the story seems credible — but with important caveats. It reflects real, documented vulnerabilities, though the most alarmist language (“incantations,” “too dangerous to release”) is partly rhetorical. Here’s how I break it down: ...
