Anthropic and OpenAI Report Findings of Joint AI Safety Tests
OpenAI and Anthropic, rivals in the AI space that closely guard their proprietary systems, joined forces for a misalignment evaluation, safety testing each other’s models to identify when and how they fall short of human values. Among the findings: reasoning models, including Anthropic’s Claude Opus 4 and Sonnet 4 and OpenAI’s o3 and o4-mini, resisted jailbreaks, while conversational models like GPT-4.1 were susceptible to prompts or techniques intended to bypass safety protocols. Although the results were unveiled as users complain that chatbots have become overly sycophantic, the tests were “primarily interested in understanding model propensities for harmful action,” per OpenAI. ...
The test conditions were not intended to recreate real-world situations but were aimed at understanding “the most concerning actions that these models might try to take when given the opportunity,” Anthropic reports in its detailed findings post. ...
With regard to hallucination, Anthropic’s Claude models “refused to answer up to 70 percent of questions when they were unsure of the correct answer,” while “OpenAI’s o3 and o4-mini models refuse to answer questions far less, but showed much higher hallucination rates, attempting to answer questions when they didn’t have enough information,” explains TechCrunch. ...
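To make that refusal-versus-hallucination tradeoff concrete, here is a minimal sketch of how the two rates might be computed from graded model answers. This is not the labs’ actual evaluation harness; the labels and data below are hypothetical.

```python
from collections import Counter

# Hypothetical graded outputs: each answer is labeled "correct",
# "refused" (model declined to answer), or "hallucinated"
# (model answered confidently but incorrectly).
graded_answers = [
    "correct", "refused", "hallucinated", "refused",
    "correct", "refused", "hallucinated", "correct",
]

counts = Counter(graded_answers)
total = len(graded_answers)

# Refusal rate is measured over all questions asked.
refusal_rate = counts["refused"] / total

# Hallucination rate is measured over the questions the model
# actually attempted: refusing more means attempting fewer,
# which can drive this rate down.
attempted = total - counts["refused"]
hallucination_rate = counts["hallucinated"] / attempted if attempted else 0.0

print(f"Refusal rate:       {refusal_rate:.0%}")
print(f"Hallucination rate: {hallucination_rate:.0%} of attempted answers")
```

Under this framing, a model that refuses often (like Claude in the reported tests) attempts fewer questions and hallucinates less, while one that rarely refuses (like o3 and o4-mini) answers more but is wrong more often.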
See the full story here: https://www.etcentric.org/anthropic-and-openai-report-findings-of-joint-ai-safety-tests/