Why Are Large AI Models Being Red Teamed?
Intelligent systems demand more than just repurposed cybersecurity tools
In February, OpenAI announced the arrival of Sora, a stunning “text-to-video” tool. Simply enter a prompt, and Sora generates a realistic video within seconds. But it wasn’t immediately available to the public. Some of the delay is because OpenAI reportedly has a set of experts called a red team who, the company has said, will probe the model to understand its capacity for deepfake videos, misinformation, bias, and hateful content.
Red teaming, while having proved useful for cybersecurity applications, is a military tool that was never intended for widespread adoption by the private sector.
“Done well, red teaming can identify and help address vulnerabilities in AI,” says Brian Chen, director of policy from the New York–based think tank Data & Society. “What it does not do is address the structural gap in regulating the technology in the public interest.” ...
The purpose of red-teaming exercises is to play the role of the adversary (the red team) and find hidden vulnerabilities in the defenses of the blue team (the defenders) who then think creatively about how to fix the gaps. ...
Zenko also reveals a glaring mismatch between red teaming and the pace of AI advancement. The whole point, he says, is to identify existing vulnerabilities and then fix them. “If the system being tested isn’t sufficiently static,” he says, “then we’re just chasing the past.” ...
Dan Hendrycks, executive and research director of the San Francisco–based Center for AI Safety, says red teaming shouldn’t be treated as a turnkey solution either. “The technique is certainly useful,” he says. “But it represents only one line of defense against the potential risks of AI, and a broader ecosystem of policies and methods is essential.”
NIST’s new AI Safety Institute now has an opportunity to change the way red teaming is used in AI. The Institute’sconsortium of more than 200 organizations has already reportedly begun developing standards for AI red teaming. Tech developers have also begun exploring best practices on their own. For example, Anthropic, Google, Microsoft, and OpenAI have established the Frontier Model Forum (FMF) to develop standards for AI safety and share best practices across the industry.
See the full story here: https://spectrum.ieee.org/red-team-ai-llms
Pages
- About Philip Lelyveld
- Mark and Addie Lelyveld Biographies
- Presentations and articles
- Tufts Alumni Bio