How ‘radioactive data’ could help reveal malicious AIs
... What if it did have ulterior motives, though? That’s the question at the heart of a fascinating new paper I read this week, which offers a comprehensive analysis of how AI-generated text can and almost certainly will be used to spread propaganda and fuel other influence operations — and offers some thoughtful ideas on what governments, AI developers, and tech platforms might do about it.
The paper is “Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations,” and it was written as a collaboration between Georgetown University’s Center for Security and Emerging Technology, the Stanford Internet Observatory, and OpenAI. ...
The potential of language models to rival human-written content at low cost suggests that these models—like any powerful technology—may provide distinct advantages to propagandists who choose to use them. These advantages could expand access to a greater number of actors, enable new tactics of influence, and make a campaign’s messaging far more tailored and potentially effective. ...
The worst-case scenario might look something like what the researcher Aviv Ovadya has called the infocalypse: an internet where ubiquitous synthetic media reduces societal trust to near zero, as no one is ever sure who created what they are looking at or why. ...
In the subfield of computer vision, researchers at Meta have demonstrated that images produced by AI models can be identified as AI-generated if the models are trained on “radioactive data”—that is, images that have been imperceptibly altered to slightly distort the training process. This detection is possible even when as little as 1% of a model’s training data is radioactive and even when the visual outputs of the model look virtually identical to normal images. It may be possible to build language models that produce more detectable outputs by similarly training them on radioactive data; however, this possibility has not been extensively explored, and the approach may ultimately not work. ...
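To make the intuition concrete, here is a minimal sketch of how a radioactive-data test can work in principle. This is not Meta’s implementation: it works directly in feature space rather than pixel space, stands in for real training with a closed-form linear fit, and every name and number in it (DIM, N, EPS, and so on) is an illustrative assumption. The published method pushes the perturbation back into the images themselves and uses a more careful hypothesis test.

```python
import numpy as np

# Toy illustration of the "radioactive data" idea. All parameters here
# are illustrative assumptions, not values from the published method.
rng = np.random.default_rng(0)
DIM = 256    # assumed feature dimensionality
N = 5000     # training samples per class
EPS = 0.1    # marking strength; the real method keeps this imperceptible

# 1. The data owner picks a secret random unit direction u in feature space.
u = rng.standard_normal(DIM)
u /= np.linalg.norm(u)

# 2. Marking: nudge the features of one class slightly toward u.
#    (The published method back-propagates this shift into pixel space so
#    the marked images still look normal; we stay in feature space here.)
x_clean = rng.standard_normal((N, DIM))             # unmarked class
x_marked = rng.standard_normal((N, DIM)) + EPS * u  # "radioactive" class

# 3. Stand-in for training: fit a linear classifier by least squares.
#    Its weight vector soaks up the class difference, i.e. the marker.
X = np.vstack([x_clean, x_marked])
y = np.concatenate([-np.ones(N), np.ones(N)])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# 4. Detection: under the null hypothesis (no radioactive data), the cosine
#    between the trained weights and a fixed random direction is roughly
#    N(0, 1/DIM). A large z-score flags the model as trained on marked data.
cosine = (w @ u) / np.linalg.norm(w)
z = cosine * np.sqrt(DIM)
print(f"cosine(w, u) = {cosine:.3f}, z-score ~ {z:.1f}")
```

The design point this sketch illustrates: because u is secret and statistically independent of the data, a model trained only on clean data has no reason to align with it, so even a small alignment is strong evidence of training on marked data. That is what makes the 1% figure in the quote above plausible, at least for vision models.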
See the full story here: https://www.theverge.com/23553406/radioactive-data-malicious-ai-detection