The Data That Powers A.I. Is Disappearing Fast
...
Yacine Jernite, a machine learning researcher at Hugging Face, a company that provides tools and data to A.I. developers, characterized the consent crisis as a natural response to the A.I. industry’s aggressive data-gathering practices.
“Unsurprisingly, we’re seeing blowback from data creators after the text, images and videos they’ve shared online are used to develop commercial systems that sometimes directly threaten their livelihoods,” he said.
But he cautioned that if all A.I. training data needed to be obtained through licensing deals, it would exclude “researchers and civil society from participating in the governance of the technology.”
...
A.I. companies have claimed that their use of public web data is legally protected under fair use. But gathering new data has gotten trickier. Some A.I. executives I’ve spoken to worry about hitting the “data wall” — their term for the point at which all of the training data on the public internet has been exhausted, and the rest has been hidden behind paywalls, blocked by robots.txt or locked up in exclusive deals. ...
But there’s also a lesson here for big A.I. companies, who have treated the internet as an all-you-can-eat data buffet for years, without giving the owners of that data much of value in return. Eventually, if you take advantage of the web, the web will start shutting its doors. ...
See the full story here: https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html