Because not much has been written about efficient data deletion, the Stanford authors first aimed to define the problem and describe four design principles that would help ameliorate it. The first principle is “linearity”: Simple AI models that just add and multiply numbers, avoiding so-called nonlinear mathematical functions, are easier to partially unravel. The second is “laziness,” in which heavy computation is delayed until predictions need to be made. The third is “modularity”: If possible, train a model in separable chunks and then combine the results. The fourth is “quantization,” or making averages lock onto nearby discrete values so removing one contributing number is unlikely to shift the average.
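To make two of these principles concrete, here is a minimal Python sketch of "linearity" and "quantization" applied to a simple average. It is an illustration only, not the Stanford authors' algorithm: the class name `QuantizedMean`, its methods, and the grid width `step` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class QuantizedMean:
    """Running mean whose reported value snaps to a grid of width `step`.

    Hypothetical illustration; not taken from the researchers' paper.
    """
    total: float = 0.0
    count: int = 0
    step: float = 0.5  # assumed grid width, chosen for the example

    def add(self, x: float) -> None:
        self.total += x
        self.count += 1

    def remove(self, x: float) -> None:
        # Linearity: a sum is undone by subtracting the deleted value,
        # so no pass over the remaining data is needed.
        self.total -= x
        self.count -= 1

    def value(self) -> float:
        # Quantization: round the exact mean to the nearest multiple of step.
        exact = self.total / self.count
        return round(exact / self.step) * self.step


mean = QuantizedMean(step=0.5)
for x in [2.0, 2.1, 1.9, 2.2, 1.8]:
    mean.add(x)

before = mean.value()
mean.remove(2.2)      # "forget" one contributor
after = mean.value()
print(before, after)
```

With these sample values the exact mean moves from 2.0 to 1.95 when 2.2 is deleted, but both snap to 2.0 on the 0.5 grid, so anything built on the quantized average would not need to be recomputed. A deletion that pushed the exact mean across a grid boundary would still change the reported value, which is why the article says a shift is "unlikely" rather than impossible.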
Certain AI methods aim to anonymize records, but there are reasons besides privacy that one might want AI to forget individual data points, Guan says. Some people might not want to contribute to the profits of a company they dislike, at least not without profiting from their data themselves. Or scientists might discover problems with data points post-training. (For instance, hackers can “poison” a dataset by inserting false records.) In both cases, efficient data deletion would be valuable.
“We certainly don’t have a full solution,” Guan says. “But we thought it would be very useful to define the problem. Hopefully people can start designing algorithms with data protection in mind.”
See the full story here: https://spectrum.ieee.org/tech-talk/computing/software/researchers-can-make-ai-forget-you