Artificial Intelligence Will Do What We Ask. That’s a Problem.
The danger of having artificially intelligent machines do our bidding is that we might not be careful enough about what we wish for. The lines of code that animate these machines will inevitably lack nuance, forget to spell out caveats, and end up giving AI systems goals and incentives that don’t align with our true preferences.
Research suggests, for example, that YouTube's recommendation algorithm, given the simple goal of maximizing watch time, has helped polarize and radicalize people and spread misinformation, just to keep us watching.
YouTube’s engineers probably didn’t intend to radicalize humanity. But coders can’t possibly think of everything.
A major aspect of the problem is that we often don't know what goals to give our AI systems, because we don't know what we really want.
Asking a machine to optimize a “reward function” — a meticulous description of some combination of goals — will inevitably lead to misaligned AI, argues Stuart Russell, a computer scientist at the University of California, Berkeley, because it’s impossible to include and correctly weight all goals, subgoals, exceptions and caveats in the reward function, or even to know what the right ones are. Giving goals to free-roaming, “autonomous” robots will become increasingly risky as they grow more intelligent, because the robots will be ruthless in pursuit of their reward function and will try to stop us from switching them off.
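To make the failure mode concrete, here is a toy recommender in Python. Everything in it (the video list, the reward function) is hypothetical and invented for illustration; the point is that an optimizer whose entire reward function is "maximize watch time" will serve misleading content whenever that keeps people watching, because nothing in the objective says otherwise.

```python
# Hypothetical recommender sketch illustrating reward misspecification.
# The reward below is an illustrative stand-in, not any real platform's
# objective. It never penalizes misleading content, so the optimizer
# selects such content whenever it keeps people watching longer.
videos = [
    {"title": "calm documentary",   "watch_minutes": 12, "misleading": False},
    {"title": "outrage conspiracy", "watch_minutes": 38, "misleading": True},
]

def reward(video: dict) -> float:
    """The only goal the coder remembered to specify: watch time."""
    return video["watch_minutes"]

recommendation = max(videos, key=reward)
print(recommendation["title"])  # -> "outrage conspiracy": misaligned, as specified
```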
Instead of machines pursuing goals of their own, the new thinking goes, they should seek to satisfy human preferences; their only goal should be to learn more about what our preferences are. Russell distills the idea into three principles (a toy coding of them follows the list):
- The machine’s only objective is to maximize the realization of human preferences.
- The machine is initially uncertain about what those preferences are.
- The ultimate source of information about human preferences is human behavior.
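Read as a program, the three principles amount to Bayesian preference learning: the machine keeps a belief over candidate human reward functions, updates that belief from observed human behavior, and acts to maximize expected preference satisfaction under it. The sketch below is a minimal illustration of that loop, not Russell's implementation; the tea/coffee preferences and the 0.9/0.1 noisy-choice likelihood are made up for the example.

```python
# Minimal sketch of the three principles as Bayesian preference learning.
# The candidate preferences and likelihoods are illustrative assumptions.
candidate_prefs = {
    "likes_tea":    {"tea": 1.0, "coffee": 0.0},
    "likes_coffee": {"tea": 0.0, "coffee": 1.0},
}
belief = {name: 0.5 for name in candidate_prefs}  # principle 2: initial uncertainty

def observe_human_choice(choice: str) -> None:
    """Principle 3: human behavior is the evidence about preferences."""
    for name, prefs in candidate_prefs.items():
        # Noisy-rationality likelihood: humans usually, not always, pick
        # what they prefer (0.9 if consistent with this hypothesis, else 0.1).
        belief[name] *= 0.9 if prefs[choice] > 0.5 else 0.1
    total = sum(belief.values())
    for name in belief:
        belief[name] /= total  # renormalize the posterior

def best_action() -> str:
    """Principle 1: maximize expected human preference under the belief."""
    actions = ["tea", "coffee"]
    return max(actions, key=lambda a: sum(belief[n] * candidate_prefs[n][a]
                                          for n in candidate_prefs))

observe_human_choice("tea")
print(best_action())  # -> "tea": the machine serves the preference it inferred
```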
The approach pins the success of robots on their ability to understand what humans really, truly prefer — something that the species has been trying to figure out for some time.
That raises a hard question: “If the obligation of machines is to try to optimize that aggregate quality of human experience, how on earth would they know what that was?”
Russell theorized that our decision-making is hierarchical — we crudely approximate rationality by pursuing vague long-term goals via medium-term goals while giving the most attention to our immediate circumstances.
“It turns out that uncertainty about the objective is essential for ensuring that we can switch the machine off,” Russell wrote in *Human Compatible*, “even when it’s more intelligent than us.”
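The intuition behind that claim can be checked with a toy calculation in the spirit of the “off-switch game” (Hadfield-Menell et al., 2017); this is a sketch under simplified assumptions of my own, not the paper's full model. A robot unsure whether its planned action helps or harms the human compares acting immediately, switching itself off, and deferring to a human who will veto the action exactly when it is harmful:

```python
import random

# Toy "off-switch game": the robot is uncertain about the utility U its
# proposed action has for the human. The Gaussian belief is an assumption
# made for this sketch.
random.seed(0)
utility_samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Option 1: act immediately -- expected utility E[U].
act_now = sum(utility_samples) / len(utility_samples)

# Option 2: switch itself off -- utility 0 by definition.
switch_off = 0.0

# Option 3: defer to the human, who allows the action only when U > 0,
# so the robot receives E[max(U, 0)].
defer = sum(max(u, 0.0) for u in utility_samples) / len(utility_samples)

print(f"act now:    {act_now:+.3f}")     # ~ 0.000
print(f"switch off: {switch_off:+.3f}")  # 0.000
print(f"defer:      {defer:+.3f}")       # ~ +0.399: deferring wins
```

Because E[max(U, 0)] is always at least max(E[U], 0), the uncertain robot never loses by staying switchable; a robot certain of its objective would gain nothing by deferring, which is the sense in which uncertainty keeps the off-switch usable.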

[Photo: Stuart Russell speaks at TED2017 - The Future You, April 24-28, 2017, Vancouver, BC, Canada. Credit: Bret Hartman / TED]
Russell sees two major challenges. “One is the fact that our behavior is so far from being rational that it could be very hard to reconstruct our true underlying preferences,” he said.
The second challenge is that human preferences change.
... However, there’s a third major issue that didn’t make Russell’s short list of concerns: What about the preferences of bad people? What’s to stop a robot from working to satisfy its evil owner’s nefarious ends? AI systems tend to find ways around prohibitions just as wealthy people find loopholes in tax laws, so simply forbidding them from committing crimes probably won’t be successful.
See the full article here: https://www.quantamagazine.org/artificial-intelligence-will-do-what-we-ask-thats-a-problem-20200130/