Getting AI to be more cautious: Where do we go next, and can we change our minds if we don't like it?
The idea is to have an agent jointly learn a forward policy and a reset policy. The forward policy maximizes the task reward, while the reset policy learns actions that return the environment to a prior state. This teaches agents to avoid risky actions that would irreversibly commit them to states they can't undo.
"Before an action proposed by the forward policy is executed in the environment, it must be “approved” by the reset policy. In particular, if the reset policy’s Q value for the proposed action is too small, then an early abort is performed: the proposed action is not taken and the reset policy takes control," they write.
See the full story here: https://us13.campaign-archive.com/?u=67bd06787e84d73db24fb0aa5&id=43052be12b&utm_source=MIT+Technology+Review&utm_campaign=edc326f0a0-The_Download&utm_medium=email&utm_term=0_997ed6f472-edc326f0a0-153894145