Content

Goodhart's Law, Partially Observed Goals, and Wireheading: some more reasons for AI systems to find ways to 'cheat' and get more reward than we intended.


Files

Reward Hacking Reloaded: Concrete Problems in AI Safety Part 4

The Concrete Problems in AI Safety Playlist: https://www.youtube.com/playlist?list=PLqL14ZxTTA4fEp5ltiNinNHdkPuLK4778
Previous Video: https://www.youtube.com/watch?v=92qDfT8pENs
The Computerphile video: https://www.youtube.com/watch?v=9nktr1MgS-A
The paper 'Concrete Problems in AI Safety': https://arxiv.org/pdf/1606.06565.pdf
SethBling's channel: https://www.youtube.com/user/sethbling

With thanks to my excellent Patreon supporters: https://www.patreon.com/robertskmiles
Steef Sara Tjäder Jason Strack Chad Jones Ichiro Dohi Stefan Skiles Katie Byrne Ziyang Liu Jordan Medina Kyle Scott Jason Hise David Rasmussen James McCuen Richárd Nagyfi Ammar Mousali Scott Zockoll Charles Miller Joshua Richardson Fabian Consiglio Jonatan R Øystein Flygt Björn Mosten Michael Greve robertvanduursen The Guru Of Vision Fabrizio Pisani Alexander Hartvig Nielsen Volodymyr David Tjäder Paul Mason Ben Scanlon Julius Brash Mike Bird Taylor Winning Roman Nekhoroshev Peggy Youell Konstantin Shabashov Almighty Dodd DGJono Matthias Meger Scott Stevens Emilio Alvarez Benjamin Aaron Degenhart Michael Ore Robert Bridges Dmitri Afanasjev Brian Sandberg Einar Ueland Lo Rez C3POehne Stephen Paul Marcel Ward Andrew Weir Pontus Carlsson Taylor Smith Ben Archer Ivan Pochesnev Scott McCarthy Kabs Kabs Phil Philip Alexander Christopher Tendayi Mawushe Gabriel Behm Anne Kohlbrenner

Comments

Chad M Jones

The part with the dolphin reminds me of "The Law of Unintended Consequences". They put a bounty on cobra skins, so instead of hunting cobras, people just started farming cobras, many of which ended up in the wild. https://en.wikipedia.org/wiki/Cobra_effect

Peggy Youell

That turned darker than I expected at the end. I was thinking "What can a robot do to make a human smile or laugh? Become a tickling robot?" But then the wireheading warning... oh, lord.

robertskmiles

Yeah, that's a classic example; I heard it with rat tails. The Cobra Effect, Goodhart's Law, and Campbell's Law all seem to be different people feeling parts of the same elephant, as it were.