CS 1501: AGI Lecture 11 | Concrete Problems in AI Safety

“The more I learn, the more I realize how much I don’t know.”

– Albert Einstein

Course Recap:

  1. From Amodei and Dario, et al [2] , accidents are defined as: unintended and harmful behavior that may emerge from poor design of real-world AI systems.

  2. One major class of AI accidents are unintended side effects, which can result from a poorly defined loss function, amongst other things.

  3. The second class of accidents involve reward hacking, and this is when the AI/AGI figures out a way to manipulate its reward function to get the reward without doing the intended task.

  4. The final class of AI accidents we discussed was scalable oversight, and that dealt with how an AI could interact with the world given expensive cost functions, and how to deal with this.

  5. Regulation is most likely going to be required, but how much? And by whom?

  6. What are some of the downsides and upsides of such regulation?

