In artificial intelligence (AI), the “alignment problem” refers to the challenges caused by the fact that machines simply do not have the same values as us. In fact, when it comes to values, then at a fundamental level, machines don’t really get much more sophisticated than understanding that 1 is different from 0.
As a society, we are now at a point where we are starting to allow machines to make decisions for us. So how can we expect them to understand that, for example, they should do this in a way that doesn’t involve prejudice towards people of a certain race, gender, or sexuality? Or that the pursuit of speed, or efficiency, or profit, has to be done in a way that respects the ultimate sanctity of human life?
Theoretically, if you tell a self-driving car to navigate from point A to point B, it could just smash its way to its destination, regardless of the cars, pedestrians, or buildings it destroys on its way.
Similarly, as Oxford philosopher Nick Bostrom outlined, if you tell an intelligent machine to make paperclips, it might eventually destroy the whole world in its quest for raw materials to turn into paperclips. The principle is that it simply has no concept of the value of human life or materials or that some things are too valuable to be turned into paperclips unless it is specifically taught it.
This forms the basis of the latest book by Brian Christian, The Alignment Problem – How AI Learns Human Values. It’s his third book on the subject of AI following his earlier works, The Most Human Human and Algorithms to Live By. I have always found Christian’s writing enjoyable to read but also highly illuminating, as he doesn’t worry about getting bogged down with computer code or mathematics. But that’s certainly not to say it is in any way lightweight or not intellectual.
Rather, his focus is on the societal, philosophical, and psychological implications of our ever-increasing ability to create thinking, learning machines. If anything, this is the aspect of AI where we need our best thinkers to be concentrating their efforts. The technology, after all, is already here – and it’s only going to get better. What’s far less certain is whether society itself is mature enough and has sufficient safeguards in place to make the most of the amazing opportunities it offers - while preventing the serious problems it could bring with it from becoming a reality.
I recently sat down with Christian to discuss some of the topics. Christian’s work is particularly concerned with the encroachment of computer-aided decision-making into fields such as healthcare, criminal justice, and lending, where there is clearly potential for them to cause problems that could end up affecting people’s lives in very real ways.
“There is this fundamental problem … that has a history that goes back to the 1960s, and MIT cyberneticist Norbert Wiener, who likened these systems to the story of the Sorcerer’s Apprentice,” Christian tells me.
Most people reading this will probably be familiar with the Disney cartoon in which Mickey Mouse attempts to save himself the effort of doing his master’s chores by using a magic spell to imbue a broom with intelligence and autonomy. The story serves as a good example of the dangers of these qualities when they aren't accompanied by human values like common sense and judgment.
“Wiener argued that this isn’t the stuff of fairytales. This is the sort of thing that’s waiting for us if we develop these systems that are sufficiently general and powerful … I think we are at a moment in the real world where we are filling the world with these brooms, and this is going to become a real issue.”
One incident that Christian uses to illustrate how this misalignment can play out in the real world is the first recorded killing of a pedestrian in a collision involving an autonomous car. This was the death of Elaine Herzberg in Arizona, US, in 2018.
When the National Transportation Safety Board investigated what had caused the collision between the Uber test vehicle and Herzberg, who was pushing a bicycle across a road, they found that the AI controlling the car had no awareness of the concept of jaywalking. It was totally unprepared to deal with a person being in the middle of the road, where they should not have been.
On top of this, the system was trained to rigidly segment objects in the road into a number of categories – such as other cars, trucks, cyclists, and pedestrians. A human being pushing a bicycle did not fit any of those categories and did not behave in a way that would be expected of any of them.
“That’s a useful way for thinking about how real-world systems can go wrong,” says Christian, “It’s a function of two things – the first is the quality of the training data. Does the data fundamentally represent reality? And it turns out, no – there’s this key concept called jaywalking that was not present.”
The second factor is our own ability to mathematically define what a system such as an autonomous car should do when it encounters a problem that requires a response.
“In the real world, it doesn't matter if something is a cyclist or a pedestrian because you want to avoid them either way. It's an example of how a fairly intuitive system design can go wrong."
Christian’s book goes on to explore these issues as they relate to many of the different paradigms that are currently popular in the field of machine learning, such as unsupervised learning, reinforcement learning, and imitation learning. It turns out that each of them presents its own challenges when it comes to aligning the values and behaviors of machines with the humans who are using them to solve problems.
Sometimes the fact that machine learning attempts to replicate human learning is the cause of problems. This might be the case when errors in data mean the AI is confronted with situations or behaviors that would never be encountered in real life, by a human brain. This means there is no reference point, and the machine is likely to continue making more and more mistakes in a series of "cascading failures."
In reinforcement learning – which involves training machines to maximize their chances of achieving rewards for making the right decision – machines can quickly learn to “game” the system, leading to outcomes that are unrelated to those that are desired. Here Christian uses the example of Google X head Astro Teller's attempt to incentivize soccer-playing robots to win matches. He devised a system that rewarded the robots every time they took possession of the ball – on the face of it, an action that seems conducive to match-winning. However, the machines quickly learned to simply approach the ball and repeatedly touch it. As this meant they were effectively taking possession of the ball over and over, they earned multiple rewards – although it did little good when it came to winning the match!
Christian’s book is packed with other examples of this alignment problem – as well as a thorough exploration of where we are when it comes to solving it. It also clearly demonstrates how many of the concerns of the earliest pioneers in the field of AI and ML are still yet to be resolved and touches on fascinating subjects such as attempts to imbue machines with other characteristics of human intelligence such as curiosity.