One of the most successful learning strategies used by humans and animals is to observe actions being carried out and then attempt to replicate them. Could this also be the key to creating truly intelligent machines? Here’s a simple introduction to artificial imitation learning, an increasingly hot topic in the field of artificial intelligence.
Machine learning uses many different techniques, all aimed at enabling machines – typically robots or computers – to become better at carrying out tasks without needing to be explicitly told how to do them.
Primarily, they attempt to do this by programming machines to “think” in the same way that humans do. After all, the human brain is – as far as we know – the most capable and flexible learning and decision-making tool in the universe!
This is why data scientists working on AI algorithms developed techniques such as supervised and unsupervised learning and later methods like reinforcement learning (sometimes considered a form of “semi-supervised” learning).
Each of these techniques takes a different approach to achieving a similar objective – enabling machines to carry out tasks involving learning and decision-making. You can click the links to get a bit more detail, but briefly:
Supervised learning involves training algorithms on labeled data, meaning a human ultimately tells the algorithm whether each decision or action it made was correct or incorrect. It learns to maximize the correct decisions while minimizing the incorrect ones.
Unsupervised learning uses unlabeled data to train and bases its decisions on categorizations that do not rely on concepts of "correct" or "incorrect." It learns to solve problems by categorizing and segmenting data based on qualities and relationships that the algorithms discover.
Reinforcement learning involves rewarding decisions that lead to good outcomes and training the machine to maximize the collection of rewards.
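To make the contrast between the three paradigms concrete, here is a deliberately tiny, hand-rolled sketch – no real machine learning library is implied, and the task (“learn which numbers are divisible by 3”) is invented purely for illustration:

```python
# Toy contrast of the three paradigms on one made-up task:
# "learn that numbers divisible by 3 are special".
data = list(range(12))

# Supervised: every example arrives with a human-provided label,
# and the learner simply fits the input-to-label mapping.
labeled = [(x, x % 3 == 0) for x in data]        # (input, correct answer)
supervised_rule = {x: y for x, y in labeled}

# Unsupervised: no labels at all -- the learner groups inputs by a
# quality it discovers itself (here, the remainder after dividing by 3).
clusters = {}
for x in data:
    clusters.setdefault(x % 3, []).append(x)

# Reinforcement: try possible actions and keep whichever earns the
# most reward, with no one ever stating the "right answer" directly.
def reward(guess, x):
    return 1 if guess == (x % 3 == 0) else 0

best_guess_for = {x: max([True, False], key=lambda g: reward(g, x))
                  for x in data}

print(supervised_rule[9])   # True
print(clusters[0])          # [0, 3, 6, 9]
print(best_guess_for[9])    # True
```

The three learners end up with similar knowledge, but each gets there differently: from labels, from discovered structure, or from rewards.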
Recently we have also heard data scientists and machine learning engineers talk about another method, known as imitation learning (IL). Once again, IL takes its cue from observed human learning processes – similar to the way that babies and toddlers learn by watching and copying adults.
Although we hear more about IL today than we did a few years ago, it isn’t a new concept – the first application of machine IL stretches back to ALVINN (Autonomous Land Vehicle In a Neural Network), an early attempt at autonomous driving in 1989.
So let’s take a look at this in a non-technical way, to try to understand how it works and why it’s increasingly being used to solve problems in machine learning and artificial intelligence.
How does IL work?
Most machine learning can be thought of as working on a “trial and error” basis. In reinforcement learning, for example – and in adversarial approaches such as generative adversarial networks (GANs) – algorithms process data over and over again until they hit on a solution to whatever problem they are working on. This approach relies on the brute force of hugely powerful computer processors, which can push vast amounts of data through complex neural networks at a rate of many millions of operations per second. Eventually, the algorithm either finds a solution that fits what it has been told the goal is (supervised learning) or categorizes and segments the data based on probabilities (unsupervised learning). In the case of reinforcement learning, it follows the path that allows it to collect the most rewards.
With IL, data scientists take another approach – an expert (usually a human) demonstrates how to complete the task, and the machine then attempts to mimic those actions. The machine learns to observe the actions the expert takes and to associate them with the outcomes the expert achieves. This has the effect of greatly reducing the “search space” – the range of variables and possibilities the machine has to consider as it searches for the optimal solution to a problem.
Of course, IL isn’t simply about teaching a computer to blindly copy a demonstrator – that would hardly count as intelligence! Instead, it has to be able to replicate the same results when variables in the situation (known as the "environment") change. For example, when teaching an autonomous vehicle how to navigate obstacles, it might learn how to do so under various weather conditions and traffic densities without having to be explicitly shown every combination of variables.
The obvious benefit is that this approach theoretically requires far less computing power, as it doesn’t rely on a brute-force approach of trying every possible solution until it finds the right one. It also theoretically demands less technical knowledge of the human trainer, who only needs to be an expert in the subject the machine should learn, rather than also an expert in machine learning and training algorithms.
Three popular methods of implementing IL with machines are:
Behavioral cloning – a supervised method of IL where a machine is exposed to expert behavior and attempts to replicate it.
Direct policy learning – an iterative approach to IL where the machine or program can query the expert demonstrator during training, rather than simply observing and attempting to replicate.
Inverse reinforcement learning – this works with a reward function, just like reinforcement learning, but learns that function from expert demonstrations. The algorithm infers the rewards that best explain the expert’s behavior, and then trains itself to maximize them.
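The first of these, behavioral cloning, can be sketched in a few lines. The example below is purely illustrative – the names (`expert_policy`, `cloned_policy`) and the toy one-dimensional “move to the target” task are invented for this sketch, and a simple lookup with nearest-neighbor matching stands in for the neural network a real system would use:

```python
# Minimal behavioral-cloning sketch on a toy 1-D "reach the target" task.
# A nearest-neighbor lookup stands in for a trained function approximator.

def expert_policy(position, target=10):
    """The demonstrator: always step toward the target."""
    if position < target:
        return +1   # move right
    if position > target:
        return -1   # move left
    return 0        # stay put

# 1. Record expert demonstrations as (state, action) pairs.
demonstrations = [(s, expert_policy(s)) for s in range(-20, 21)]

# 2. "Train" by plain supervised learning -- here just memorizing
#    the demonstrated state-to-action mapping.
policy_table = dict(demonstrations)

def cloned_policy(position):
    # Generalize to unseen states via the nearest demonstrated state.
    nearest = min(policy_table, key=lambda s: abs(s - position))
    return policy_table[nearest]

# 3. The clone imitates the expert, even in states it never observed.
print(cloned_policy(5))    # 1  (move right, like the expert)
print(cloned_policy(37))   # -1 (unseen state: nearest demo says move left)
print(cloned_policy(10))   # 0  (at the target, stay put)
```

Note how the clone never receives a reward or a label in the usual sense – it only ever sees what the expert did, which is the essence of the imitation approach.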
What is it useful for?
IL is useful when we want to teach machines to learn to carry out tasks without expending huge amounts of processing power on training or when we don't have access to huge data sets.
It’s also valuable in developing explainable AI. Whereas other methods like reinforcement or unsupervised learning might arrive at solutions via paths that are difficult for humans to understand, the limited volume of input data involved in IL means that it’s generally easier for us to comprehend how a machine arrived at a solution. This is very valuable in applications where trust in the capability of machines to learn accurately is imperative, such as healthcare or HR use cases.
Overall, it’s thought that IL methods will be most useful for programming robots and automated devices with many degrees of freedom – flying cars are a possible example – where there is a far greater number of variables that they might conceivably encounter in their environments.
Primarily, rather than tasks that machine learning has already proven itself to be very good at – for example, beating human scores in video games or the board game Go – IL has the potential to improve machines’ performance on tasks that are not easily simulated. This might include things like driving cars or creating domestic utility robots that can wash dishes or make us a cup of coffee.
Overall, IL could be a step forward in the quest to eventually create generalized AI – also known as “strong” AI – which, much like a human brain, can be applied to any task (or at least a wide range of tasks) rather than only the particular task for which it was created.