Machine Learning In Practice: How Does Amazon’s Alexa Really Work?
2 July 2021
“Alexa, what’s the weather going to be like today?”
It has taken decades for scientists to understand natural human speech well enough for voice-activated interfaces such as Alexa, Amazon’s natural language processing system, to win acceptance from consumers. Alexa is the voice that talks to users of Amazon’s Echo products, including the Echo, Dot and Tap, as well as Amazon Fire TV and some third-party products. Even since 2012, when the patent was filed for what would ultimately become Amazon’s artificial intelligence system Alexa, there has been tremendous growth in capabilities, and the credit for that growth goes to machine learning.
Conversation is something we do every day without giving it any thought, yet conversation between machines and humans is complex. So, how did Amazon and others in the space such as Google, Apple and Microsoft crack the code?
ABCs of Alexa
Over 30 million smart speakers were sold globally last year, and this number is expected to grow to nearly 60 million this year. While Amazon remains the industry leader in smart speakers, selling about 20 million devices last year, others (especially Google) are growing and starting to catch up. There are nuances to each, but let’s look “under the hood” of an Echo to see how Alexa works.
The Echo cylinder itself contains some capability: speakers, a microphone and a small computer that can wake the system and blink its lights to let you know it’s activated. Its real capabilities, however, come into play once it sends whatever you have told Alexa to the cloud to be interpreted by Alexa Voice Services (AVS).
So, when you ask Alexa, “What’s the weather going to be like today?”, the device records your voice. That recording is sent over the Internet to Amazon’s Alexa Voice Services, which parses the recording into commands it understands. The system then sends the relevant output back to your device. When you ask about the weather, an audio file is sent back and Alexa tells you the weather forecast, all without you having any idea there was any back and forth between systems. That of course means that if you lose your internet connection, Alexa stops working.
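The round trip described above can be sketched in a few lines of Python. This is a toy model only, not Amazon’s implementation: the names `CloudResponse`, `fake_cloud_service` and `handle_utterance` are hypothetical, the “cloud” is a local function, and a text transcript stands in for the audio recording so that the example stays runnable.

```python
from dataclasses import dataclass


@dataclass
class CloudResponse:
    """What the hypothetical cloud service sends back to the device."""
    intent: str
    speech_text: str


def fake_cloud_service(transcript: str) -> CloudResponse:
    """Stand-in for the cloud side: map a request to an intent and a reply.

    Assumption: a real service receives audio; text is used here for simplicity.
    """
    text = transcript.lower()
    if "weather" in text:
        return CloudResponse("GetWeather", "Here is today's forecast.")
    return CloudResponse("Unknown", "Sorry, I don't know that one.")


def handle_utterance(transcript: str, online: bool = True) -> str:
    """Device side: forward the request if online, then return the reply to speak."""
    if not online:
        # Without a connection the device cannot reach the cloud service.
        return "Sorry, I'm having trouble connecting to the internet."
    response = fake_cloud_service(transcript)
    return response.speech_text


print(handle_utterance("Alexa, what's the weather going to be like today?"))
print(handle_utterance("Alexa, what's the weather going to be like today?", online=False))
```

The point of the sketch is the division of labour: the device only captures the request and plays back the answer, while the interpretation happens on the other side of the network call.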
The skills Echo has out of the box are impressive to most of us, but Amazon also gives approved developers free access to Alexa Voice Services and encourages them to create new Alexa skills that augment the system’s skill set, just as Apple did with the App Store. As a result of this openness, the list of skills that Alexa can help with (currently over 30,000) continues to grow rapidly. Users can, of course, purchase products from Amazon, but they can also order pizza from Domino’s, hail a ride from Uber or Lyft, control their light fixtures, make a payment through the Capital One skill, get wine pairings for dinner and so much more.
Constantly learning from human data
Data and machine learning are the foundation of Alexa’s power, and that power is only getting stronger as Alexa’s popularity and the amount of data it gathers increase. Every time Alexa makes a mistake in interpreting your request, that data is used to make the system smarter the next time around. Machine learning is the reason for the rapid improvement in the capabilities of voice-activated user interfaces. For example, Google speech was able to improve its error rate tremendously in a year; now it recognises 19 out of 20 words it hears. Understanding natural human speech is a gargantuan problem, and we now have the computing power at our disposal to build systems that get better the more we use them.
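To put “19 out of 20 words” in perspective, speech recognition accuracy is commonly reported as a word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the system’s transcript into the correct one, divided by the number of words actually spoken. The article does not say how the Google figure was measured, so treating it as roughly a 5% WER is an assumption. A minimal sketch of the calculation, with a hypothetical helper named `word_error_rate`:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Twenty spoken words, one of them misheard ("twenty" -> "plenty").
spoken = ("one two three four five six seven eight nine ten "
          "eleven twelve thirteen fourteen fifteen sixteen "
          "seventeen eighteen nineteen twenty")
heard = spoken.replace("twenty", "plenty")
print(word_error_rate(spoken, heard))
```

One wrong word in twenty gives a WER of 1/20, which is the 5% figure implied by recognising 19 out of 20 words.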
The challenges of natural language generation and processing
As a subset of artificial intelligence, natural language generation (NLG) is the ability to get natural sounding written and verbal responses back based on data that’s input into a computer system. Human language is quite complex, but today’s natural language generation capabilities are becoming very sophisticated. Think of NLG as a writer that turns data into language that can be communicated.
Natural language processing (NLP) is the reader: it takes language as input, whether spoken by a person or written by an NLG system, and turns it into something a computer can act on. Advances in this technology have allowed dramatic growth in intelligent personal assistants such as Alexa.
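As a toy illustration of the writer and reader roles, the sketch below uses keyword matching for the reading side and a sentence template for the writing side. Real assistants rely on trained models rather than rules like these; the function names `understand` and `generate`, the intent label and the forecast fields are all hypothetical.

```python
import re


def understand(utterance: str) -> dict:
    """Reader side: turn free text into a structured request."""
    text = utterance.lower()
    intent = "get_weather" if "weather" in text else "unknown"
    match = re.search(r"\b(today|tomorrow)\b", text)
    # Default to "today" when no day is mentioned (an arbitrary choice).
    return {"intent": intent, "day": match.group(1) if match else "today"}


def generate(request: dict, forecast: dict) -> str:
    """Writer side: turn structured data into a natural-sounding sentence."""
    if request["intent"] != "get_weather":
        return "Sorry, I can't help with that yet."
    day = request["day"].capitalize()
    return (f"{day} will be {forecast['condition']} "
            f"with a high of {forecast['high_c']} degrees.")


request = understand("Alexa, what's the weather going to be like today?")
print(request)
print(generate(request, {"condition": "sunny", "high_c": 21}))
```

The structured dictionary in the middle is the handover point: the reader produces it from language, and the writer turns data of the same kind back into language.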
Voice-based AI is so appealing because it holds the promise of supporting us in a way that is natural to us humans; no swiping or typing necessary. That’s also why it’s a technical challenge to build. Just think about how nonlinear your typical conversation is.
When people talk, they interrupt themselves, change topics, repeat themselves, use body language to add meaning and use a wide variety of words that have multiple meanings depending on the context. It’s like a parent trying to understand the vernacular of teens, but much, much more complicated.
Amazon continues to have an army of specialists in addition to a cadre of machines on the task of making Alexa and Alexa Voice Services even better. Their goal is to make spoken language a user interface that is as natural as talking to another human being. I can’t wait to see what’s in store next.
Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity.
He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations.
He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.
Bernard’s latest book is ‘Generative AI in Practice’.