Which is why it's always interesting to see new updates and capabilities added. The latest updates, however, are particularly thought-provoking.
We know that progressing towards general AI – AI that can do anything – is part of developer OpenAI’s plan. And when it comes to hitting that ambitious goal, the sense of sight and the ability to speak and listen are pretty important. So, it makes sense that as we approach the first anniversary of ChatGPT being available to everyone, it’s getting these abilities.
But how will this affect our relationship with our new digital friend? What will it add to its ability to slot into our lives and help us with everyday challenges, and what does it mean for the big questions around the place of ChatGPT (and AI in general) in society?
What Are The New ChatGPT Updates?
Firstly, by gaining the ability to analyze images and extract information from them, ChatGPT is effectively gaining the ability to see. You can simply upload a picture and have it describe what is shown, as well as potentially use it for far more complex tasks, such as working out how to repair broken machinery like bicycles or lawnmowers.
This means that ChatGPT can, in theory, analyze not only photographs but also charts and visualizations, handwriting, and all manner of unstructured data from the world around us.
Clearly, it has many everyday uses, from creating Facebook Marketplace listings for items you want to sell to turning whiteboard scribblings into easy-to-read notes.
Examples given by OpenAI itself include snapping a picture of the items in your fridge and asking what you can make for dinner, or having a live conversation about a photograph.
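As a rough illustration of how that fridge-photo example works at the API level, a multimodal request pairs a text question with an image in a single message. The sketch below just builds such a payload without sending it; the model name and image URL are illustrative placeholders, and the exact API shape may differ from this.

```python
# Sketch: constructing a chat request that mixes text and an image,
# following the general shape of OpenAI's vision-enabled chat format.
# Model name and URL below are placeholders, not guaranteed values.

def build_vision_request(question: str, image_url: str) -> dict:
    """Build a chat-completion payload combining a question with an image."""
    return {
        "model": "gpt-4-vision-preview",  # placeholder vision-capable model
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

payload = build_vision_request(
    "What could I cook for dinner with the ingredients in this photo?",
    "https://example.com/fridge.jpg",
)
print(payload["messages"][0]["content"][1]["type"])  # image_url
```

The key idea is that the user message's content becomes a list of typed parts rather than a plain string, which is what lets one turn of conversation reference both words and pixels.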
According to the New York Times, however, the functionality deployed in ChatGPT has limitations, some of them by design: its ability to analyze human faces is restricted, for example. This is done to stop it from being used to violate privacy and is in line with the way OpenAI has limited its products in the past.
The other change with potentially far-reaching consequences is that ChatGPT can now speak and listen (if you're using the mobile app, at least).
The voice revolution has made talking to machines like Siri and Alexa pretty normal these days. And we’re all pretty used to the fact that they can only respond to us in a limited number of ways; generally speaking, the most useful thing they can do is switch other devices on and off.
Which is why talking to ChatGPT is such an intriguing idea. ChatGPT can potentially engage in conversations that flow naturally enough to give the illusion that you’re talking to a real person.
And as well as holding a back-and-forth conversation, it can simulate voices it hears, which could be used for voicing an AI avatar, for example, as well as for a number of other, more sinister purposes.
I say potentially because, when I tried it out straight after launch, it has to be said it wasn't quite there yet.
ChatGPT had difficulty understanding what I said a few times. And even more strangely, its training data doesn’t seem to have been updated to let it know it can speak: when I asked it for help using its voice functionality, it told me firmly that it didn’t have voice functionality (in a perfectly synthesized human voice). I’ve also seen reports that it can have trouble understanding various accents and dialects.
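Under the hood, a voice conversation like this is essentially a three-stage pipeline: speech is transcribed to text, the text is answered by the language model, and the reply is synthesized back into audio. The sketch below shows that loop with hypothetical stand-in functions rather than real API calls, just to make the architecture concrete.

```python
# A minimal sketch of the listen-think-speak loop behind a voice assistant.
# transcribe(), chat() and synthesize() are hypothetical stand-ins for
# speech-to-text, language-model and text-to-speech components.

def transcribe(audio: bytes) -> str:
    # Stand-in for a speech-to-text model (e.g. a Whisper-style system).
    # Here we simply pretend the audio bytes *are* the text.
    return audio.decode("utf-8")

def chat(prompt: str) -> str:
    # Stand-in for the language model producing a reply.
    return f"You said: {prompt}"

def synthesize(text: str) -> bytes:
    # Stand-in for a text-to-speech model returning audio data.
    return text.encode("utf-8")

def voice_turn(audio_in: bytes) -> bytes:
    """One turn of a voice conversation: listen, think, speak."""
    return synthesize(chat(transcribe(audio_in)))

reply = voice_turn(b"what can you do?")
print(reply.decode("utf-8"))  # You said: what can you do?
```

It also hints at why accents trip the system up: any transcription error in the first stage propagates unchecked through the other two, so the model ends up answering a question that was never asked.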
What Does This Mean?
Moving into an era where machines can not only think but see, hear and talk too is clearly pretty momentous stuff. I’m sure there are plenty of people ready to say that it can’t really do any of those things particularly well yet. But it’s obvious that things are only just getting started.
Vision and voice functionality mean we're likely to see ChatGPT technology appearing in more and more portable devices. We already have glasses that superimpose ChatGPT responses in front of your eyes, so you're never stuck for a solution to a problem; some are even designed to help you make casual small talk.
Some of them can already listen and talk – though these use third-party extensions to add the functionality, and OpenAI's own integrated technology should provide a much smoother experience.
But having real-time, AI-powered image analysis available to us instantly, wherever we are, could be a real game-changer in many fields.
It does raise some ethical considerations, though. Most pressingly, it’s worth bearing in mind that even though ChatGPT launched with a range of limiters on its behavior that were supposed to prevent it from being used for unethical purposes, these were quickly circumvented and, in some cases, removed entirely.
If this is done with ChatGPT's visual capabilities, the potential ramifications could be even more severe, particularly if unethical actors find a way around the block on facial recognition.
It also wasn’t long after the release of ChatGPT that copycat versions that work just like it but without the limitations started to appear. Sometimes, these were sold by their creators specifically as tools designed to break laws. Could we see the same happen with the visual or voice-mimicking abilities? I think it would be pretty foolish to think that it won’t.
The Quest For Artificial General Intelligence
It's possible that there's something that should be worrying us even more than that, though.
With its latest set of updates, ChatGPT is becoming increasingly multimodal. This means it can understand and interact with various forms of input, like pictures and sounds, rather than just words.
This is important because the aim of AI development is inevitably artificial general intelligence (AGI). This is a term for machines that can perform any task as long as they have the necessary data, much like we as humans can. Becoming multimodal could easily be described as taking a big step towards this.
It’s still probably safe to say AGI is a way off. Google engineering director Ray Kurzweil has estimated we’ll get there around 2045, and DeepMind CEO Demis Hassabis also believes it will arrive within the next few decades.
Philosopher Nick Bostrom, however, believes “superintelligence” will arrive early in the next century. AI pioneer and founder of the Center for Human Compatible AI, Professor Stuart Russell, says it's still some way off, and there are major problems we can't yet solve.
When it does arrive, AGI is likely to have a pretty enormous impact. On the question of whether it will lead to us living lives of luxury while machines create everything we need or a far darker fate, opinion remains divided.
So What Now?
For better or worse, governments, and others in a position to make the decisions, don’t seem to have heeded the advice of those who signed the Pause Giant AI Experiments petition.
This means that we’re likely to see the development of AI continue and accelerate. Functionality like that added by ChatGPT will become an everyday part of life. It will also become more reliable, more powerful and offer a continually improving user experience. This means more applications and devices with the technology built-in and an ever-growing list of social and industrial use cases.
ChatGPT can see and hear now, so I don’t think it will be long before someone works out how to give it touch, smell and taste. Then, it will be equipped with all the same sensory functions we have and, in theory, capable of fully understanding the way we perceive our environment.
This will give it the potential to help us unearth a great many insights: information about the world and our interaction with it that lies beyond the reach of our organic brains. And it will give us that information in ways it knows we can use, thanks to its understanding of our own faculties.
With AI, we’re on a journey where the destination is far from certain. But advances like those made by ChatGPT in the (less than) year since it was launched make two things clear: We’re gathering speed, and we can only guess what surprise will be around the next corner.