Did OpenAI Sora Just Kickstart The Era Of Generative Video?
4 March 2024
Just a few weeks back, I wrote that we are probably still some way from being able to create a movie from a natural language prompt.
Now, it seems that it may happen a lot sooner than I suspected. OpenAI – creator of ChatGPT, the chatbot that started the current generative AI craze – just announced its own text-to-video model, Sora.
To say the results have stunned the AI community is an understatement. Although we can’t yet use it for ourselves, the demonstration videos include close-to-photorealistic sequences of a woman walking through a city and of a gold rush-era US town, all generated from simple text prompts.
But generative video – while undoubtedly technically amazing – creates ethical and societal challenges that go beyond those posed by the automated creation of text, images and sounds.
So, let’s take a look at what it is, what it does, and perhaps most importantly, what it means for a world in which it will inevitably become more and more difficult to tell the difference between the real and the digitally generated.
So What Is Sora?
Basically, Sora is to video what ChatGPT is to writing, and Dall-E 3 is to image generation. You type what you want to see, and it appears, in full motion, in front of your eyes.
None of the videos shown so far have any sound, but given advances in AI sound and music generation, we can only assume that this will be coming soon.
Generative AI video creators aren’t entirely new. I’ve outlined a number of them that have appeared in the last year or so in the piece I linked to at the start of this article. Mostly, though, while they generate text, overlays and effects, they don’t produce actual video animation. However, there are a few exceptions, like Runway.
At this early stage, impressive though Sora is, it isn’t going to give us the next Toy Story from a prompt. But the potential is virtually unlimited. Filmmakers can use it to visualize concepts and scenes or generate special effects. Teachers can create immersive historical recreations, and manufacturers can use it to create prototypes and demonstrations.
At the moment, Sora can generate videos up to one minute long. And it does more than simple image generation (if we have to think of that as simple now), which would just create a set of consecutive images to give the impression of movement. It is capable of tracking the positions of objects so they move realistically and coherently in relation to other objects, passing in front of or behind them, for example.
It can even perform complicated operations like “remembering” objects when they move off-camera so they will be recreated accurately when they move back into view.
It isn’t perfect, of course, and OpenAI admits that it will generate inconsistencies, such as objects that don’t follow the laws of physics or causality.
But from what we’ve seen, it’s an amazing technology that gives a tantalizing glimpse of what we will soon be able to do!
How Does It Work?
Like Dall-E and other image generators, Sora is essentially a diffusion model, meaning it starts from random “noise” and gradually removes that randomness until what remains is an image that matches the prompt.
Over many successive steps, the images that make up the video become more defined.
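For readers who like to see the idea in code, here is a deliberately tiny Python sketch of that "start with noise, refine step by step" loop. It is not how Sora works internally, which OpenAI has not published in this form. The `target` list, the five-pixel "image" and the function names are all made up for illustration. In a real diffusion model there is no known target; a trained neural network predicts what noise to remove at each step, guided by the prompt.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy illustration of iterative denoising (an assumption-laden sketch).

    `target` stands in for the image a trained model would steer toward
    for a given prompt. A real diffusion model has no such target and
    uses a neural network to predict the noise to remove.
    """
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per "pixel".
    x = [rng.gauss(0.0, 1.0) for _ in target]
    history = [x]
    for step in range(steps):
        remaining = steps - step
        # Close a fraction of the remaining gap at each step, so the
        # sample lands on the target after the final step.
        x = [xi + (ti - xi) / remaining for xi, ti in zip(x, target)]
        history.append(x)
    return history


def mean_abs_error(sample, target):
    """Average per-pixel distance between a sample and the target."""
    return sum(abs(s - t) for s, t in zip(sample, target)) / len(target)


target = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical 5-pixel "image"
history = toy_denoise(target)
errors = [mean_abs_error(h, target) for h in history]
print(f"error at start: {errors[0]:.3f}, at end: {errors[-1]:.3f}")
```

Each pass through the loop leaves the sample a little less random than before, which is the same basic intuition behind the images in a generated video becoming more defined over time.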
What really makes it special is the ability to understand how the objects – people or anything else – in the setting would realistically interact with everything else. This could mean water making things wet when they move through it or a ball falling and moving across the floor in a realistic way when it’s dropped.
Just as ChatGPT understands words from their context, learning how they fit together with other words to communicate meaning, Sora understands how things act and behave in real-world settings. OpenAI hasn’t given details of what data it’s trained on, but it’s likely to be many, many hours of real-world video footage from which it can learn how items, people, animals, and scenery move and interact.
As well as generating entirely new footage, it can continue an existing video and recreate existing footage from new angles.
Is The World Ready For Generative Video On-Demand?
Sora offers amazing possibilities. But empowering anyone to create realistic videos of anything they want will clearly not be without dangers.
Scams and phishing attacks could become more sophisticated, for example, by using deepfake videos to make fraudulent activities seem more legitimate or plausible. We’ve already seen this with AI voiceovers overlaid on footage of celebrities to create the impression they are giving their endorsement.
It will inevitably also become easier to create non-consensual videos with convincing likenesses of real people, which could be used to cause harm or for blackmail.
I am sure that we will also see it used in attempts to subvert democratic processes and spread fake news and disinformation, with the aim of undermining trust in politicians, governments, or institutions.
OpenAI tells us it has built safeguards into its algorithms in order to prevent many of these uses and is also developing its own tools to help identify harmful content. But as we’ve seen with ChatGPT, it’s highly likely that workarounds for these will be found, or copycat products will emerge without safeguards in place.
Addressing these issues will require a concerted effort involving education, legislation and the adoption of robust frameworks around responsible, ethical AI use. Sadly, as has been the case with every transformative technology from mechanization to the automobile and computing, it seems inevitable that some harm will be caused.
But the genie is now very much out of the bottle, meaning it’s down to responsible AI users and advocates to ensure society manages these risks effectively while also allowing its transformative potential to be realized.
Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity.
He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations.
He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.
Bernard’s latest book is ‘Generative AI in Practice’.