Did OpenAI Sora Just Kickstart The Era Of Generative Video?
4 March 2024
Just a few weeks back, I wrote that we are probably still some way from being able to create a movie from a natural language prompt.
Now, it seems that it may happen a lot sooner than I suspected. OpenAI – creator of ChatGPT, the chatbot that started the current generative AI craze – just announced its own text-to-video model, Sora.
To say the results have stunned the AI community is an understatement. Although we can’t yet use it for ourselves, demo videos show close-to-photorealistic sequences, including a woman walking through a city and a gold rush-era US town, all generated from simple text prompts.
But generative video – while undoubtedly technically amazing – creates ethical and societal challenges that go beyond those posed by the automated creation of text, images and sounds.
So, let’s take a look at what it is, what it does, and perhaps most importantly, what it means for a world in which it will inevitably become more and more difficult to tell the difference between the real and the digitally generated.
So What Is Sora?
Basically, Sora is to video what ChatGPT is to writing, and Dall-E 3 is to image generation. You type what you want to see, and it appears, in full motion, in front of your eyes.
None of the videos shown so far have any sound, but given advances in AI sound and music generation, we can assume that this will come soon.
Generative AI video creators aren’t entirely new. I’ve outlined a number of them that have appeared in the last year or so in the piece I linked to at the start of this article. Mostly, though, while they generate text, overlays and effects, they don’t produce actual video animation. However, there are a few exceptions, like Runway.
At this early stage, impressive though it is, it isn’t going to give us the next Toy Story from a prompt. But the potential is virtually unlimited. Filmmakers can use it to visualize concepts and scenes or generate special effects. Teachers can create immersive historical recreations, and manufacturers can use it to create prototypes and demonstrations.
At the moment, Sora can generate videos up to one minute long. And it does more than create a set of consecutive images to give the impression of movement (if we can still think of image generation as simple); it tracks the positions of objects so they move realistically and coherently in relation to one another, passing in front of or behind each other, for example.
It can even perform complicated operations like “remembering” objects when they move off-camera so they will be recreated accurately when they move back into view.
It isn’t perfect, of course, and OpenAI admits that it will generate inconsistencies, such as objects that don’t follow the laws of physics or causality.
But from what we’ve seen, it’s an amazing technology that gives a tantalizing glimpse of what we will soon be able to do!
How Does It Work?
Like Dall-E and other image generators, Sora is essentially a diffusion model, meaning it starts with random “noise” and gradually removes it, transforming the noise into images that match the prompt.
Over many successive steps, the images that make up the video become more defined.
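To make the idea of gradual denoising more concrete, here is a minimal toy sketch in Python. It is not how Sora is implemented, and OpenAI has not published code for it. The function name `toy_denoise`, the four-value "frame" and the `target` list are all assumptions made up for illustration. In particular, a real diffusion model has no target to peek at: a trained neural network estimates the noise to remove at each step, guided by the text prompt.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy illustration of iterative denoising (hypothetical, not Sora's method).

    `target` stands in for "the image that matches the prompt". A real
    diffusion model does not know the target; a trained network estimates
    the noise at each step instead.
    """
    rng = random.Random(seed)
    # Start from pure random noise, one value per "pixel".
    frame = [rng.gauss(0.0, 1.0) for _ in target]
    history = []
    for step in range(steps):
        # Stand-in for the learned model: the estimated noise is whatever
        # separates the current frame from the target.
        predicted_noise = [x - t for x, t in zip(frame, target)]
        # Remove only a fraction of the estimated noise on each step,
        # so the frame sharpens gradually rather than all at once.
        fraction = 1.0 / (steps - step)
        frame = [x - fraction * n for x, n in zip(frame, predicted_noise)]
        # Record how far the frame still is from the target.
        history.append(max(abs(x - t) for x, t in zip(frame, target)))
    return frame, history


if __name__ == "__main__":
    frame, history = toy_denoise([0.0, 0.5, 1.0, 0.5])
    for step in (0, 24, 49):
        print(f"step {step + 1}: remaining error {history[step]:.4f}")
```

Running it prints the remaining error at a few steps, which shrinks as the noise is removed. The real thing works on full video data and relies on a learned model, but the loop has the same basic shape: start with noise, remove a little of it, and repeat.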
What really makes it special is the ability to understand how the objects – people or anything else – in the setting would realistically interact with everything else. This could mean water making things wet when they move through it or a ball falling and moving across the floor in a realistic way when it’s dropped.
Just as ChatGPT understands words from their context, learning how they fit together with other words to communicate meaning, Sora understands how things act and behave in real-world settings. OpenAI hasn’t given details of what data it’s trained on, but it’s likely to be many, many hours of real-world video footage from which it can learn how items, people, animals, and scenery move and interact.
As well as generating entirely new footage, it can continue an existing video and recreate existing footage from new angles.
Is The World Ready For Generative Video On-Demand?
Sora offers amazing possibilities. But empowering anyone to create realistic videos of anything they want will clearly not be without dangers.
Scams and phishing attacks could become more sophisticated, for example, by using deepfake videos to make fraudulent activities seem more legitimate or plausible. We’ve already seen this with AI voiceovers overlaid on footage of celebrities to create the impression they are giving their endorsement.
It will inevitably also become easier to create non-consensual videos with convincing likenesses of real people, which could be used to cause harm or for blackmail.
I am sure that we will also see it used in attempts to subvert democratic processes and spread fake news and disinformation, with the aim of undermining trust in politicians, governments, or institutions.
OpenAI tells us it has built safeguards into its algorithms in order to prevent many of these uses and is also developing its own tools to help identify harmful content. But as we’ve seen with ChatGPT, it’s highly likely that workarounds for these will be found, or copycat products will emerge without safeguards in place.
Addressing these issues will require a concerted effort involving education, legislation and the adoption of robust frameworks around responsible, ethical AI use. Sadly, as has been the case with every transformative technology from mechanization to the automobile and computing, it seems inevitable that some harm will be caused.
But the genie is now very much out of the bottle, meaning it’s down to responsible AI users and advocates to ensure society manages these risks effectively while also allowing its transformative potential to be realized.
Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity.
He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations.
He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.
Bernard’s latest book is ‘Generative AI in Practice’.