Yesterday, we saw the beginning of a new era of AI. OpenAI announced Sora, its new text-to-video AI model that turns simple prompts into strikingly realistic video.
According to OpenAI, Sora is built on a model with a deep understanding of language, which allows it to create moving images that adhere to the physics of reality. “The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world,” OpenAI wrote in a blog post.
Given a prompt that outlines the characters, location, emotion, and even filming style you’re looking for, the model can generate a video that’s up to a minute long and contains multiple characters and shots.
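To make the shape of such a prompt concrete, here is a minimal sketch of how those elements might be bundled together. Note that OpenAI has not released a public Sora API, so the generation call at the end is purely hypothetical and left as a comment; only the prompt structure mirrors what OpenAI describes.

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    """Bundles the elements OpenAI says Sora responds to."""
    characters: str
    location: str
    emotion: str
    filming_style: str

    def render(self) -> str:
        # Flatten the fields into a single natural-language prompt.
        return (
            f"{self.characters} in {self.location}, "
            f"conveying {self.emotion}, shot as {self.filming_style}."
        )


prompt = VideoPrompt(
    characters="a stylish woman walking among pedestrians",
    location="a neon-lit city street slick with rain",
    emotion="confident, unhurried calm",
    filming_style="a handheld tracking shot with shallow depth of field",
)

# Hypothetical call: Sora has no public API at the time of writing.
# video = sora.generate_video(prompt.render(), max_duration_seconds=60)
print(prompt.render())
```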
[Image: OpenAI]
“Sora is capable of generating complex scenes with multiple characters, specific types of movement, and precise details of the subject and background,” says OpenAI.
The company is not exaggerating.
OpenAI claims that, unlike previous commercial generative-AI engines, which simply imitate patterns, Sora understands reality. An AI model that can deduce how “things exist in the real world” marks a striking and monumental moment for AI generation. It’s what enables Sora to produce near-perfect videos. But it’s also bringing us one step closer to the end of reality itself: an era of post-truth in which absolutely nothing we see on our phones and computers will be believable.
Putting Sora to the test
Yesterday, OpenAI CEO Sam Altman had reason to be ecstatic. He boasted on X about Sora’s extraordinary abilities and invited people to suggest prompts for his new favorite AI beast, publishing the results a little later. Like these two golden retrievers podcasting on a mountaintop:
https://t.co/uCuhUPv51N pic.twitter.com/nej4TIwgaP — Sam Altman (@sama) February 15, 2024
It’s impressive. Even though the resolution feels low, the image is stable and realistic enough. The curated examples on OpenAI’s page (shown in the compilation video below this paragraph) are far more impressive. The definition is simply awesome, surpassing anything produced by the previous king and queen of the generative-AI video world, Runway and Stability.
Take the example of a woman walking through a city at the 7:19 mark in OpenAI’s video. Her demeanor, the sunglasses, the people in the background, the neon signs, the water reflections . . . your brain buys it completely.
It’s only when you pay close attention, or when you see the non-curated stuff that Sora doesn’t quite get right, as in the video below, that you realize we are not quite there yet. Sora’s seams are still visible.
even the sora mistakes are mesmerizing pic.twitter.com/OvPSbaa0L9 — Charlie Holtz (@charliebholtz) February 15, 2024
Sora brings us closer to breaking free of the current generative-AI aesthetic that is already so tired, but it doesn’t quite get us near the Goldilocks zone in which there is no AI aesthetic at all. While it is not a perfect generative model, it is undeniably a great leap. What feels clear to me is that we are about to take the last step into the abyss, where visual reality is a blurry concept, at best. Sora has brought us to this jump point. In a few months, you can expect other synthetic-reality engines to up the ante until one outputs images and video totally indistinguishable from the “real” reality that we can see with our own eyes. It’s inevitable.