OpenAI’s ‘Sora’ Can Create a Mind-Blowing Movie Trailer with Just Words

By Shubhendu Vatsa - News

Published: 20 Feb 2024, Last Updated: 29 May 2024

Imagine creating a movie trailer with nothing but words. That’s what Sora, OpenAI’s new text-to-video model, can do. Named after the Japanese word for “sky,” Sora can generate videos up to one minute long featuring “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background.”

While OpenAI’s text-to-video model isn’t the first publicly available model of its kind, it is currently the only one capable of generating videos up to a minute long. The company claims that the model does this “while maintaining visual quality and adherence to the user’s prompt.” Additionally, OpenAI notes that Sora can understand how objects “exist in the physical world, accurately interpret props and generate compelling characters that express vibrant emotions.”

Beyond creating videos from text prompts, Sora can generate videos from still images and, more impressively, fill in missing frames or extend existing videos. While the model hasn’t been publicly released yet, the company has shared several Sora-generated clips, including a woman strolling down a Tokyo street illuminated by glowing neon and animated signage, and an aerial view of California during the gold rush. CEO Sam Altman also teased the model’s capabilities by asking users on X for prompt ideas, then sharing a series of videos generated from their suggestions.

While Sora impresses with its ability to conjure videos from thin air, it’s not quite there yet. Glitches such as a woman’s legs switching sides in one video, or the floor moving in a questionable way in another, hint at the model’s limitations. OpenAI itself admits Sora is a work in progress, acknowledging it “may struggle with accurately simulating the physics of a complex scene.”

As the official blog post reads, “We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction.” The company hasn’t revealed how much video footage was used to train Sora or where the data came from, but it told The New York Times that the videos used were both publicly available and licensed from copyright owners.

As of writing, OpenAI’s Sora is not yet publicly available. The company has granted access to a select group of security researchers, or “red teamers,” to assess potential risks and harms. Dozens of visual artists, designers, and filmmakers have also been granted access “to gain feedback on how to advance the model to be most helpful for creative professionals.”

While it’s still early days for Sora and similar text-to-video generators, the model offers a fascinating glimpse into the future of AI-powered storytelling and its potential impact on the creative industries.