Sora

OpenAI introduces Sora, its new text-to-video AI model. Sora can create videos of up to a minute of realistic and imaginative scenes given text instructions.

OpenAI reports that its vision is to build AI systems that understand and simulate the physical world in motion and train models to solve problems requiring real-world interaction.

Capabilities

Sora can generate videos that maintain high visual quality and adherence to a user's prompt. Sora also has the ability to generate complex scenes with multiple characters, different motion types, and backgrounds, and understand how they relate to each other. Other capabilities include creating multiple shots within a single video with persistence across characters and visual style. Below are a few examples of videos generated by Sora.

Prompt:

A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

Prompt:

A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.

Video source: https://openai.com/sora (opens in a new tab)

Methods

Sora is reported to be a diffusion model that can generate entire videos or extend generated videos. It also uses a Transformer architecture leading to scaling performance. Videos and images are represented as patches, similar to tokens in GPT, leading to a unified video generation system that enables higher durations, resolution, and aspect ratios. They use the recaptioning technique used in DALL·E 3 to enable Sora to follow the text instructions more closely. Sora is also able to generate videos from a given image which enables the system to accurately animate the image.

Limitations and Safety

The reported limitations of Sora include simulating physics and lack of cause and effect. Spatial details and events described (e.g., camera trajectory) in the prompts are also sometimes misunderstood by Sora. OpenAI reports that they are making Sora available to red teamers and creators to assess harms and capabilities.

Prompt:

Prompt: Step-printing scene of a person running, cinematic film shot in 35mm.

Video source: https://openai.com/sora (opens in a new tab)

Find more examples of videos generated by the Sora model here: https://openai.com/sora (opens in a new tab)

Phi-2 LLM Collection