May 21, 2026 · Video

Google Launches Gemini Omni for Video Generation and Conversational Editing

Google used its I/O 2026 conference to introduce Gemini Omni Flash, a multimodal model aimed squarely at video creation and editing. The model takes any combination of text, images, audio, and existing video clips as input, then generates new video or modifies existing footage through a conversational interface: users describe what they want changed, and the model revises the video in response, iterating without a fresh prompt each time.

The name reflects a broader ambition: Google describes Omni as a framework for generating "anything from any input," with video as the initial focus. The company says the model is more strongly grounded in real-world knowledge (history, physics, cultural context) than prior video models, which it expects to translate into more plausible motion and scene dynamics. Whether that holds up under broader use remains to be seen, but the claim addresses a known weakness in current video generation tools.

On the distribution side, Gemini Omni Flash is launching first to Google AI Ultra subscribers globally, with a consumer rollout expected to follow. All output automatically carries a SynthID watermark, tying it into Google's expanding provenance infrastructure. The model also supports avatar-based video creation, letting users synthesize a digital likeness rather than appearing on camera directly.

The timing appears deliberate: Omni arrives alongside several other I/O announcements, including the Google Pics image editor and the Genie 3 world model expansion, signaling a coordinated push across the generative media stack rather than a single standalone release.

Read at TechCrunch →


