gen‑ai.news
← Back
Video

Google's Genie 3 World Model Connects to Street View for Location-Grounded AI Environments

Google DeepMind announced at I/O 2026 that Project Genie, its interactive world model, now connects to Google Street View data. The integration lets users select a real-world location on a map and receive an AI-generated, navigable environment derived from that place's visual appearance. The feature is rolling out to AI Ultra subscribers globally.

Genie 3 generates interactive environments rather than passive video - the output is something closer to a simplified game world that can be explored in real time. Connecting it to Street View provides a geographically grounded starting point, which serves both creative and technical purposes. For creative uses, the result is an explorable space that retains the architectural character and visual texture of a real location. For research purposes, it gives AI agents and robotics systems a source of diverse, real-world-anchored environments for training and testing.

Street View's value as a training and grounding resource has been building since Google began collecting it in 2007. The dataset spans a wide range of urban and rural environments across many countries, with varying lighting, weather, and seasonal conditions - the kind of diversity that is difficult to replicate synthetically and that world models benefit from when learning spatial relationships and physical layout.

The project sits at an interesting intersection between generative video, interactive simulation, and robotics research. Google DeepMind has framed it as having applications beyond creative tools, suggesting the Street View-grounded world model could become infrastructure for agent training at scale.

Enjoy this story? Get the next one in your inbox.

Twice a week: the most important stories in generative image and video AI, distilled into a 2-minute read.

Free. Unsubscribe any time. No spam, ever.

Your next read

Video

NVIDIA Releases Cosmos 3: A Two-Tower Mixture-of-Transformers Foundation Model Unifying Physical Reasoning, World Generation, and Action Generation

NVIDIA has released Cosmos 3, an open omnimodal foundation model that combines a vision-language reasoning component with a diffusion-based video generator in a two-tower architecture. The system is designed to support physical AI applications by linking language-grounded reasoning with the generation of plausible world states and robot actions.

Video

Nvidia bets big on physical AI at GTC Taipei with a new world model, driving brain, and open humanoid robot

Nvidia used GTC Taipei to unveil several new tools aimed at physical AI applications, including a new world model, a larger autonomous driving model, and an open reference platform for humanoid robots. The announcements signal a continued push to make simulation and synthetic data central to how robots and vehicles are trained. Here is a closer look at what was shown and why it matters.