ByteDance Releases Lance, a 3B-Parameter Unified Model for Image and Video Generation and Editing

Lance is a new open-source model from ByteDance's Intelligent Creation Lab that combines image understanding, image generation, video understanding, video generation, and editing in one architecture with only 3 billion activated parameters. The goal is to replace the common practice of stitching together task-specific models with a single system that shares representations across modalities.
The unified design has practical implications beyond parameter efficiency. When a model is trained jointly on understanding and generation tasks across both image and video, it can draw on visual comprehension when generating. For example, it can apply its knowledge of what a scene contains when editing only part of that scene. Separate models for each task lack that shared context and often produce edits that are inconsistent with the rest of the frame.
At 3B activated parameters, Lance sits in a range that makes it feasible to run on research hardware or reasonably sized cloud instances, which matters for an open-source release. ByteDance has made both code and weights available, allowing external researchers and developers to fine-tune or build on the model without going through an API.
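As a rough illustration of why that parameter count matters for hardware, the sketch below estimates the memory needed just to hold 3 billion parameters at common numeric precisions. It is back-of-the-envelope arithmetic, not a statement of Lance's actual requirements: the helper name `weight_memory_gb` is hypothetical, the byte sizes are the standard widths of each format, and the estimate ignores activations, intermediate video latents, and any parameters beyond the activated 3B.

```python
# Standard storage width, in bytes, of one parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}

# The activated-parameter figure quoted in the article.
ACTIVATED_PARAMS = 3_000_000_000


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in GB (10^9 bytes) for the given format.

    Hypothetical helper for illustration; covers weights only.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(ACTIVATED_PARAMS, dtype):.1f} GB")
# fp32: 12.0 GB
# bf16: 6.0 GB
# int8: 3.0 GB
```

Under these assumptions, the weights alone would fit in the memory of a single high-end GPU at 16-bit precision, though real usage during video generation would be higher.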
The release arrives as several labs are pursuing similar unified architectures. The value of any single unified model ultimately depends on whether the joint training actually improves task performance rather than just reducing model count, and independent benchmarking of Lance's generation quality relative to specialised models will be the real test. ByteDance has not yet published detailed benchmark comparisons against task-specific alternatives.

