OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data
Summary
OmniDirector is a unified framework for camera motion cloning in video generation. It encodes camera parameters visually as grid motion videos, which lets it support diverse trajectories in multi-shot scenes. Because it is trained on a large dataset of camera grid-video pairs, it does not need cross-paired data (pairs of different videos that share the same camera motion). Multimodal diffusion transformers integrate characters, actions, and cameras, providing director-level control. A hierarchical prompt expansion agent harmonizes the different control signals to enhance the camera motion and visual content descriptions. Extensive experiments show better performance and controllability than existing methods.
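The summary does not say how a grid motion video is rendered. As a rough illustration only, the sketch below assumes the grid is a regular lattice of 3D points on a ground plane, projected into every frame by a pinhole camera whose per-frame world-to-camera pose (R, t) comes from the trajectory to be cloned. The helper names (`make_ground_grid`, `yaw_matrix`, `render_grid_video`), the intrinsics, and the pan-plus-truck trajectory are all hypothetical and not taken from OmniDirector.

```python
import numpy as np

# Assumption: the "grid" is a lattice of 3D points on a ground plane. OmniDirector's
# actual grid design, rendering style, and resolution are not given in the summary.

def make_ground_grid(half_width=5.0, near=2.0, far=12.0, step=0.5, plane_y=1.0):
    """Regular lattice of 3D points on the plane y = plane_y (OpenCV convention, +y points down)."""
    xs = np.arange(-half_width, half_width + 1e-9, step)
    zs = np.arange(near, far + 1e-9, step)
    gx, gz = np.meshgrid(xs, zs)
    gy = np.full_like(gx, plane_y)
    return np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)  # shape (N, 3)

def yaw_matrix(theta):
    """Rotation by theta radians about the y axis (used here as a camera pan)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def render_grid_video(points, rotations, translations, fx, fy, width, height):
    """Project world points with per-frame world-to-camera poses and draw them as white pixels."""
    cx, cy = width / 2.0, height / 2.0
    frames = np.zeros((len(rotations), height, width), dtype=np.uint8)
    for i, (R, t) in enumerate(zip(rotations, translations)):
        cam = points @ R.T + t                # X_cam = R @ X_world + t, for all points at once
        cam = cam[cam[:, 2] > 1e-6]           # discard points behind the camera
        u = np.round(fx * cam[:, 0] / cam[:, 2] + cx).astype(int)
        v = np.round(fy * cam[:, 1] / cam[:, 2] + cy).astype(int)
        inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
        frames[i, v[inside], u[inside]] = 255
    return frames

# Hypothetical 16-frame trajectory: a pan combined with a sideways (truck) move.
T = 16
rotations = [yaw_matrix(theta) for theta in np.linspace(-0.3, 0.3, T)]
translations = [np.array([-0.1 * i, 0.0, 0.0]) for i in range(T)]
video = render_grid_video(make_ground_grid(), rotations, translations,
                          fx=200.0, fy=200.0, width=256, height=256)
print(video.shape, video.dtype)  # (16, 256, 256) uint8
```

Because the lattice is fixed in the world, how it shifts, shrinks, and rotates from frame to frame depends only on the camera pose, which is presumably what lets a grid video stand in for explicit camera parameters regardless of scene content.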
Key points
Uses grid motion videos to visually encode camera parameters, enabling arbitrary trajectories without cross-paired data.
Trains on a large-scale camera grid-video dataset, avoiding the need for traditional cross-paired video supervision.
Uses multimodal diffusion transformers to jointly control characters, actions, and cameras at the director level.
Employs a hierarchical prompt expansion agent to harmonize the different control signals, improving camera motion and visual content descriptions.
Shows better controllability and performance than existing methods in extensive video generation experiments.
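The summary names multimodal diffusion transformers as the component that combines character, action, and camera conditions but gives no architectural details. The sketch below shows MM-DiT-style joint attention, the mechanism introduced with Stable Diffusion 3: each modality keeps its own projection weights, and all tokens attend over one concatenated sequence. Treating the encoded grid motion video as a separate token stream is an assumption, as are all names, shapes, and the random weights. Real MM-DiT blocks also use multiple heads, normalization, timestep modulation, and MLPs, all omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class ModalityProjection:
    """Hypothetical per-modality weights: each stream (text, video, camera) gets its own Q/K/V/output matrices."""
    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(dim)
        self.wq, self.wk, self.wv, self.wo = (rng.normal(0.0, scale, (dim, dim)) for _ in range(4))

def joint_attention(streams, projections):
    """Single-head joint attention over the concatenation of all modality token streams."""
    q = np.concatenate([x @ p.wq for x, p in zip(streams, projections)], axis=0)
    k = np.concatenate([x @ p.wk for x, p in zip(streams, projections)], axis=0)
    v = np.concatenate([x @ p.wv for x, p in zip(streams, projections)], axis=0)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (total, total): every token sees every modality
    mixed = attn @ v
    splits = np.cumsum([x.shape[0] for x in streams])[:-1]   # boundaries between modalities
    # Split back per modality and apply each modality's own output projection.
    return [chunk @ p.wo for chunk, p in zip(np.split(mixed, splits, axis=0), projections)]

rng = np.random.default_rng(0)
dim = 64
text_tokens = rng.normal(size=(12, dim))     # hypothetical prompt embeddings
video_tokens = rng.normal(size=(128, dim))   # hypothetical noisy video latent patches
camera_tokens = rng.normal(size=(128, dim))  # hypothetical patches of an encoded grid motion video
projs = [ModalityProjection(dim, rng) for _ in range(3)]
text_out, video_out, camera_out = joint_attention([text_tokens, video_tokens, camera_tokens], projs)
print(text_out.shape, video_out.shape, camera_out.shape)  # (12, 64) (128, 64) (128, 64)
```

Because attention mixes all streams, changing the camera tokens changes the video-token outputs, while the separate per-modality weights let each stream keep its own feature space.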