This is the SkyReels V1 model: an open-source, human-centric AI video model that produces videos comparable to those of Kling and Hailuo. It is built by fine-tuning HunyuanVideo on O(10M) high-quality film and television clips, and it offers advanced facial animation with 33 distinct facial expressions and over 400 natural movement combinations.
As explained on the project’s website, the model was trained with a multi-stage image-to-video pretraining pipeline inspired by the HunyuanVideo design:
- Stage 1: Model Domain Transfer Pretraining: We use a large dataset (O(10M) of film and television content) to adapt the text-to-video model to the human-centric video domain.
- Stage 2: Image-to-Video Model Pretraining: We convert the text-to-video model from Stage 1 into an image-to-video model by adjusting the conv-in parameters. This new model is then pretrained on the same dataset used in Stage 1.
- Stage 3: High-Quality Fine-Tuning: We fine-tune the image-to-video model on a high-quality subset of the original dataset, ensuring superior performance and quality.
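Stage 2 is the least self-explanatory step. The project only says that the conv-in parameters are adjusted. One common way to do this is to widen the input convolution so it also accepts image-conditioning latents concatenated along the channel axis, and to zero-initialize the new weights so the converted model starts out reproducing the text-to-video model's output. This is an assumption, not confirmed as SkyReels' exact method. The NumPy sketch uses a 1×1×1 kernel and made-up channel counts for brevity. The helpers `expand_conv_in` and `pointwise_conv`, and all sizes, are hypothetical.

```python
import numpy as np


def expand_conv_in(weight: np.ndarray, extra_in_channels: int) -> np.ndarray:
    """Widen a conv-in kernel of shape (out_ch, in_ch, *kernel) along in_ch.

    Assumption: the extra input channels carry image-conditioning latents.
    Their weights are zero, so at initialization the layer ignores them and
    behaves exactly like the pretrained text-to-video layer.
    """
    out_ch, _, *kernel = weight.shape
    zeros = np.zeros((out_ch, extra_in_channels, *kernel), dtype=weight.dtype)
    return np.concatenate([weight, zeros], axis=1)


def pointwise_conv(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1x1x1 convolution. x: (in_ch, T, H, W), weight: (out_ch, in_ch, 1, 1, 1)."""
    w = weight[:, :, 0, 0, 0]
    return np.einsum("oi,ithw->othw", w, x) + bias[:, None, None, None]


# Demo with hypothetical sizes (not SkyReels' real channel counts).
rng = np.random.default_rng(0)
latent_ch, out_ch, frames = 16, 8, 4
w_t2v = rng.standard_normal((out_ch, latent_ch, 1, 1, 1))  # "pretrained" T2V conv-in
b = rng.standard_normal(out_ch)

noisy_video = rng.standard_normal((latent_ch, frames, 6, 6))  # (C, T, H, W)
# Stand-in for the conditioning image latent, repeated over all T frames.
image_latent = np.repeat(rng.standard_normal((latent_ch, 1, 6, 6)), frames, axis=1)

w_i2v = expand_conv_in(w_t2v, extra_in_channels=latent_ch)
x_i2v = np.concatenate([noisy_video, image_latent], axis=0)  # channel-wise concat

y_t2v = pointwise_conv(noisy_video, w_t2v, b)
y_i2v = pointwise_conv(x_i2v, w_i2v, b)
print(w_i2v.shape)                 # (8, 32, 1, 1, 1)
print(np.allclose(y_t2v, y_i2v))   # True
```

Zero initialization is the design choice that matters here. Because the widened model's output is unchanged at the start of Stage 2, pretraining on the Stage 1 dataset can gradually learn to use the image channels without first destroying what the text-to-video model already knows.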
In the project's VBench comparison of open-source video models, SkyReels V1 scored 82.43, higher than VideoCrafter 2.0 VEnhancer and CogVideoX1.5-5B.