In the past few years, we have covered plenty of impressive text-to-video models. Goku is a flow-based generative AI approach that its team reports delivers strong benchmark results. It can be used for text-to-video, image-to-video, and text-to-image generation. Here are its reported scores:
- 0.76 on GenEval (text-to-image generation)
- 83.65 on DPG-Bench (text-to-image generation)
- 84.85 on VBench (text-to-video generation)
Some of the sample videos shared by the team look quite realistic.
You can find out more in the paper.
[paper]