Here is a diffusion-based condition-control method that can bring still images to life as realistic, controllable animations. It uses pre-trained encoders to separate motion information (such as expressions and poses) from the person's identity (appearance) in the driving videos. Built on Stable Video Diffusion, it can generate high-quality animations.
This approach can be used to make your images sing, act, and make faces.
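The core idea, a fixed identity embedding from the source image combined with per-frame motion embeddings from a driving video, can be sketched as follows. This is a minimal conceptual illustration: the encoder functions and names here are placeholder stand-ins, not the method's actual components.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Condition:
    identity: List[float]  # appearance embedding from the source image (fixed)
    motion: List[float]    # expression/pose embedding from one driving frame

def encode_identity(image_pixels: List[float]) -> List[float]:
    # Stand-in for a pre-trained appearance encoder.
    return [sum(image_pixels) / len(image_pixels)]

def encode_motion(frame_pixels: List[float]) -> List[float]:
    # Stand-in for a pre-trained motion encoder (expressions, head pose).
    return [max(frame_pixels) - min(frame_pixels)]

def condition_per_frame(source_image: List[float],
                        driving_frames: List[List[float]]) -> List[Condition]:
    # Identity is computed once and held constant; motion varies per frame.
    # The diffusion model would be conditioned on both at each denoising step.
    identity = encode_identity(source_image)
    return [Condition(identity=identity, motion=encode_motion(f))
            for f in driving_frames]

conds = condition_per_frame([0.2, 0.4, 0.6], [[0.1, 0.9], [0.3, 0.3]])
```

Because the identity embedding stays fixed while the motion embedding changes frame by frame, the animation follows the driving video's expressions without drifting away from the source person's appearance.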