Those of you who have used AI video tools in the past know that getting two subjects to interact naturally is not always easy. DreamRelation aims to change that: it learns a specific interaction from an example video and transfers it to new subjects. For example, you can have the characters in your video hug or punch each other, mimicking the motion in a reference clip. Here is how it works:
The method decomposes relational video customization into two concurrent processes. (1) In Relational Decoupling Learning, the Relation LoRAs in a relation LoRA triplet capture relational information, while the Subject LoRAs focus on subject appearances. This decoupling is guided by a hybrid mask training strategy based on the subjects' corresponding masks. (2) In Relational Dynamics Enhancement, the proposed space-time relational contrastive loss pulls relational dynamics features (anchor and positive features), computed from pairwise frame differences, closer together, while pushing them away from the appearance features (negative features) of single-frame outputs. During inference, the subject LoRAs are excluded to prevent introducing undesired appearances and to improve generalization.
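To make the decoupling idea concrete, here is a minimal PyTorch sketch of what a hybrid mask training loss could look like. Everything here is an assumption for illustration: the function and tensor names are hypothetical and the paper's actual masking scheme may differ. The point is only that the relation branch is supervised on the whole frame while each subject branch is supervised inside its own mask.

```python
import torch
import torch.nn.functional as F

def hybrid_mask_loss(noise, pred_relation, pred_subjects, subject_masks):
    """Hypothetical mask-guided denoising loss (names are illustrative).

    noise, pred_*: (B, C, T, H, W) diffusion targets/predictions.
    subject_masks: list of (B, 1, T, H, W) binary masks, one per subject.
    """
    # Relation LoRAs see the full frame, so they can model the interaction.
    loss = F.mse_loss(pred_relation, noise)
    # Each Subject LoRA is supervised only inside that subject's mask,
    # steering it toward appearance rather than motion.
    for pred, mask in zip(pred_subjects, subject_masks):
        masked_sq = mask * (pred - noise) ** 2          # zero outside the mask
        loss = loss + masked_sq.sum() / (mask.sum() * noise.size(1)).clamp(min=1)
    return loss
```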
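And a hedged sketch of the space-time relational contrastive idea, written here as a simple triplet-margin loss: the anchor and positive are pairwise frame differences (which carry the dynamics), and the negative is a single frame's feature (which mostly carries appearance). The feature shape, frame indices, and margin are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def relational_contrastive_loss(feats, margin=0.5):
    """feats: (B, T, D) per-frame features from the video model, T >= 4 (illustrative)."""
    # Differences between two distinct frame pairs act as dynamics features.
    anchor   = F.normalize(feats[:, 1] - feats[:, 0], dim=-1)
    positive = F.normalize(feats[:, 3] - feats[:, 2], dim=-1)
    # A single frame's feature acts as the appearance (negative) feature.
    negative = F.normalize(feats[:, 0], dim=-1)
    pos_sim = (anchor * positive).sum(-1)   # pull dynamics features together
    neg_sim = (anchor * negative).sum(-1)   # push them away from appearance
    return F.relu(neg_sim - pos_sim + margin).mean()
```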