Many of us use text-to-video models to generate videos for YouTube and other social media channels. Like other AI tools, these models can be exploited. BadVideo explores a stealthy backdoor attack that is effective against text-to-video models. The approach relies on two strategies:
(1) Spatio-Temporal Composition, which combines different spatiotemporal features to encode malicious information; (2) Dynamic Element Transformation, which introduces transformations in redundant elements over time to convey malicious information.
With this approach, it is possible to evade content moderation systems that analyze spatial information within individual frames. The GIF above shows how foul language can be displayed within an AI-generated video.
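To see why a frame-by-frame check can miss content that is spread over time, here is a toy sketch in Python. It is not the BadVideo method: the glyph, the round-robin split, and the `frame_is_flagged` "moderator" are all hypothetical stand-ins chosen for illustration. The idea is only that each frame carries a fragment of a pattern, so no single frame looks suspicious, while the frames taken together reveal the whole thing.

```python
# Toy illustration only. This is NOT the BadVideo attack, and the
# "moderator" below is a hypothetical per-frame check, not a real system.

GLYPH = [
    "X...X",
    ".X.X.",
    "..X..",
    ".X.X.",
    "X...X",
]

def glyph_pixels(glyph):
    """Return the set of (row, col) positions that are switched on."""
    return {
        (r, c)
        for r, row in enumerate(glyph)
        for c, ch in enumerate(row)
        if ch == "X"
    }

def split_across_frames(pixels, num_frames):
    """Distribute pixels round-robin so each frame holds only a fragment."""
    frames = [set() for _ in range(num_frames)]
    for i, p in enumerate(sorted(pixels)):
        frames[i % num_frames].add(p)
    return frames

def frame_is_flagged(frame, target, threshold=0.8):
    """Hypothetical spatial check: flag a frame that shows most of the target."""
    return len(frame & target) / len(target) >= threshold

def temporal_composite(frames):
    """Union of all frames, a stand-in for what a viewer perceives over time."""
    return set().union(*frames)

target = glyph_pixels(GLYPH)
frames = split_across_frames(target, num_frames=3)

per_frame_flags = [frame_is_flagged(f, target) for f in frames]
composite_flag = frame_is_flagged(temporal_composite(frames), target)

print(per_frame_flags)  # [False, False, False]
print(composite_flag)   # True
```

Each of the three frames contains only 3 of the glyph's 9 pixels, so the per-frame check passes all of them, yet the composite over time matches the full pattern.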
[HT]