Over the past couple of days there has been a lot of talk about DeepSeek, but plenty of others have been releasing exciting AI models too. Qwen2.5-VL, a recently announced vision-language model, offers advanced visual understanding. It is also agentic, so it can interact with computers and phones.
恭喜发财 [Wishing you prosperity] 🧧 As we welcome the Chinese New Year, we’re thrilled to announce the launch of Qwen2.5-VL, our latest flagship vision-language model!
Qwen Chat: https://t.co/T0nMBnRVBB
Blog: https://t.co/FU7qEgE46j
Hugging Face: https://t.co/N9XSslZX8d
ModelScope:… pic.twitter.com/KgjC2lHcvR
Qwen (@Alibaba_Qwen) January 27, 2025
The model can handle videos up to 1 hour long. For object detection, it can generate bounding boxes and return structured output such as JSON. The video above gives just a taste of what the model can do. The models are now available on Hugging Face.
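To make the structured output concrete, here is a minimal sketch of how a JSON detection reply might be consumed downstream. The reply text, the field names `bbox_2d` and `label`, and the `[x1, y1, x2, y2]` pixel layout are assumptions for illustration, not a schema confirmed by this post. The helper `parse_detections` is hypothetical too. Check the model's own documentation for the real format.

```python
import json

# Hypothetical model reply. The field names and the [x1, y1, x2, y2]
# pixel-coordinate layout are assumptions, not a documented schema.
SAMPLE_REPLY = """
[
  {"bbox_2d": [10, 20, 110, 220], "label": "dog"},
  {"bbox_2d": [150, 40, 300, 200], "label": "bicycle"}
]
"""


def parse_detections(reply: str) -> list[dict]:
    """Parse a JSON list of detections and add width, height, and area per box."""
    detections = []
    for item in json.loads(reply):
        x1, y1, x2, y2 = item["bbox_2d"]
        width, height = x2 - x1, y2 - y1
        detections.append(
            {
                "label": item["label"],
                "box": (x1, y1, x2, y2),
                "width": width,
                "height": height,
                "area": width * height,
            }
        )
    return detections


if __name__ == "__main__":
    for det in parse_detections(SAMPLE_REPLY):
        print(f'{det["label"]}: box={det["box"]} area={det["area"]}')
```

Because the reply is plain JSON, nothing beyond the standard library is needed to turn it into boxes that a drawing or cropping step could use.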