Plenty of us are excited about new AI models, but many of them are not open source. DeepSeek V3 is a new open-source model with 671B parameters, trained on 14.8T high-quality tokens. It uses Multi-head Latent Attention (MLA) for efficient inference and the DualPipe algorithm for efficient pipeline parallelism.
LiveBench reported by r/LocalLlama – DeepSeek v3 is the BEST open weight LLM AND SECOND BEST non-reasoning LLM after `gemini-exp-1206` https://t.co/OQMeuIxlBm pic.twitter.com/rbWu9jDB25
Vaibhav (VB) Srivastav (@reach_vb), December 25, 2024
According to the company, the model is comparable to GPT-4o on educational and factuality benchmarks. It was trained on a cluster of 2048 NVIDIA H800 GPUs. As for API pricing, you pay $0.27 per million tokens.
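To get a feel for what that price means in practice, here is a minimal sketch that estimates the bill for a given token count. It assumes a single flat rate of $0.27 per million tokens, as quoted above; actual billing may distinguish between input and output tokens, so treat the result as a rough guide. The function name `estimate_cost_usd` is hypothetical and not part of any DeepSeek SDK.

```python
# Rough cost estimate, assuming the flat rate quoted in the article.
PRICE_PER_MILLION_TOKENS_USD = 0.27


def estimate_cost_usd(num_tokens: int,
                      price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Return the approximate cost in USD for processing `num_tokens` tokens."""
    return num_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    for tokens in (100_000, 1_000_000, 50_000_000):
        print(f"{tokens:>12,} tokens: ${estimate_cost_usd(tokens):.2f}")
```

Passing a different `price_per_million` lets you compare against other providers' rates.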
What's new in V3?
671B MoE parameters
37B activated parameters
Trained on 14.8T high-quality tokens
Dive deeper here:
Model: https://t.co/9iwEF6aLuk
Paper: https://t.co/ruzwMFYAAH
2/n
DeepSeek (@deepseek_ai), December 26, 2024
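The gap between the 671B total and the 37B activated parameters is what makes a mixture-of-experts (MoE) model cheaper to run than its size suggests: only a fraction of the weights take part in processing each token. As a quick illustration using only the two figures from the tweet (the function name `activated_fraction` is made up for this sketch):

```python
# Parameter counts in billions, taken from the DeepSeek announcement.
TOTAL_PARAMS_B = 671
ACTIVATED_PARAMS_B = 37


def activated_fraction(activated_b: float, total_b: float) -> float:
    """Return the share of parameters that are active for each token."""
    return activated_b / total_b


if __name__ == "__main__":
    share = activated_fraction(ACTIVATED_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Activated per token: {share:.1%} of all parameters")
```

So roughly one in eighteen parameters is used for any given token, while all 671B still need to be stored.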
For now, you can try DeepSeek V3 right away here. In the chat interface you can also turn on DeepThink and web search.
[HT]