Episode Notes
DeepSeek has taken the AI world by storm, sparking excitement, skepticism, and heated debates. Is this the next big leap in AI reasoning, or is it just another overhyped model? In this episode, we peel back the layers of DeepSeek-R1 and DeepSeek-V3, diving into the technology behind their Mixture of Experts (MoE), Multi-Head Latent Attention (MLA), Multi-Token Prediction (MTP), and GRPO (Group Relative Policy Optimization) reinforcement learning approaches. We also take a hard look at the training costs: is it really just $5.6M, or is the actual number closer to $80M-$100M?
Join us as we break down:
- DeepSeek’s novel architecture & how it compares to OpenAI’s models
- Why MoE and MLA matter for AI efficiency
- How DeepSeek trained on 2,048 H800 GPUs in record time
- The real cost of training: did DeepSeek underestimate their numbers? (a quick back-of-the-envelope check follows the list)
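
For context on the cost debate, here is a minimal back-of-the-envelope sketch in Python. It assumes the figures cited in DeepSeek's V3 technical report (roughly 2.788M H800 GPU-hours for the final training run) and an assumed $2 per GPU-hour rental rate; neither number comes from these show notes, so treat this as an illustration of where the $5.6M headline figure comes from, not an audited accounting.

```python
# Back-of-the-envelope check on the $5.6M headline training-cost figure.
# Assumption: ~2.788M H800 GPU-hours for the final DeepSeek-V3 training run
# (as cited in the V3 technical report), rented at roughly $2 per GPU-hour.

gpu_hours = 2.788e6          # reported H800 GPU-hours, final training run only
rate_per_gpu_hour = 2.0      # assumed rental price in USD per H800 GPU-hour

headline_cost = gpu_hours * rate_per_gpu_hour
print(f"Headline training cost: ${headline_cost / 1e6:.2f}M")  # -> about $5.58M

# The $80M-$100M estimates discussed in the episode typically add hardware
# purchases, R&D salaries, and failed or exploratory runs on top of this
# rental-style figure, which counts only the final run's GPU time.
```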
Keywords
blog, engineering blog, gpu, deepseek, LLM