Large Language Models (LLMs) have made significant progress in natural language processing, excelling in tasks like understanding, generation, and reasoning. However, challenges remain. Achieving robust reasoning often requires extensive supervised fine-tuning, which limits scalability and generalization. Furthermore, issues persist around the readability of long reasoning traces and the trade-off between computational cost and reasoning depth, prompting researchers to explore new approaches.
DeepSeek-R1: A New Approach to LLM Reasoning
DeepSeek-AI’s recent work introduces DeepSeek-R1, a model designed to enhance reasoning capabilities through reinforcement learning (RL). This effort resulted in two models:
- DeepSeek-R1-Zero, which is trained solely with RL and demonstrates emergent reasoning behaviors such as long Chain-of-Thought (CoT) reasoning.
- DeepSeek-R1, which builds on its predecessor by incorporating a multi-stage training pipeline, addressing challenges like readability and language mixing while maintaining high reasoning performance.
These models aim to overcome existing limitations, combining innovative RL techniques with structured training processes to achieve scalability and usability.
Technical Innovations and Benefits
1. Reinforcement Learning on Reasoning Tasks: DeepSeek-R1-Zero employs RL without relying on supervised data. Using Group Relative Policy Optimization (GRPO), it optimizes reasoning by evaluating multiple outputs, significantly improving benchmark performance. For example, its AIME 2024 pass@1 score rose from 15.6% to 71.0% during training.
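GRPO dispenses with a separate value (critic) model: for each prompt it samples a group of outputs and normalizes each output's reward against the group's own statistics. A minimal sketch of that advantage computation, assuming the commonly described mean/standard-deviation normalization (the use of the population standard deviation here is a simplification, not the authors' exact implementation):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled output's
    reward against the mean and std of its own group, so no learned
    value model is needed to estimate a baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Example: two correct (reward 1.0) and two incorrect (reward 0.0) samples.
advantages = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct samples receive positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward outputs that beat their own group's average.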
2. Multi-Stage Training in DeepSeek-R1: DeepSeek-R1 fine-tunes its base model on cold-start data—thousands of curated CoT examples—before undergoing reasoning-focused RL. During RL, a language consistency reward helps keep outputs coherent and user-friendly, discouraging the language mixing seen in DeepSeek-R1-Zero.
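One way to picture the language consistency reward is as the fraction of chain-of-thought tokens written in the target language. This is a simplified sketch of that idea; the tokenization and the language predicate are assumptions, not the paper's exact formulation:

```python
def language_consistency_reward(cot_tokens, is_target_language):
    """Toy language-consistency reward: the proportion of CoT tokens
    that a caller-supplied predicate classifies as the target language.
    A mixed-language chain of thought scores lower than a pure one."""
    if not cot_tokens:
        return 0.0
    return sum(1 for tok in cot_tokens if is_target_language(tok)) / len(cot_tokens)

# Example with a hypothetical predicate: ASCII-only tokens count as "English".
reward = language_consistency_reward(
    ["let", "x", "=", "2"], lambda tok: tok.isascii()
)
```

A reward like this can be blended with the accuracy reward during RL so the policy is nudged toward readable, single-language reasoning without sacrificing correctness.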
3. Distillation for Smaller Models: To address computational constraints, DeepSeek-AI distilled six smaller models (1.5B to 70B parameters) from DeepSeek-R1 using Qwen and Llama architectures. These models retain strong reasoning capabilities, with the 14B distilled model achieving a pass@1 score of 69.7% on AIME 2024, outperforming some larger models.
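Distillation here amounts to supervised fine-tuning of a smaller student on reasoning traces generated by DeepSeek-R1. A minimal sketch of how one such training pair might be formatted (the `<think>` tags, prompt template, and field names are illustrative assumptions, not the released training format):

```python
def build_sft_example(question, teacher_cot, teacher_answer):
    """Format a teacher-generated reasoning trace as a prompt/completion
    pair for supervised fine-tuning of a student model. The wrapping of
    the chain of thought in <think> tags is a hypothetical convention."""
    prompt = f"User: {question}\nAssistant:"
    completion = f" <think>{teacher_cot}</think> {teacher_answer}"
    return {"prompt": prompt, "completion": completion}

example = build_sft_example(
    question="What is 12 * 9?",
    teacher_cot="12 * 9 = 12 * 10 - 12 = 120 - 12 = 108.",
    teacher_answer="108",
)
```

Training the student with ordinary next-token cross-entropy on such pairs transfers the teacher's reasoning patterns without running RL on the smaller model at all.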
Results: Performance Insights
DeepSeek-R1’s performance is supported by benchmark results:
- Reasoning Benchmarks:
  - AIME 2024: 79.8% pass@1, surpassing OpenAI’s o1-mini.
  - MATH-500: 97.3% pass@1, comparable to OpenAI-o1-1217.
  - GPQA Diamond: 71.5% pass@1, excelling in fact-based reasoning.
- Coding and STEM Tasks:
  - Codeforces Elo rating: 2029, outperforming 96.3% of human participants.
  - SWE-Bench Verified: 49.2% resolution rate, competitive with other leading models.
- General Capabilities:
  - Strong generalization was demonstrated on ArenaHard and AlpacaEval 2.0 benchmarks, achieving 92.3% and 87.6% win rates, respectively.
Distilled Model Highlights: Smaller models like DeepSeek-R1-Distill-Qwen-32B show strong performance, with a pass@1 score of 72.6% on AIME 2024, demonstrating effective scalability and practicality.
Conclusion: Refining Reasoning in AI
DeepSeek-AI’s DeepSeek-R1 and DeepSeek-R1-Zero represent meaningful advancements in reasoning capabilities for LLMs. By leveraging RL, cold-start data, and distillation techniques, these models address critical limitations while promoting accessibility through open-source availability under the MIT License. The API (accessed with `model=deepseek-reasoner`) further enhances usability for developers and researchers.
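The DeepSeek API follows the familiar OpenAI-style chat-completions format, so a request targeting the reasoning model can be assembled as below. This sketch only builds the payload rather than sending it; any fields beyond `model` and `messages` are omitted as assumptions about the caller's setup:

```python
def build_reasoner_request(user_prompt):
    """Assemble a chat-completions request body for the reasoning model.
    Sending it (endpoint, auth headers) is left to the caller's client."""
    return {
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": user_prompt}],
    }

payload = build_reasoner_request("Prove that the sum of two even numbers is even.")
```

The same payload shape works with any OpenAI-compatible client library pointed at the DeepSeek endpoint.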
Looking ahead, DeepSeek-AI plans to refine multilingual support, enhance software engineering capabilities, and improve prompt sensitivity. These efforts aim to further establish DeepSeek-R1 as a robust solution for reasoning-focused AI applications. By integrating thoughtful training paradigms, DeepSeek-R1 illustrates how AI can advance toward addressing increasingly complex challenges.
Check out the Paper, DeepSeek R1 and DeepSeek R1 Zero. All credit for this research goes to the researchers of this project.
The post DeepSeek-AI Releases DeepSeek-R1-Zero and DeepSeek-R1: First-Generation Reasoning Models that Incentivize Reasoning Capability in LLMs via Reinforcement Learning appeared first on MarkTechPost.