DeepSeek developed its AI model at an exceptionally low cost—estimated between $5 and $6 million—by using strategic methods that set it apart from competitors in the AI industry.
Key Factors Behind DeepSeek’s Low Development Costs
- Optimized Hardware Use: Instead of relying on the latest cutting-edge chips, DeepSeek used a mix of lower-end hardware and a stockpile of Nvidia A100 GPUs purchased before export restrictions took effect. This approach provided access to powerful hardware without paying top-tier prices.
- Efficient Training Methods: The company adopted innovative techniques such as the mixture-of-experts (MoE) architecture, which activates only a subset of the model's parameters for each token. This significantly reduces computational demands and expense: DeepSeek's model has 671 billion parameters in total but activates only about 37 billion per token.
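The routing idea behind MoE can be sketched in a few lines. This is a toy illustration, not DeepSeek's actual implementation: the expert count, gating function, and top-k value below are made up for demonstration, and real MoE layers operate on tensors inside a neural network.

```python
# Toy sketch of mixture-of-experts (MoE) top-k routing, illustrating why
# only a fraction of the total parameters is active for any given token.
# All numbers here are illustrative, not DeepSeek's real configuration.

def topk_route(scores, k):
    """Return indices of the k highest-scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def moe_forward(token, experts, gate, k=2):
    """Run a token through only the top-k experts and mix their outputs."""
    scores = gate(token)              # one routing score per expert
    chosen = topk_route(scores, k)    # only k of the experts are activated
    total = sum(scores[i] for i in chosen)
    # Weighted combination of the chosen experts' outputs only; the other
    # experts (and their parameters) are never touched for this token.
    return sum(scores[i] / total * experts[i](token) for i in chosen)

# 8 toy "experts" (here just scalar functions) and a toy gating function.
experts = [lambda x, w=w: w * x for w in range(1, 9)]
gate = lambda x: [float((i * x) % 7) + 1 for i in range(8)]
out = moe_forward(3.0, experts, gate, k=2)  # only 2 of 8 experts run

# DeepSeek's reported ratio: 37B of 671B parameters active per token.
active_fraction = 37 / 671  # ≈ 5.5%
```

With 2 of 8 experts active per token, only a quarter of the expert parameters do work on any one input; DeepSeek's reported 37B-of-671B ratio pushes that fraction down to roughly 5.5%.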
- Lower GPU Hour Costs: DeepSeek spent about 2.79 million GPU hours training its models, roughly $5.58 million at an estimated rate of $2 per GPU hour. That figure is far lower than competitors' spending, which typically involves more advanced hardware and much larger infrastructure.
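The quoted training cost is straightforward arithmetic on the two figures given in the text (2.79 million GPU hours at an estimated $2 per hour):

```python
# Reproduce the training-cost estimate quoted above.
gpu_hours = 2_790_000       # ~2.79 million GPU hours (from the text)
rate_per_hour = 2.00        # estimated $2 per GPU hour (from the text)

training_cost = gpu_hours * rate_per_hour
print(f"${training_cost:,.0f}")  # → $5,580,000
```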
- Open-Source Advantage: By open-sourcing its model, DeepSeek cut development costs while fostering collaboration and innovation, accelerating progress without the hefty expense of proprietary AI development.
- Strategic Team and Resource Allocation: DeepSeek quickly assembled a skilled team, leveraging its ties to High-Flyer, a hedge fund specializing in AI-driven trading strategies. This focus on talent and efficient resource allocation helped the company innovate rapidly while keeping costs low.
By combining these cost-cutting strategies, DeepSeek is challenging industry giants like OpenAI and Google. Its success raises questions about the long-term viability of traditional AI development models, which depend on expensive hardware and massive financial investment.