AI Innovations: How DeepSeek Takes on Big Tech


ICYMI: What is DeepSeek?

DeepSeek, a Chinese artificial intelligence (AI) startup founded in July 2023, is backed by High-Flyer, a quantitative hedge fund; both entities were established by Liang Wenfeng. Headquartered in Hangzhou, a major tech hub in China, DeepSeek has quickly gained recognition for its innovative, cost-efficient AI models and open-source approach. Recent model releases have had a significant impact on the AI industry, even triggering stock market turmoil.


Mere replicators or leading contributors to AI research?

Since the US began imposing AI chip export restrictions in 2022, DeepSeek has focused its research on improving architectures and algorithms to reduce the computing power required to train and run models. The company has contributed significantly to AI research and development through its innovative techniques and publications, including:

  • Reinforcement Learning: DeepSeek employs pure reinforcement learning, allowing models to self-improve through trial and error, which has been particularly effective in enhancing reasoning capabilities.
  • Mixture-of-Experts (MoE) Architecture: This approach activates only a fraction of the model's parameters for each task, significantly reducing computational costs and improving efficiency.
  • Multi-Head Latent Attention (MLA): A novel mechanism that enhances the model's ability to process complex data by focusing on multiple aspects of input simultaneously.
  • Distillation Techniques: DeepSeek creates smaller, more efficient models by transferring knowledge from larger models, making advanced AI accessible to a broader audience.
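To make the Mixture-of-Experts idea above concrete, here is a toy sketch (not DeepSeek's actual implementation; all sizes and names are invented for illustration). A small gating network scores every expert for a given token, but only the top-k experts actually run, so compute scales with k rather than with the total number of experts:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(token, experts, gate_w, top_k=2):
    """Route one token through only the top_k highest-scoring experts.

    experts : list of weight matrices (one tiny linear 'expert' each)
    gate_w  : gating matrix that produces one score per expert
    """
    scores = softmax(gate_w @ token)                 # one score per expert
    chosen = np.argsort(scores)[-top_k:]             # indices of the top_k experts
    weights = scores[chosen] / scores[chosen].sum()  # renormalise over the chosen few
    # Only the chosen experts compute; the remaining experts are skipped entirely.
    return sum(w * (experts[i] @ token) for w, i in zip(weights, chosen))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate_w = rng.normal(size=(n_experts, d))
out = moe_forward(rng.normal(size=d), experts, gate_w, top_k=2)
print(out.shape)  # only 2 of the 16 experts were evaluated for this token
```

In a production MoE model the experts are full feed-forward sub-networks and routing happens per token per layer, but the efficiency argument is the same: the parameter count grows with the number of experts while the per-token compute grows only with k.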

What models has DeepSeek published? 

DeepSeek has been publishing Large Language Models (LLMs) since 2023, from DeepSeek Coder, DeepSeek LLM, DeepSeek Math, and the DeepSeek V2 series up to the recent DeepSeek V3 and DeepSeek R1, the main subjects of excitement. Most have been released as open source under an MIT or Apache 2.0 license. DeepSeek models are available via the company's application or API (hosted from China), but more importantly, they can also be deployed on a company's own infrastructure. More platforms able to serve these models or their derivatives are expected to emerge soon.


Technical details of these models can be found easily on the internet, but for now it is important to remember that:

  • DeepSeek V3 is a high-performance chat model that matches the quality of GPT-4o while requiring significantly fewer resources to both train and run.
  • DeepSeek R1 is a reasoning model that, compared to OpenAI's o1, introduces new techniques for training reasoning capabilities without human intervention, at a much lower cost. It has been open-sourced by DeepSeek.

How will DeepSeek impact the AI industry?

The prevailing narrative from leading AI companies like OpenAI and Google has centered on the idea that scaling compute is the key to advancing AI models. This has effectively positioned massive computational resources — requiring significant investment in hardware like NVIDIA processors and substantial electricity consumption — as a prerequisite for anyone looking to train their own cutting-edge models. Meanwhile, DeepSeek has made strides in developing more efficient algorithms, significantly reducing the computational resources needed for AI model training. 

As AI has become an arena of strategic competition between the US and China, we need to remain objective when interpreting some of the news, facts, and motives that are circulating. But there are three things we want to highlight:

  • DeepSeek claims that training the V3 model cost under US$6 million, roughly 20 times less than competing models (US$100 million for GPT-4). Some parties dispute this figure, but whatever the real number is, training and using these models is significantly cheaper. This is not incremental progress but a major leap, cutting costs several-fold at the very least.
  • This is even more true for reasoning models such as OpenAI's o1. OpenAI charges a US$200 monthly fee per user for access and keeps the model closed. The DeepSeek R1 reasoning model, by contrast, has been open-sourced and is available for anyone to reuse and study.
  • DeepSeek R1 is, in essence, a training and reasoning enhancement method that can be applied on top of other LLMs, training them with reinforcement learning and no human intervention. This makes it possible to train a model towards targeted reasoning problems automatically, at acceptable cost and on infrastructure affordable to most firms, achieving results comparable to big, heavy models.
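The last bullet rests on one key trick: the reward can be a rule-based, automatically checkable signal, so no human labeller is needed. The toy sketch below (an illustrative simplification, not DeepSeek's actual R1 training pipeline; the candidate answers and numbers are invented) scores a group of sampled answers against a verifiable check and nudges probability mass toward the above-average ones, in the spirit of group-relative reinforcement learning:

```python
def reward(answer: str, expected: str) -> float:
    """Rule-based reward: automatically checkable, no human labeller needed."""
    return 1.0 if answer.strip() == expected else 0.0

def group_relative_advantages(rewards):
    """Score each sample against its group mean: better-than-average samples
    get a positive advantage, worse-than-average ones a negative advantage."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Hypothetical 'policy': a distribution over candidate answers to "2 + 3 = ?".
# In a real pipeline these candidates would be sampled from the LLM itself.
policy = {"5": 0.25, "6": 0.25, "23": 0.25, "five": 0.25}
samples = ["5", "6", "23", "5"]          # one group of sampled answers
advs = group_relative_advantages([reward(s, "5") for s in samples])

lr = 0.1                                  # nudge mass toward high-reward answers
for s, a in zip(samples, advs):
    policy[s] += lr * a

print(round(policy["5"], 2))  # → 0.35: the verifiably correct answer gained probability
```

Because the check is automatic, this loop can run unattended for as many iterations as the compute budget allows, which is what makes training targeted reasoning affordable.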

Models will become smaller, cheaper, and measurably better. US firms will have to do more than revise their strategies and pricing: since the new know-how is open source, we can expect them to adopt the techniques published by DeepSeek to improve their own models.



What potential benefits can companies get from this?

Cheaper, smaller and better reasoning models mean that companies can:

  • Consider new use-cases as the ROI of AI solutions improves.
  • Explore more advanced scenarios requiring targeted reasoning, especially now that training and fine-tuning their own models has become affordable.
  • Be more open to hosting models on their own infrastructure, as well as adopting AI on edge devices.

This shift towards more efficient AI models opens exciting new possibilities for enterprises looking to harness them. Fortunately, tech companies are already on the move to actively prepare for these advancements by testing, benchmarking, exploring custom model training, and developing thought leadership to further showcase the transformative potential of this technology.
