
Has the “curse” of the big model been broken by DeepSeek?

By | Dao Zong You Li

In the new year, the protagonist of the global technology circle has been DeepSeek. Since its release, DeepSeek has triggered a chain reaction across the entire AI industry chain, and the visible shock to both OpenAI and Nvidia seems to confirm that it has succeeded.

DeepSeek’s initial performance is indeed remarkable. Data show that within five days of launch its daily active users had surpassed ChatGPT’s over the same period after that app’s debut, and within 20 days its daily active users had topped 20 million, about 23% of ChatGPT’s. DeepSeek is currently the fastest-growing AI application in the world.

While overseas AI players look on in disbelief, the domestic AI field is buzzing: as of now, Alibaba Cloud, Baidu Cloud, Tencent Cloud, and ByteDance’s Volcano Engine have officially added support for DeepSeek; at the same time, chip makers Baidu Kunlunxin, Tianshu Zhixin, and Moore Threads have successively announced support for the DeepSeek models.

This marks another step forward for domestic manufacturers in the global AI race. But whether DeepSeek’s emergence has truly broken the long-standing “curses” of the large-model industry is a question whose crucial details deserve closer scrutiny.

Was DeepSeek’s breakout accidental?

Looking at the major controversies currently surrounding DeepSeek, almost every one points to the same question: has DeepSeek really achieved a technological breakthrough in large models? As early as when DeepSeek announced that its model training cost was only one-tenth of the industry norm, some questioned whether it had gotten there by drastically shrinking model parameters, or by leaning on the cheap computing power that its parent company, the quant fund High-Flyer, had accumulated in earlier years.

From a certain perspective, these doubts are not without basis.

On the one hand, DeepSeek’s aggressiveness in slimming down model parameters is plain to see. On the other hand, High-Flyer does hold a sizable reserve of computing power. It is reported that High-Flyer is the only company outside BAT known to have stockpiled 10,000 A100 chips, and that as of 2023 no more than five domestic companies had accumulated more than 10,000 GPUs.

High-Flyer is one of them.

However, it is worth noting that neither the reduced parameter scale nor the controversy over its computing power can negate the substantive significance of DeepSeek’s feat of doing more with less. First, DeepSeek-R1 reportedly surpassed large models such as GPT-4 on mathematical benchmarks with a 79.8% success rate, at a parameter size of only 1.5 billion (1.5B).

Second, lightweight models naturally have an edge in inference speed and efficiency, with lower training and operating costs. DeepSeek reportedly offers performance comparable to GPT-4 at only 1/50 of the price, carving out a foothold among small and medium-sized businesses and individual developers.

As for High-Flyer’s backing of DeepSeek, it was less an accidental play of capital than an inevitable outcome of the growth of domestic large models. Notably, High-Flyer Quant was among the first companies in China to move onto the large-model track: as early as 2017, it announced that it would make its investment strategies fully AI-driven.

In 2019, High-Flyer Quant set up an AI company. Its self-developed deep learning training platform Firefly-1 represented a total investment of nearly 200 million yuan and was equipped with 1,100 GPUs; two years later, investment in Firefly-2 rose to 1 billion yuan, with roughly 10,000 NVIDIA A100 graphics cards.

In November 2023, DeepSeek’s first open-source model, DeepSeek-Coder, was released. In other words, the DeepSeek that rattled overseas tech giants was not built overnight; it was a step that domestic AI manufacturers were bound to take sooner or later in their large-model layouts.


It is undeniable that China now has the objective conditions to cultivate a DeepSeek. Public information shows that, under the pursuit of capital from all sides, a comprehensive artificial intelligence ecosystem is taking shape: there are more than 4,500 AI-related companies in China, and the scale of the core industry is approaching 600 billion yuan.

Spanning chips, algorithms, data, platforms, and applications, the penetration rate of artificial intelligence represented by large models has reached 16.4% in China.

Of course, the risk of technical path dependence always hangs over DeepSeek, which lends its breakout a slightly more accidental air, especially given the continuing questions over data distillation. In fact, DeepSeek is not the first large model to use data distillation; excessive distillation has even become a major point of contention on today’s AI track.

Researchers from institutions including the Chinese Academy of Sciences and Peking University have pointed out that, apart from Doubao, Claude, and Gemini, most open- and closed-source LLMs show excessively high levels of distillation. Over-reliance on distillation may stall basic research and reduce diversity among models. Professors at Shanghai Jiao Tong University have likewise said that distillation technology cannot solve the fundamental challenges of mathematical reasoning.
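For readers unfamiliar with the technique, here is a minimal, purely illustrative Python sketch of data distillation: a stronger “teacher” model answers prompts, and the resulting pairs become fine-tuning data for a smaller “student.” The teacher_generate stub and the prompts are hypothetical placeholders, not DeepSeek’s actual pipeline.

```python
# Illustrative sketch of LLM data distillation (not any vendor's real pipeline).
# A "teacher" model answers prompts; the (prompt, answer) pairs are then used
# as supervised fine-tuning data for a smaller "student" model.

import json
from typing import Dict, List


def teacher_generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to a large teacher model's API."""
    # In a real pipeline this would call a hosted LLM; here it is a stub.
    return f"[teacher answer to: {prompt}]"


def build_distillation_dataset(prompts: List[str]) -> List[Dict[str, str]]:
    """Collect teacher outputs as fine-tuning examples for the student."""
    dataset = []
    for prompt in prompts:
        dataset.append({"instruction": prompt, "response": teacher_generate(prompt)})
    return dataset


if __name__ == "__main__":
    # Hypothetical prompts; a real distillation corpus would contain millions.
    prompts = [
        "Prove that the square root of 2 is irrational.",
        "Summarize the idea behind mixture-of-experts models.",
    ]
    with open("distill_sft.jsonl", "w", encoding="utf-8") as f:
        for row in build_distillation_dataset(prompts):
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    # The resulting .jsonl file would then feed a standard SFT run on the student.
```

The researchers’ concern is precisely about that first step: if most of the training signal comes from another model’s outputs rather than from original data and methods, both model diversity and basic research suffer.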

All of this is forcing DeepSeek, and indeed the entire domestic large-model track, to keep proving itself. Perhaps a second DeepSeek will yet be born in China. Viewed realistically, DeepSeek’s success owes far more to inevitability than to accident.

Is the era of open source coming?

It is worth noting that, beyond the technology debate, DeepSeek has reignited fierce arguments over open source versus closed source across the global technology circle. Yann LeCun, Meta’s chief AI scientist, said on social media that this is not China catching up with the United States, but open source catching up with closed source.

Any discussion of open-source models has to trace back to the 2023 leak of Meta’s LLaMA model. Meta then went with the tide and released Llama 2 as open source and free for commercial use, which immediately set off an open-source craze on the large-model track, with domestic players such as Wudao, Baichuan Intelligence, and Alibaba Cloud entering the open-source large-model field.

According to Kimi Chat statistics, more than ten brands released open-source models in 2024. Less than two months into 2025, besides the popular DeepSeek, open-source participants keep arriving.

On January 15, MiniMax open-sourced two models: the foundation language model MiniMax-Text-01 and the visual multimodal model MiniMax-VL-01. At the same time, NVIDIA open-sourced its own world models, in three versions: Cosmos Nano, Super, and Ultra. On January 16, Alibaba Cloud’s Tongyi open-sourced a 7B process reward model for mathematical reasoning.

From 2023 to 2025, after endless debate among AI practitioners, is the era of open source for big models finally arriving?

One thing is certain: compared with closed-source models, open-source models can attract a great deal of attention in a short period thanks to their openness. Public information shows that when Llama 2 was released, a Hugging Face search for the model returned more than 6,000 results, and Baichuan Intelligence reported that its two open-source models had exceeded 5 million downloads by September of that year.

In fact, DeepSeek’s rapid rise is inseparable from its open-source approach. Statistics from February show that countless companies have already connected to the DeepSeek series of models, with cloud vendors, chip makers, and application-side companies all piling in. At a time when AI demand is booming, open-sourcing large models does seem better for the AI ecosystem.

Yet whether the large-model track will truly go open source remains an open question.

Although Mistral AI and xAI both back open source, their flagship models are currently closed. Most domestic manufacturers keep one hand on closed source and the other on open source; Alibaba Cloud and Baichuan Intelligence are typical examples, and even Robin Li was once a loyal advocate of the closed-source model.

The reason is not difficult to guess.

On the one hand, open-source AI companies are not favored by capital in the global technology field; closed-source AI companies, by contrast, hold the edge in financing. Statistics show that since 2020, closed-source AI startups worldwide have raised US$37.5 billion, while open-source AI companies have raised only US$14.9 billion.

For AI companies that burn through cash, that is anything but a small gap.

On the other hand, the definition of open-source AI has grown more complicated over the past two years. In October 2024, the Open Source Initiative released version 1.0 of its Open Source AI Definition. Under the new definition, three things are required for an AI model to count as open source: first, transparency about training data; second, the complete code; and third, the model parameters.

By this definition, DeepSeek has been questioned as not truly open source, but merely riding short-term momentum. Globally, a report in Nature likewise pointed out that many technology giants claim their AI models are open source when they are in fact not fully transparent.

A few days ago, a chastened Sam Altman admitted for the first time that OpenAI’s closed-source approach had been a mistake. Perhaps, with DeepSeek’s popularity, another war of words in the AI world is about to begin.

Is large-scale computing power investment about to be put on hold?

In recent weeks, AI companies addicted to hoarding computing power have been mocked in light of DeepSeek’s emergence, and computing power suppliers such as Nvidia have taken a beating in the stock market. Frankly, DeepSeek has indeed delivered breakthroughs on some fronts, especially in loosening the grip of the computing-power monopoly, which has eased some anxiety.

Even so, the computing power needs of the global large-model track cannot be ignored, and even DeepSeek itself may not be able to put its computing power investment on hold.

It should be noted that DeepSeek currently supports only text Q&A, image reading, and document reading, and has not yet moved into image, audio, or video generation. Even so, its servers are routinely on the verge of collapse, and once it expands into other modalities, computing power demand will explode: the gap in computing power demand between video generation models and language models is enormous.

Public data show that the computing power required for training and inference of OpenAI’s Sora video generation model is 4.5 times and nearly 400 times that of GPT-4, respectively. The jump from language to video is that large, and as ever more compute-hungry scenarios emerge, the need for computing power construction only grows.
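For a rough sense of absolute scale, the snippet below multiplies the quoted training ratio by an outside, commonly cited estimate of GPT-4’s training compute (about 2×10^25 FLOPs); that baseline is an assumption of ours, not a figure from the article or from OpenAI.

```python
# Rough scale check on the quoted Sora-vs-GPT-4 ratios.
# GPT4_TRAIN_FLOPS is an outside, commonly cited estimate, NOT a number
# from the article; the 4.5x (training) and ~400x (inference) ratios are as quoted.

GPT4_TRAIN_FLOPS = 2e25   # assumed baseline for GPT-4 training compute
SORA_TRAIN_RATIO = 4.5    # quoted: Sora training needs 4.5x GPT-4
SORA_INFER_RATIO = 400    # quoted: Sora inference needs ~400x GPT-4

sora_train_flops = GPT4_TRAIN_FLOPS * SORA_TRAIN_RATIO
print(f"Implied Sora training compute: {sora_train_flops:.1e} FLOPs")
print(f"Inference-side gap (as quoted): ~{SORA_INFER_RATIO}x GPT-4 per request")
```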

Data show that between 2010 and 2023, demand for AI computing power grew by a factor of several hundred thousand, far outstripping the pace of Moore’s Law. Entering 2025, OpenAI released its first AI Agent product, Operator, which looks set to ignite super-scale computing scenarios, and that is the key to whether computing power construction continues.

Large-model development is commonly divided into five stages: L1, language ability; L2, logical reasoning; L3, tool use; L4, self-learning; and L5, discovery of scientific laws. Agents sit at L3’s tool-use capability and are beginning to probe L4’s self-learning.
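To make “L3 tool use” more concrete, here is a minimal, purely illustrative Python sketch of an agent loop: a decision step picks a tool, the tool runs, and the observation feeds the final answer. The decide_action function is a hard-coded stand-in for what would really be a model call; none of this reflects any particular vendor’s Agent product.

```python
# Minimal, hypothetical sketch of an L3-style "tool use" loop.
# A real agent would let the LLM choose the tool and its arguments;
# decide_action below is a rule-based stand-in for that model call.

from dataclasses import dataclass


@dataclass
class Action:
    tool: str      # which tool to invoke ("calculator" or "finish")
    argument: str  # tool input, or the final answer text


def calculator(expression: str) -> str:
    """A trivially simple tool the agent can call."""
    return str(eval(expression, {"__builtins__": {}}, {}))


def decide_action(task: str, observations: list[str]) -> Action:
    """Stand-in for the LLM's decision step (hypothetical, hard-coded here)."""
    if not observations:
        return Action(tool="calculator", argument="123 * 456")
    return Action(tool="finish", argument=f"Result for '{task}': {observations[-1]}")


def run_agent(task: str) -> str:
    observations: list[str] = []
    for _ in range(5):  # cap the loop so the sketch always terminates
        action = decide_action(task, observations)
        if action.tool == "finish":
            return action.argument
        observations.append(calculator(action.argument))
    return "gave up"


if __name__ == "__main__":
    print(run_agent("demo multiplication task"))
```

Every pass through that loop is another round of model inference, which is why tool-using agents multiply computing power demand rather than merely answering a single prompt.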

Gartner forecasts that by 2028, 15% of day-to-day work decisions worldwide will be made through agentic AI. If the large-model track runs all the way from L1 to L5 as planned, no major AI company in the world will neglect computing power construction.

By the L3 stage, what will the computing power requirements be?

In an October 2024 report, Barclays predicted that by 2026, if consumer AI applications exceed 1 billion daily active users and agents reach more than 5% penetration in enterprise business, at least 142 billion ExaFLOPs of AI computing power (roughly 1.4×10^29 floating-point operations) will be needed to generate 5,000 trillion tokens.
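To get a feel for what those forecast figures imply, here is a small back-of-the-envelope script that takes the quoted Barclays numbers at face value; the per-token and per-user breakdowns are our own simple divisions, not figures from the report.

```python
# Back-of-the-envelope arithmetic on the Barclays figures quoted above.
# All inputs are the article's quoted forecasts; the derived values are
# simple divisions, not additional claims from the report.

EXA = 1e18                       # FLOPs per ExaFLOP
total_compute_exaflops = 142e9   # "at least 142 billion ExaFLOPs"
tokens_generated = 5_000e12      # "5,000 trillion tokens"
daily_active_users = 1e9         # "1 billion daily active users"

total_flops = total_compute_exaflops * EXA
flops_per_token = total_flops / tokens_generated
tokens_per_user_per_day = tokens_generated / daily_active_users / 365

print(f"Total compute:       {total_flops:.2e} FLOPs")
print(f"Implied per token:   {flops_per_token:.2e} FLOPs/token")
print(f"Tokens per user/day: {tokens_per_user_per_day:,.0f} (spread over a year)")
```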

Even though the era of super applications is still some way off, no company is willing to fall a step behind on today’s battlefield, where the large-model track is accelerating its shakeout. AI giants at home and abroad, including Microsoft, Google, Amazon, Meta, ByteDance, Alibaba, Tencent, and Baidu, will most likely keep spending to bet on the future.

Besides, what DeepSeek is most praised for is finding a way around the chip barrier.

However, as the cornerstone of the computing power industry, higher-quality computing infrastructure generally delivers greater computing efficiency and better business returns for the same investment. As noted in “Top Ten Trends in the Computing Power Industry in 2025,” take GPT-4 as an example: its economics differ significantly across hardware configurations. Comparing GPT-4 running on configurations such as the H100 and GB200, the profitability of a GB200 Scale-Up 64 configuration is six times that of an H100 Scale-Up 8 configuration.


The questions over DeepSeek’s reliance on third-party servers may suggest that, in the computing power race, the core chase on the large-model track is far from over. It is reported that in 2025 NVIDIA’s next-generation GB300 GPU may undergo several key changes in hardware specifications, while the localization of domestic AI chips is also advancing around the clock.

All signs suggest that the hard work of building computing power cannot stop for now; if anything, the grind has only intensified.

[Author introduction: Dao Zong You Li (formerly writing as Wai Dao Dao), a new-media author covering the internet and technology circles. This article is original; reproduction in any form that does not retain the author’s information is declined.]
