
Whales Rise and Fall: Summary of 2 hardcore podcasts about DeepSeek

By Lanxi

The ten days after DeepSeek went viral were also the noisiest. To be honest, most of the published discussion read like people working overtime to hit their KPIs: everyone and their ghost was talking about it, but only a handful of pieces were worth keeping. Two podcasts, however, taught me a great deal, and I highly recommend both.

One is Zhang Xiaojun's episode with Pan Jiayi, a PhD researcher at the AI lab of the University of California, Berkeley, who walks through the DeepSeek paper almost sentence by sentence. Nearly three hours of high-density output will kill some brain cells, but the endorphins released afterwards are just as explosive.

The other is Ben Thompson's three-part podcast series on DeepSeek, adding up to more than an hour. Thompson is the founder of the Stratechery newsletter and one of the most technologically literate analysts in the world; he lives in Taipei year-round, and his insight into China and Asia runs much deeper than that of his American peers.

Let’s start with Zhang Xiaojun’s episode. The guest, Pan Jiayi, built a small-scale reproduction of the R1-Zero model as quickly as he could after reading DeepSeek’s paper; the project has nearly 10,000 stars on GitHub.

This kind of knowledge relay, passed from one generation of researchers to the next, is really a projection of idealism onto the technical field. Flood Sung, a researcher at Moonshot AI (Dark Side of the Moon), similarly said that Kimi’s reasoning model k1.5 was originally inspired by two videos released by OpenAI. Earlier still, when Google published “Attention Is All You Need”, OpenAI immediately recognized the future in the Transformer. The free flow of intelligence is the prerequisite for all progress.

That is why everyone was so disappointed by Anthropic founder Dario Amodei’s call for a blockade, in the spirit of “science has no borders, but scientists have their motherland.” In rejecting competition, he is also defying basic common sense.



– OpenAI did a great deal of concealment work when o1 made its stunning debut; it did not want other labs to crack the underlying principle. In hindsight, it was a bit like posing a riddle to the industry, betting that no one would solve it quickly. DeepSeek-R1 was the first to find the answer, and the way it found the answer was quite beautiful;

– Open source provides more certainty than closed source, which helps both the growth of talent and the output of results. R1 essentially lays out the entire technical route in the open, so its contribution to stimulating research investment should exceed that of the secretive, cards-close-to-the-chest o1;

– Although the AI industry is burning money on an ever-larger scale, the fact is that we have not gotten a true next-generation model for nearly two years; mainstream models are still benchmarked against GPT-4. That is rare in a market that prides itself on changing by the day. Even setting aside whether Scaling Laws have hit a wall, OpenAI o1 is itself an attempt at a new technical line: using language models to teach AI how to think;

– o1 achieved a linear improvement in intelligence on benchmarks, which is very impressive. Its technical report did not disclose many details, but the key points were there, such as the value of reinforcement learning. Pre-training and supervised fine-tuning amount to handing the model correct answers to imitate; over time it learns to follow suit. Reinforcement learning instead lets the model complete the task by itself: you only tell it whether the result is right or wrong, and it should do more of whatever was right and less of whatever was wrong (a toy sketch of the two signals follows below);
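
For intuition, here is a minimal toy sketch of the two training signals described above. It is entirely my own illustration, not code from the podcast or from OpenAI/DeepSeek, and the function names and the checker lambda are hypothetical.

```python
# Toy contrast between the two training signals discussed above:
# supervised fine-tuning imitates a given answer, while outcome-based RL
# only tells the model whether its own attempt was right or wrong.

def sft_signal(model_answer: str, reference_answer: str) -> float:
    """Imitation: penalize any deviation from the provided correct answer
    (a crude stand-in for token-level cross-entropy)."""
    mismatches = sum(a != b for a, b in zip(model_answer, reference_answer))
    mismatches += abs(len(model_answer) - len(reference_answer))
    return mismatches / max(len(reference_answer), 1)

def rl_outcome_signal(model_answer: str, is_correct) -> float:
    """Outcome reward: 1.0 if the model's own attempt is right, 0.0 otherwise;
    how it got there is never prescribed."""
    return 1.0 if is_correct(model_answer) else 0.0

# Example: the model's self-generated answer to "1 + 1"
print(sft_signal("2", "2"))                        # 0.0 -> perfect imitation
print(rl_outcome_signal("2", lambda a: a == "2"))  # 1.0 -> rewarded outcome
```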

– OpenAI found that reinforcement learning can make models produce behavior close to human thinking, known as CoT (chain of thought). When a problem-solving step goes wrong, the model backtracks to the previous step and tries a new approach. Human researchers did not teach this; it is an ability that emerged while the model was forced to complete tasks on its own. Later, when DeepSeek-R1 reproduced a similar “aha moment”, o1’s core fortress was well and truly breached;

– Reasoning models are essentially a product of economic calculation. If you pile on enough computing power, you might still brute-force an o1-like effect, perhaps even out of something at GPT-6 scale, but that is not ingenuity, just brute force producing a miracle: doable, yet unnecessary. Model capability can be understood as training compute x inference compute. The former is extremely expensive, the latter is still very cheap, but their multiplier effects are nearly equal, so the industry is now pursuing the more cost-effective reasoning route (a toy cost comparison follows below);
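
As a rough illustration of that multiplicative framing, here is some toy arithmetic. The prices and the function names are hypothetical, chosen only to make the asymmetry visible; they are my own illustration, not figures from the podcast.

```python
# Toy arithmetic for "capability ~= training compute x inference compute":
# because a unit of inference compute is far cheaper than a unit of training
# compute, buying extra capability on the inference side costs much less.

TRAIN_COST_PER_UNIT = 100.0  # hypothetical dollars per unit of training compute
INFER_COST_PER_UNIT = 1.0    # hypothetical dollars per unit of inference compute

def capability(train_units: float, infer_units: float) -> float:
    """The multiplicative model of capability described above."""
    return train_units * infer_units

def cost(train_units: float, infer_units: float) -> float:
    return train_units * TRAIN_COST_PER_UNIT + infer_units * INFER_COST_PER_UNIT

# Doubling either factor doubles capability, but the bill is very different.
print(capability(20, 10), cost(20, 10))   # 200.0 capability for $2010
print(capability(10, 20), cost(10, 20))   # 200.0 capability for $1020
```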

– The release of o3-mini at the end of last month may not have had much to do with DeepSeek-R1, but o3-mini’s price dropping to one third of o1-mini’s surely did. Internally, OpenAI believes ChatGPT’s business model has a moat but the API does not: it is too easy to substitute. There has recently been debate in China about whether chatbots are a good business; even DeepSeek clearly has not figured out what to do with this sudden flood of traffic. There may be a natural conflict between serving the consumer market and doing frontier research;

– To the technically minded, DeepSeek-R1-Zero is more beautiful than R1 because it involves less manual intervention: the model itself figures out, over thousands of reasoning steps, how to find the optimal solution, with relatively little reliance on prior knowledge. But because it has had no alignment processing, R1-Zero basically cannot be shipped to users; for example, it mixes multiple languages in its output. So the R1 that the mass market actually knows still relies on old methods such as distillation, fine-tuning, and even pre-seeded chains of thought;

– This touches on the problem that capability and performance are not the same thing: the most capable model is not necessarily the best-performing one, and vice versa. R1 performs well largely because the manual effort went in the right direction. R1 has no exclusive training corpus; everyone’s corpus includes classical poetry, and there is no way R1 simply knows more. The real reason is probably data annotation. DeepSeek is said to have hired students from Peking University’s Chinese department to do labeling, which significantly improves the reward function for literary expression. The story that Liang Wenfeng himself sometimes does labeling is not just a sign of his enthusiasm; it shows that labeling has long since reached the point where professional problem-setters are needed to coach the AI. OpenAI likewise pays PhD students US$100-200 an hour to label for o1;

– Data, compute, and algorithms are the three flywheels of the large-model industry, and the main breakthrough in this wave comes from algorithms. DeepSeek-R1 identified a misconception: the emphasis on the value function in traditional algorithms may be a trap. The value function tries to judge every step of the reasoning process in order to keep the model on the right path. For example, while the model is answering what 1+1 equals, the moment it produces the hallucination 1+1=3 it gets punished on the spot, a bit like electroshock therapy in which no mistakes are allowed;

– That approach is theoretically correct but also very perfectionist. Not every question is as simple as 1+1, and when a long chain of thought runs inference over thousands of tokens, supervising every single step makes the input-output ratio terrible. So DeepSeek made a decision that went against the ancestral teachings: it stopped using a value function to satisfy researchers’ obsessive-compulsive urges, scores only the final answer, and lets the model work out for itself how to reach that answer through correct steps. Even if it momentarily entertains the idea that 1+1=3, nothing over-corrects it; instead, the model notices during reasoning that continuing down that path cannot yield the right answer, and corrects itself (see the sketch below);
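
Here is a simplified sketch of the contrast, my own illustration rather than DeepSeek's implementation: a process-supervised critic judges every reasoning step, while the outcome-only approach scores just the final answer and lets the model discover and fix wrong intermediate steps itself. All names and the toy judging lambdas are hypothetical.

```python
from typing import Callable, List

def per_step_rewards(steps: List[str],
                     judge_step: Callable[[str], bool]) -> List[float]:
    """Value-function-style supervision: every step is judged, which becomes
    prohibitively expensive over chains of thousands of tokens."""
    return [1.0 if judge_step(step) else -1.0 for step in steps]

def outcome_only_reward(final_answer: str,
                        check_answer: Callable[[str], bool]) -> float:
    """Outcome-only supervision: a transient mistake such as "1+1=3" is not
    punished; only the final answer is scored."""
    return 1.0 if check_answer(final_answer) else 0.0

# Example: a chain with a wrong intermediate step but a corrected final answer.
chain = ["guess 1+1=3", "notice this cannot be right", "correct to 1+1=2"]
print(per_step_rewards(chain, lambda s: "3" not in s))   # [-1.0, 1.0, 1.0]
print(outcome_only_reward("2", lambda a: a == "2"))      # 1.0
```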

– The algorithm is DeepSeek’s biggest contribution to the entire industry, including how to distinguish whether a model is imitating or reasoning. I remember that after o1 came out, many people claimed general-purpose models could also output chains of thought via prompting, but those models had no reasoning ability: it was imitation. They still produced answers in the conventional way, and only because users demanded it did they go back and write up “reasoning” based on the answer, a meaningless exercise in shooting the arrow first and painting the target afterwards. DeepSeek also put a lot of work into countering reward hacking, that is, the problem of the model turning into a cheat: gradually guessing what kind of thinking gets rewarded without ever understanding why it should think that way;

– For the past few years the industry has been waiting for emergent behavior from models, assuming that with enough knowledge, wisdom would evolve naturally. After o1, it became clear that reasoning is the most critical springboard. DeepSeek’s paper emphasizes which of R1-Zero’s behaviors emerged spontaneously rather than being ordered by humans. For example, once the model realized that generating more tokens lets it think more thoroughly and ultimately perform better, it began to proactively lengthen its chain of thought. For humans this is instinct: a long, slow game is of course more strategic than blitz chess. But watching a model learn that lesson on its own is quite surprising;

– The training cost of DeepSeek-R1 may be somewhere between US$100,000 and US$1 million, less than the roughly US$6 million for V3. After open-sourcing, DeepSeek also demonstrated the results of using R1 to distill other models, which can then continue reinforcement learning after distillation (a toy sketch of this recipe follows below). The open-source community’s support for DeepSeek is not without reason: it has turned tickets to AGI from luxury goods into fast-moving consumer goods, letting far more people come in and try;
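
For readers unfamiliar with the term, here is a minimal toy sketch of what the distillation recipe mentioned above looks like in structure. The helper names are hypothetical and the content is placeholder text; this is not DeepSeek's pipeline, just the general shape of "teacher generates reasoning traces, smaller model is fine-tuned on them".

```python
from typing import List, Tuple

def teacher_generate_trace(question: str) -> Tuple[str, str]:
    """Stand-in for sampling a (reasoning, answer) pair from the teacher
    (e.g. a strong reasoning model like R1)."""
    reasoning = f"<think>step-by-step work on: {question}</think>"
    answer = "placeholder answer"
    return reasoning, answer

def build_distillation_dataset(questions: List[str]) -> List[dict]:
    """Distillation here is just supervised fine-tuning data whose targets are
    the teacher's reasoning traces followed by its answers; RL can continue on
    the distilled student afterwards."""
    dataset = []
    for question in questions:
        reasoning, answer = teacher_generate_trace(question)
        dataset.append({"prompt": question, "target": f"{reasoning}\n{answer}"})
    return dataset

print(build_distillation_dataset(["What is 6 x 7?"])[0]["target"])
```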

– Kimi k1.5 was released at the same time as DeepSeek-R1, but because it is not open source and has less international standing, its influence has been quite limited even though it contributed similar algorithmic innovations. Kimi is also shaped by its consumer business: it emphasizes using short chains of thought to approximate the effect of long ones, so k1.5 is rewarded for shorter reasoning (a hypothetical form of such a reward is sketched below). The original intention was to spare users a long wait after asking a question, but it seems to have backfired: much of DeepSeek-R1’s viral material consists of highlights in the chain of thought that users discovered and shared. People encountering reasoning models for the first time do not seem to mind the model’s verbose inefficiency;
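
Here is a hypothetical sketch of what "rewarding shorter reasoning" could look like in its simplest form. This is my own guess at the general shape, not Kimi's actual reward function, and the penalty coefficient is an arbitrary assumption.

```python
# A correct answer earns full credit minus a penalty that grows with the
# length of the chain of thought, nudging the model toward terser reasoning.

def length_penalized_reward(is_correct: bool,
                            num_reasoning_tokens: int,
                            penalty_per_token: float = 1e-4) -> float:
    base = 1.0 if is_correct else 0.0
    return base - penalty_per_token * num_reasoning_tokens

# A short correct chain outscores a long correct one, so users wait less.
print(length_penalized_reward(True, 500))    # 0.95
print(length_penalized_reward(True, 5000))   # 0.5
```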

– Data annotation is something the whole industry keeps quiet about, but it is only a transitional plan; a self-learning roadmap like R1-Zero’s is the ideal. For now OpenAI’s moat is still very deep: last month its web traffic hit an all-time high. DeepSeek’s popularity will objectively bring new gains to the whole industry, but things will get harder for Meta. Llama 3 has no real architectural innovation, and Meta did not anticipate DeepSeek’s impact on the open-source market. Meta’s talent pool is very strong, but its organizational structure does not convert those resources into technological achievements.

As for Ben Thompson’s podcast, it cross-validates Pan Jiayi’s judgments in many places, for example that removing the HF (human feedback) from RLHF is one of R1-Zero’s technical highlights. But more of the discussion focuses on geopolitical competition and the history of the big tech companies, and the storytelling is a pleasure:

– One motivation for Silicon Valley’s excessive focus on AI safety is that it rationalizes closed behavior. It was already invoked back at GPT-2, to keep the large language model from being used to generate “deceptive and biased” content, yet “deceptive and biased” is nowhere near the level of human extinction. This is essentially a continuation of the culture war, built on the assumption of “knowing propriety only once the granaries are full”, that is, that American tech companies hold such an absolute technical lead that they can afford the distraction of debating whether AI is racist;

– The same goes for OpenAI’s decision to hide the o1 chain of thought: the raw chain might be misaligned and users might feel offended seeing it, so OpenAI decided to cut it across the board and show none of it. DeepSeek-R1 falsified both assumptions: Silicon Valley’s lead in the AI industry is not that solid, and an exposed chain of thought can become part of the user experience, making people trust the model’s thinking more after reading it;

– Reddit’s former CEO believes that describing DeepSeek as a Sputnik moment, like the Soviet Union launching the first satellite ahead of the United States, is a forced political reading. He is more convinced that DeepSeek sits at Google’s 2004 moment: that year, Google’s prospectus showed the world how distributed algorithms could wire together networks of commodity computers and hit the optimal point on price and performance, unlike every other tech company of the era, which just kept buying ever more expensive mainframes and was content to sit at the expensive end of the cost curve;

– DeepSeek open-sourced the R1 model and transparently explained how it was done, which is a great gesture of goodwill; if the goal were to inflame geopolitics, a Chinese company should have kept its results secret. Google, for its part, drew the finish line for specialized server vendors like Sun back then, pushing competition down to the commodity level;

– Roon, a researcher at OpenAI, argues that DeepSeek having to downgrade, assigning optimization engineers to wrestle with the H800 chip, unable to use Nvidia’s CUDA and forced down to the lower-level PTX, means the time spent there can never be made up, whereas American engineers can requisition H100s without hesitation: crippled hardware cannot produce true innovation;

– If the Google of 2004 had listened to Roon’s advice and not “wasted” precious researchers on building more economical data centers, then perhaps American internet companies would be renting virtual machines from Alibaba Cloud today. Over the past two decades of incoming wealth, Silicon Valley has lost the drive to optimize infrastructure. Companies large and small have grown used to the capital-intensive production model and are happy to submit budget sheets in exchange for investment, even putting up Nvidia chips as loan collateral. As for how to deliver as much value as possible within limited resources, nobody cares;

– AI companies will of course endorse the Jevons Paradox, the idea that cheaper computing creates greater usage, but their actual behavior over the past few years has been inconsistent with it, because every company has shown a preference for research over cost, right up until DeepSeek forced the Jevons Paradox back into everyone’s attention;

– Nvidia the company has become more valuable, while Nvidia the stock has become riskier; the two can coexist. If DeepSeek can achieve this much on heavily restricted chips, imagine the technological progress once it gains access to full-strength compute. That is an inspiring incentive for the whole industry, but Nvidia’s share price is built on the assumption that it is the only supplier, and that assumption may be falsified;



– However good Claude’s reputation is in San Francisco, it is hard to change the natural weakness of selling APIs: they are too easy to replace. ChatGPT gives OpenAI, as a consumer technology company, far greater resilience. In the long run, though, DeepSeek benefits both AI sellers and AI users, and we should be grateful for this generous gift.

Well, that’s about it. I hope this homework helps you better understand what DeepSeek’s breakout really means for the AI industry.
