
The pressure on Doubao has only just begun

Source: Cailian Press AI Daily

Image source: Generated by AI

Today, ByteDance's Doubao large model team proposed UltraMem, a new sparse model architecture. It effectively addresses the high memory-access cost of MoE inference: inference runs 2-6 times faster than under the MoE architecture, and inference cost can be cut by up to 83%.

Competition in large models at home and abroad has grown increasingly fierce and entered a white-hot stage. Doubao has built out a full stack across both the AI infrastructure layer and the application layer, and continues to iterate and upgrade.

Large models continue to reduce costs and increase efficiency

According to research by the Doubao large model team, under the Transformer architecture a model's performance is logarithmically related to its parameter count and compute. As LLMs keep growing, inference cost rises sharply and inference speed slows.

Although the MoE (Mixture of Experts) architecture successfully decouples computation from parameters, during inference even a small batch can end up activating all the experts, causing memory accesses to spike and inference latency to rise significantly.
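The effect is easy to see in a toy top-k router (an illustrative sketch with made-up dimensions, not ByteDance's code): each token's compute is fixed at top-k experts, but the number of distinct experts, and hence parameters, the batch must fetch from memory grows quickly with batch size.

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, d = 64, 2, 16
router = rng.standard_normal((d, num_experts))  # toy router weights

def experts_touched(batch_size):
    """Route each token to its top-k experts and count how many distinct
    expert weight sets the batch must load from memory."""
    tokens = rng.standard_normal((batch_size, d))
    scores = tokens @ router                      # (batch, num_experts)
    top = np.argsort(scores, axis=1)[:, -top_k:]  # top-k experts per token
    return len(np.unique(top))

# Per-token compute stays at top_k experts, but the set of distinct experts
# the batch touches grows with batch size, so at inference-time batch sizes
# memory traffic approaches loading every expert's weights.
for b in (1, 8, 64, 256):
    print(b, "tokens ->", experts_touched(b), "distinct experts loaded")
```

At a batch of one token only `top_k` expert weight sets are read, but the parameter bytes fetched per unit of compute are already far worse than a dense layer, which is the memory-access problem described above.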

ByteDance's Doubao Foundation team proposed UltraMem, a sparse model architecture that likewise decouples computation from parameters, solving the inference memory-access problem while preserving model quality.
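UltraMem's internals are ByteDance's own, but it builds on large-memory-layer ideas in which a huge value table is addressed by the Cartesian product of two small key sets. The sketch below (hypothetical names and simplified scoring, not the UltraMem implementation) shows why such a lookup reads only a handful of value rows per query, keeping memory access close to a dense layer's.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_keys, top_k, d_value = 32, 128, 4, 32  # table holds n_keys**2 value slots

# Two small key sets whose product addresses a large value table.
keys_row = rng.standard_normal((n_keys, d // 2))
keys_col = rng.standard_normal((n_keys, d // 2))
values = rng.standard_normal((n_keys * n_keys, d_value))

def memory_lookup(x):
    """Retrieve a sparse weighted mix of top_k value rows for one query x."""
    s_row = keys_row @ x[: d // 2]   # score first half of x against row keys
    s_col = keys_col @ x[d // 2:]    # score second half against column keys
    # Slot (i, j) scores s_row[i] + s_col[j]; the global top_k slots lie
    # among the top_k candidates of each half, so scoring stays cheap.
    cand_r = np.argsort(s_row)[-top_k:]
    cand_c = np.argsort(s_col)[-top_k:]
    scores = s_row[cand_r][:, None] + s_col[cand_c][None, :]
    flat = np.argsort(scores.ravel())[-top_k:]
    slots = cand_r[flat // top_k] * n_keys + cand_c[flat % top_k]
    w = np.exp(scores.ravel()[flat])
    w /= w.sum()
    # Only top_k of the n_keys**2 value rows are ever read from memory.
    return (w[:, None] * values[slots]).sum(axis=0), slots

out, slots = memory_lookup(rng.standard_normal(d))
print(len(slots), "of", n_keys * n_keys, "value rows touched")
```

The parameter count scales with the full table (`n_keys**2` slots) while memory access per query scales only with `top_k`, which is the compute/parameter decoupling the article describes.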

Experimental results show that under the same parameter count and activation conditions, UltraMem surpasses MoE in model quality while improving inference speed 2-6 times. Moreover, at common batch sizes, UltraMem's memory-access cost is nearly equivalent to that of a dense model with the same compute.

It is clear that large model vendors are pushing hard to cut costs and raise efficiency on both the training and inference sides. The core reason is that as models scale up, inference cost and memory-access efficiency have become the key bottlenecks limiting large-scale deployment, and DeepSeek has already broken through along the "low-cost, high-performance" path.

Liu Fanping, CEO of Core Digital Intelligence, told a reporter from the "Science and Technology Innovation Board Daily" that to reduce the cost of large models, the industry leans toward breakthroughs at the technical and engineering levels, using optimized architectures to "overtake on the curve". Foundational architectures such as the Transformer remain expensive, so research into new architectures is necessary; the basic algorithms, chiefly backpropagation, may be deep learning's bottleneck.

In Liu Fanping's view, Nvidia will still dominate the high-end chip market in the short term, but with market demand for inference applications rising, domestic GPU companies now have opportunities as well. In the long run, algorithmic innovation could deliver striking results, and overall demand in the computing-power market remains to be seen.

The pressure on Doubao has only just begun

During the recent Spring Festival holiday, DeepSeek quickly became popular worldwide on the strength of its low training cost and efficient compute, emerging as a dark horse in the AI field. Competition among large models at home and abroad has since intensified to a white-hot stage.

DeepSeek is currently Doubao's strongest competitor among domestic models. On January 28, the former surpassed the latter in daily active users for the first time. DeepSeek's daily active users now exceed 40 million, making it the first app in the history of China's mobile internet to break into the all-network daily-active top 50 less than a month after launch.

In recent days, the Doubao model team has kept up the pace. Two days ago, it released the experimental video-generation model "VideoWorld". Unlike mainstream multimodal models such as Sora, DALL-E, and Midjourney, VideoWorld is the industry's first model able to understand the world without relying on a language model.

Doubao has now built out a full stack across the AI infrastructure layer and application layer, and continues to iterate and upgrade. Its AI product matrix spans multiple areas, including the AI chat assistant Doubao, Cat Box, Jimeng AI, Xinghui, and Doubao MarsCode.

On February 12, Doubao-concept stocks rose rapidly in afternoon trading. According to Wind data, the Douyin Doubao Index has gained more than 15% cumulatively since the start of February. Among individual stocks, BYAN Technology hit its daily limit with strong momentum, Hand Information quickly rose to its daily limit, and Guanghetong and Advanced Digital Connect surged intraday.

CITIC Securities said in an earlier research report that the ecosystem expansion of Doubao AI will trigger a new round of technology-investment cycles among the giants. The AI industry has strong network and scale effects: once a leading AI application gains a user lead, its competitive advantages in model accuracy, marginal cost, and user stickiness will progressively strengthen.

Doubao's user base keeps growing, and the application ecosystem built on Doubao AI is expected to accelerate. On one hand, this will spur the company's investment in AI training and inference computing infrastructure; on the other, Doubao AI's rapid growth will push other giants to increase their own AI infrastructure investment.

But for Doubao itself, the contest with top performer DeepSeek may have only just begun.

As an open-source model, DeepSeek's low cost and high performance are changing many companies' model-selection strategies. Numerous AI applications from Huawei, Baidu, and others have announced DeepSeek integrations; even ByteDance itself has followed suit, connecting the multi-dimensional table feature of its Feishu (Lark) to the DeepSeek-R1 model, with Volcano Engine adapted as well.

According to a reporter from the Science and Technology Innovation Board Daily, the Doubao team is still debating whether the Doubao app should integrate DeepSeek. From a user-experience standpoint, choosing the better-performing model is understandable, but abandoning its own model in favor of a rival's would be hard to justify to shareholders, and that is before considering the added burden of integrating and adapting yet another model.
