① Many guests at the event believed that conversational AI may take off first in scenarios such as desktop assistants, mobile assistants, smart hardware, and companion robots;
② In terms of application scenarios, conversational AI is not well suited to visual scenarios, so for now it is used mostly in voice and auditory interaction.
“Science and Technology Innovation Board Daily”, March 8 (Reporter Li Mingming): Amid the wave of generative AI, the industry generally believes that multimodal large models are the only path to AGI. The latest Voice AI report from the well-known investment firm a16z also argues that, as large models continue to advance, voice, in the form of conversational AI, will become a key entry point.
As conversational AI technology matures, its application scenarios have grown explosively. Chatbots, one of the most important applications of conversational AI, are widely used in fields such as customer service, education, healthcare, and entertainment.
So in which field and scenario will conversational AI take off first this year?
Recently, at the SoundNet conversational AI engine launch conference, Xin Xiaojian, senior product architect of Tongyi Qianwen at Alibaba Cloud Intelligent Group, Feng Wen, senior director of solutions at MiniMax, Cao Chao, director of AI product architecture at Tencent Cloud, and Yao Guanghua, head of the SoundNet AI RTE product line, discussed this question.
Many guests at the event believed that conversational AI may take off first in scenarios such as desktop assistants, mobile assistants, smart hardware, and companion robots.
Cao Chao, director of AI product architecture at Tencent Cloud, said that the unique advantage of conversational AI lies in its ability to deliver voice and interaction with emotion and warmth, and that as models are upgraded it can convey even more emotion.
“In terms of application scenarios, conversational AI is not well suited to visual scenarios, so for now it is used mostly in voice and auditory interaction. For example, some elderly people have poor eyesight. When they use WeChat, they long-press to speak and hold the phone close to their ear to listen. These groups also need tools to meet their communication and problem-solving needs, and conversational AI has opened up new opportunities and possibilities for them. At present, much of the hardware for conversational AI is also based on mobile phones.”
Xin Xiaojian, senior product architect of Tongyi Qianwen at Alibaba Cloud Intelligent Group, added, “Learning machines in the education field are also a relatively good scenario. Currently, about 60 million learning machines are shipped nationwide each year. With the boost from large models, the average price per customer has risen very significantly. The better learning machines sold online have already reached more than 8,000 yuan each. This is the premium that conversational AI makes possible.”
It is understood that the current conversational AI products on the market mainly include
Recently, SoundNet released what it describes as the world’s first conversational AI engine. With five major capabilities, including a 650 ms ultra-low-latency response, elegant interruption, and full-model adaptation, the engine can quickly upgrade any text-based large model into a conversational multimodal large model that is “capable of talking.”
Yao Guanghua, head of the AI RTE product line at SoundNet, said that after a period of refinement with customers and research into real usage scenarios, the statistics show that each conversation between a user and the AI involves about 3 rounds of questions and answers on average. An average conversation lasts about 21.1 seconds and costs only about 3 fen (0.03 yuan). At 15 conversations a month, the monthly cost is less than 0.5 yuan, and the annual cost is only about 5 yuan.
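The cost figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration, not SoundNet's pricing model: it assumes the quoted per-conversation cost is 3 fen (hundredths of a yuan), that usage is a flat 15 conversations per month, and that a year is 12 such months. The function and constant names are made up for this example.

```python
# Illustrative check of the quoted cost figures.
# Assumption: the per-conversation cost is 3 fen, where 1 yuan = 100 fen.
# Integer fen are used to avoid floating-point rounding.

COST_PER_CONVERSATION_FEN = 3   # figure quoted in the article
CONVERSATIONS_PER_MONTH = 15    # figure quoted in the article
MONTHS_PER_YEAR = 12


def monthly_cost_fen(cost_per_conversation: int, conversations: int) -> int:
    """Cost of one month of conversations, in fen."""
    return cost_per_conversation * conversations


def annual_cost_fen(cost_per_conversation: int, conversations: int) -> int:
    """Cost of twelve months of conversations, in fen."""
    return monthly_cost_fen(cost_per_conversation, conversations) * MONTHS_PER_YEAR


def fen_to_yuan(fen: int) -> str:
    """Format an integer amount of fen as a yuan string with two decimals."""
    return f"{fen // 100}.{fen % 100:02d}"


monthly = monthly_cost_fen(COST_PER_CONVERSATION_FEN, CONVERSATIONS_PER_MONTH)
annual = annual_cost_fen(COST_PER_CONVERSATION_FEN, CONVERSATIONS_PER_MONTH)

print(f"Monthly cost: {fen_to_yuan(monthly)} yuan")
print(f"Annual cost:  {fen_to_yuan(annual)} yuan")
```

Under these assumptions the monthly cost comes to 0.45 yuan and the annual cost to 5.40 yuan, which is consistent with the rounded figures of "less than 0.5 yuan" a month and "about 5 yuan" a year.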
According to reports, the SoundNet conversational AI engine lets developers quickly deploy conversational AI scenarios such as smart assistants, virtual companionship, speaking practice partners, smart customer service, and smart hardware. In the smart assistant scenario, for example, it can help people manage schedules, look up information, and carry out tasks through natural language interaction.
On the key issues in moving large models from text to multimodal interaction, the guests believed that the multimodal model architecture and training paradigm have not changed much, and that improvement depends mainly on the quality and quantity of data. The key to multimodal interaction is to bring information from different modalities into the same context. Current ASR technology (Automatic Speech Recognition, which converts human speech into written text) has helped achieve this. However, to improve the interactive experience, model inference needs to become faster, and engineering problems such as multi-role long-term memory and role differentiation need to be solved. Complex situations that arise in interaction across modalities, such as semantic differences in speech and media processing, also need to be handled.
In addition, the guests generally agreed that DeepSeek’s surge in popularity was a good thing. It achieved a breakthrough in AI technology and drew more people’s attention to AI. Its open-source release matters greatly for technological development: it promotes technical exchange and innovation and lets more people take part in AI exploration. On the technical side, DeepSeek has brought the industry new ways of thinking. In model training, for example, it reduces reliance on large amounts of data, achieves upgrades and iteration through reinforcement learning, enables models to evolve on their own, and lowers computing power requirements, making broadly accessible AI more feasible. It has also validated the model API business model and advanced the paradigm for application development.
Feng Wen, senior director of solutions at MiniMax, said that DeepSeek’s breakout into the mainstream is a good thing for everyone working in the AI industry; compared with before, AI has now quietly reached a much larger user base. “Open source really does help technology reach the mainstream. DeepSeek is open source, and the technical reports we have recently released also actively present our latest results.”