Give AI a computer, I’m going out to play.
An in-depth hands-on test of Manus: this is the DeepSeek moment for the AI Agent industry
By Lanxi
Manus dominated social feeds for a whole day. From overnight fame at the start, to invitation codes being almost impossible to get in the middle, to later suspicions that it had spent heavily on publicity, FOMO and instinctive wariness were intertwined throughout, which makes it a very interesting case study in communication.
In fact, information in the AI industry has been "explosion-driven" for the past few years. Those who understand the field have long been disenchanted, while those who do not still treat everything as a rare marvel. To be fair, though, when something is declared explosive every day, a few genuinely explosive things will inevitably be mixed in.
My assessment of Manus is that it belongs firmly in the genuinely explosive camp and can be called the DeepSeek moment of the AI Agent industry, with one caveat that I will save for the end.
Let's first look at a demo produced by Manus:
The task was to develop an interactive text game in which the player takes the role of Google's CEO. By working through the key decisions in the company's history, the player can enjoy the game and also get to know the company's culture.
It took almost an hour to develop the web game, Google CEO Simulator, and the result is remarkably complete. After clicking to start, you choose a difficulty level, and then you face each turning point in Google's history. Your choices determine how the company's resources change and affect the final outcome of the game.
Producing a game from a single sentence in about an hour: that is what an AI Agent can do.
Unlike traditional conversational AI, it no longer just provides answers at the level of information. It can operate a computer to complete concrete work tasks, including but not limited to writing programs, building web pages, preparing reports, and screening resumes, working through the difficulties it meets along the way and delivering finished results. There are exceptions, of course, and I will come back to one later.
There are currently few mainstream AI Agent services, and they are generally very expensive. For example, ChatGPT's Operator is available only to Pro members at US$200 a month, and Devin, an AI engineer product aimed at the programming market, costs US$500 a month.
Manus is developed by Monica, a team from China, and is currently in a free testing phase. The cost of a single task has been brought down to US$2, one tenth of OpenAI's, and it has also overtaken OpenAI to rank first in the world on the benchmark leaderboard.
After getting an invitation code, I used up my daily compute quota on Manus within a few hours. I got carried away, and the results were genuinely impressive.
Here are a few cases from my hands-on testing:
First, I asked it to make me a Linktree-style personal homepage. Manus broke the task into eight steps. It first gathered information about me from across the web, including my links and representative works on various platforms, and then started writing the page code in Linktree's design style. Half an hour later, it delivered the result.
It is simple, but it fully meets the requirements, and the interactions work without problems. To make it look better, you can keep refining it with further prompts.
The second test used Manus to solve a practical problem for an engineer in one of my group chats. The Atlas robotic arm he maintains at a factory had developed a minor fault. Calling after-sales support would cost several thousand yuan, so he preferred to find a fix himself. He did not feel like reading the documentation, so he simply sent me a paragraph describing the problem to pass on to Manus.
Note that an ordinary conversational AI could in theory meet this need, but it would take more back-and-forth: you would have to feed it the documentation and work toward the answer step by step. Manus needs none of that. It went to Atlas's official website, downloaded the documentation, read it, found the key content needed to solve the problem, analyzed it carefully, and wrote a program. I sent the final code to my friend. It had a few minor flaws, but it was fully usable after some manual changes, which saved the cost of the after-sales call outright.
The third test came from a Weibo reader, who suggested having Manus produce a minimalist chronicle of a country. I added requirements for comic-style presentation and web design. The color scheme of the delivered work was rather unattractive, which is the kind of thing that has to be stressed repeatedly in the prompt, but Manus's servers had gone down by then and I could not revise it for the time being, so I am showing the semi-finished version.
As you can see, Manus divided British history into 10 eras, drew SVG illustrations in the style of each era, and presented them on an HTML page. It is a showcase for human-machine collaboration: whether for an extracurricular lesson plan or a draft of a piece of work, the barrier to getting started is extremely low.
In the last case, I asked Manus to make a rhythm game whose icons had to use characters from Genshin Impact. It began by studying the mechanics of rhythm games and how to implement them, and then tried to collect Genshin Impact image assets. This is where the exception occurred: it issued a takeover request for the first time. The reason left me speechless. Its workflow was blocked by a cloud drive where it could not register an account, so it could not download the resources, and it wanted me to step in.
It seems that no matter how powerful an AI is, it can still be stopped by a cloud drive's membership wall.
In keeping with the principle of letting AI Agents finish their work as independently as possible, I did not step in. Instead, I changed the requirements slightly and asked Manus to use technology company logos as the game icons, since freely licensed SVG assets are everywhere online. With that, Manus ran without trouble and soon finished a fun game with scoring that plays smoothly.
It is also clear, however, that Manus still falls short on details when solving relatively complex problems like this, which partly reflects the lack of human involvement. Screen adaptation, for example, needs more explanation. Manus is not slow to respond to change requests, but because it was again hit by server downtime, I have not been able to keep improving this task for now.
I think these hands-on examples show very clearly both the capabilities and the shortcomings of AI Agents at this stage. Manus is no longer the kind of product that can only drive a browser. It has a sandbox environment in which it can test its own work before finishing, and it delivers only after the work passes acceptance. However, it is also bound by the data available on the internet: if there are not enough resources online, it has no way to produce them itself.
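To make that behavior concrete, here is a minimal Python sketch of the loop I observed from the outside: run the steps in a workspace, check the result before delivering, and ask for a human takeover when a step is blocked. This is not Manus's actual implementation, which I have not seen. Every name in it (`run_task`, `NeedsHuman`, `Result`, the step functions) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Workspace = Dict[str, str]
Step = Tuple[str, Callable[[Workspace], None]]


class NeedsHuman(Exception):
    """Hypothetical signal: a step cannot continue without a person."""


@dataclass
class Result:
    delivered: bool
    log: List[str] = field(default_factory=list)


def run_task(steps: List[Step],
             acceptance_check: Callable[[Workspace], bool],
             max_attempts: int = 3) -> Result:
    """Run every step, self-test the output, and deliver only if it passes."""
    log: List[str] = []
    workspace: Workspace = {}  # stands in for the sandbox
    for attempt in range(1, max_attempts + 1):
        try:
            for name, step in steps:
                step(workspace)
                log.append(f"done: {name}")
        except NeedsHuman as blocked:
            # Stop and hand control back instead of guessing.
            log.append(f"takeover requested: {blocked}")
            return Result(False, log)
        if acceptance_check(workspace):
            log.append(f"accepted on attempt {attempt}")
            return Result(True, log)
        log.append(f"attempt {attempt} failed acceptance, retrying")
    return Result(False, log)


# Illustrative steps, loosely modeled on the homepage and rhythm game cases.
def write_page(ws: Workspace) -> None:
    ws["html"] = "<a href='https://example.com'>link</a>"


def download_assets(ws: Workspace) -> None:
    raise NeedsHuman("file host requires an account")


ok = run_task([("write page", write_page)],
              lambda ws: "href" in ws.get("html", ""))
blocked = run_task([("download assets", download_assets)],
                   lambda ws: True)
print(ok.log)
print(blocked.log)
```

The point of the sketch is the ordering: acceptance is checked inside the loop, before anything is delivered, and a blocked step ends the run with a takeover request rather than a fabricated result.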
I also ran some text-oriented tests, which help bring out the characteristics of AI Agents by comparison:
For example, I asked Manus to summarize gameplay techniques for Xing Jianya (a game character) based on the 10 most popular videos about her on Bilibili.
Manus really did watch all 10 videos, and it spent more than an hour distilling each uploader's commentary into the material I wanted. The result was quite accurate. A model with web access could complete the same task, but the probability of hallucination would be high, and it would not match the AI Agent in "honesty".
Another example was asking Manus to study arbitrage possibilities on PolyMarket. I admit I had some hope of getting a can't-lose investment guide. Don't laugh: Manus did its homework conscientiously and listed four arbitrage opportunities. Whenever I see a qualifying market appear on PolyMarket, I can place bets according to the rules without thinking.
Judging from the replay, Manus starts from the most basic information every time. It first worked out what PolyMarket is, then analyzed how prediction markets work, and then built a risk strategy based on the platform's rules. It has the classic intern style: hardworking, uncomplaining, practical, and durable.
Incidentally, the replay design is one of Manus's highlights in my view. It is a bit like a reasoning model choosing to expose its chain of thought: the AI's thinking process is often more enlightening than the answer itself. Every Manus task has a replay that can be shared. The methods it demonstrates on the way to solving a problem can fairly be called another form of intelligence asset, one that can serve as a teacher to humans.
So, to return to my earlier point: I rate Manus as the DeepSeek moment of the AI Agent industry, but here is the caveat. It is the DeepSeek-V2 moment. In May 2024, DeepSeek open-sourced its V2 model. That was the first time it broke out of its niche, because its pricing was very low. But because the model itself was only average, many people simply assumed DeepSeek had come to start a price war. They were surprised but did not take it seriously, and the buzz did not last long.
It was not until the successive releases of DeepSeek-V3 and R1 that everyone realized things were completely different. The cost logic of the entire large model market was overturned overnight.
"At first, no one cared about this disaster. It was just a mountain fire, a drought, the extinction of a species, and the disappearance of a city, until this disaster was closely related to everyone." (from "The Wandering Earth")
My point is that the development of AI technology is continuous, and along this rising and falling curve, the strength of each signal determines how deep later breakthroughs go. DeepSeek would have had no V3 without V2, let alone R1. My view of Manus has not changed: at the historic turning point where AI Agent services move from professional scenarios to general-purpose ones, it is the brand that laid the foundation.
Judging by its use cases, it is a very capable AI Agent and highly proficient at breaking tasks down. Watching its CoA (agent chain) feels much like watching a CoT (chain of thought): you can "see" the AI evaluating options across multiple scenarios and searching for the best solution.
In theory, a large number of CoAs should be built in to support this, just as reasoning models such as DeepSeek had to digest enough CoTs in advance before reaching the mass market, so as to cover mainstream needs as far as possible. The use cases on the official website suggest as much.
Reproduction without authorization is prohibited, and Blue Whale reserves the right to pursue liability.