Alibaba Builds Qwen-Robot: A Functional Approach to the Robot Economy


In short

  • Alibaba unveiled the Qwen-Robot Suite, a three-model AI suite covering robot navigation, manipulation, and world simulation, delivered as software.
  • The company says its models top several robotics benchmarks, trained on millions of samples and thousands of hours of open-source robotics data.
  • Real-world deployment of robots is still years away.

Alibaba’s Qwen Group released the Qwen-Robot Suite on Tuesday: three foundation models that together make up what it calls “a complete set of artificial intelligence.” Qwen-RobotNav handles navigation. Qwen-RobotManip handles manipulation. Qwen-RobotWorld simulates the physics that makes all of this possible. Each works independently. Together, they are a pitch for the Android era of robotics: operating systems, not devices.

Alibaba is currently the only company in China whose business spans chips, cloud, models, platforms, and software. For the company, robotics is the most visible part of that bet, known as embodied AI.

AI agents currently rely on LLMs to power their decisions. Robots have traditionally run on machine learning models that, although advanced, lack the flexibility of generative AI. Physical agents also face a different, harder kind of failure: one that comes from physics rather than from language.

Against this backdrop, Alibaba launched the suite with three components:

Qwen-RobotNav unifies five navigation tasks (following directions, goal search, object search, target search, and motor control), each of which requires a different memory strategy. Most models hardcode one strategy. Qwen-RobotNav instead exposes parameterized controls, such as compute cost, temporal decay, and per-camera weights, that the developer can change mid-session.
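To make the idea of a tunable memory strategy concrete, here is a minimal Python sketch. It is not Qwen-RobotNav’s interface: the names `MemoryConfig`, `select_frames`, `frame_budget`, `temporal_decay`, and `camera_weights` are hypothetical, and the scoring rule (camera weight multiplied by an exponential decay on frame age) is an assumption chosen only to show how such knobs could change which observations an agent keeps.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryConfig:
    """Hypothetical knobs; illustrative only, not Qwen-RobotNav's API."""
    frame_budget: int = 4          # how many past frames the agent may keep
    temporal_decay: float = 0.5    # per-step multiplier applied to older frames
    camera_weights: dict = field(default_factory=lambda: {"front": 1.0})


def select_frames(frames, now, cfg):
    """Keep the highest-scoring frames within the budget.

    frames: list of (timestamp, camera_name) tuples.
    Score = camera weight * temporal_decay ** age (an assumed rule).
    """
    def score(frame):
        timestamp, camera = frame
        age = now - timestamp
        return cfg.camera_weights.get(camera, 0.0) * cfg.temporal_decay ** age

    ranked = sorted(frames, key=score, reverse=True)
    return ranked[: cfg.frame_budget]


frames = [(0, "front"), (1, "front"), (2, "front"), (2, "rear"), (3, "front")]
cfg = MemoryConfig(frame_budget=2, temporal_decay=0.5,
                   camera_weights={"front": 1.0, "rear": 0.2})

print(select_frames(frames, now=3, cfg=cfg))  # [(3, 'front'), (2, 'front')]

# Changing a weight mid-session changes what is remembered, with no retraining.
cfg.camera_weights["rear"] = 4.0
print(select_frames(frames, now=3, cfg=cfg))  # [(2, 'rear'), (3, 'front')]
```

In this toy version, a task that depends on a rear view can raise the weight of that camera on the fly, while a task that needs long-range recall could raise `temporal_decay` toward 1.0 so older frames stay competitive.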

Trained on 15.6 million samples with parameters varied across all tasks, it achieves a 76.5% success rate on VLN-CE RxR, a benchmark for vision-and-language navigation in continuous environments, and a 90% tracking rate on EVT-Bench, which measures an agent’s ability to track a target continuously.

Qwen-RobotManip tackles one of the biggest challenges in robot control: different robots represent actions in different ways. A Franka arm (a robot arm with seven axes of movement) represents actions one way; humanoids add another layer of complexity because they require full-body coordination.

To close this gap, Alibaba assembled approximately 38,100 hours of open-source robot training data and human videos, without any proprietary data collection. The model ranks first on RoboChallenge Table30-v1, surpassing previous methods by 20%.
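The article does not say how Qwen-RobotManip reconciles these action formats. Purely as an illustration of the underlying mismatch, the Python sketch below uses one common workaround: padding each robot’s action vector to a shared width and carrying a mask that marks which entries are real. The function name `to_shared_action`, the shared width of 12, and the 12-value humanoid action are all assumptions made for the example.

```python
def to_shared_action(action, shared_dim):
    """Zero-pad a robot-specific action vector to a shared width.

    Returns (padded, mask), where mask is 1 for real entries and 0 for padding.
    This is an assumed illustration, not Qwen-RobotManip's actual method.
    """
    if len(action) > shared_dim:
        raise ValueError("action is wider than the shared action space")
    pad = shared_dim - len(action)
    padded = list(action) + [0.0] * pad
    mask = [1] * len(action) + [0] * pad
    return padded, mask


SHARED_DIM = 12  # hypothetical width, large enough for both robots below

franka_action = [0.1] * 7      # seven joint values for a seven-axis arm
humanoid_action = [0.2] * 12   # hypothetical 12-value whole-body action

franka_padded, franka_mask = to_shared_action(franka_action, SHARED_DIM)
humanoid_padded, humanoid_mask = to_shared_action(humanoid_action, SHARED_DIM)

print(len(franka_padded), franka_mask)      # 12 [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(len(humanoid_padded), humanoid_mask)  # 12 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

With both robots mapped to vectors of the same length, a single model can be trained on data from either one, and the mask keeps padded entries from counting toward the training signal.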

Qwen-RobotWorld is the most ambitious: a language-conditioned world model that uses natural language as a universal action interface. “Take a red cup and pour water on a flower” works whether the agent is a manipulator, an autonomous vehicle, or a navigator.

The Embodied World Knowledge corpus consists of 8.6 million video pairs (200 million frames) covering manipulation (5.9 million samples, 1,300+ skills, 20+ morphologies), autonomous driving (Waymo, NVIDIA PhysicalAI-AD, Bench2Drive), and indoor navigation and human-arm data (VLN-robot).

It ranks first on EWMBench and DreamGen Bench, two benchmarks that test how well world models predict and generate real-world scenarios. It also outperforms all open models on WorldModelBench and PBench, and excels in physics: Newton’s laws, conservation of mass, fluid dynamics, gravity.

ChatGPT for robots?

Western labs (Google DeepMind, Nvidia, Figure, Physical Intelligence) pursue similar goals but mainly focus on either navigation or manipulation, not a unified, integrated suite. Alibaba’s vertical integration from chips through software means it controls the whole stack. Its use of open-source data also sets it apart from competitors that rely on private robot data.

A few misconceptions are worth dispelling. These are not robots but software models: brains, not bodies. They run on hardware from AgileX, Franka, Universal Robots, Unitree, and others.

Also, although these are AI models for robotics, they are not LLMs like ChatGPT. A language model predicts tokens. These models must understand physics, spatial relationships, and the consequences of physical actions. A language model tells you that a glass breaks if it is dropped. Qwen-RobotWorld predicts how it breaks: the shattering, the fluid forces, the secondary collisions. Qwen-RobotManip plans the grasp that prevents it from falling in the first place.

Don’t expect to have your own home robot anytime soon. The gap between a controlled demo of a robot handling a fruit basket and a robot working reliably in your home is huge. RoboCasa365, LIBERO-Plus, and RoboTwin-Clean2Rand are benchmarks. Real-world deployment brings sensor noise, actuator drift, and long-tail edge cases that have slowed down every robotics effort in history, and Alibaba acknowledges this.

The technical achievements are real, however. RobotManip’s unified approach to actions solves a real problem in training across different robots. RobotNav’s parameterized observation interface is a smart solution to the problem of perception. RobotWorld’s use of language as a universal action interface is the right abstraction.

Alibaba did not disclose pricing, timelines, or which customers will get access to trial programs.
