ACE Robotics Kairos: the open-source world model that dominates 4 embodied intelligence benchmarks
🔎 A 4-billion-parameter model just dethroned NVIDIA on its own turf
On June 15, 2026, ACE Robotics, a team from SenseTime, released Kairos 3.0, a 4-billion-parameter world model. Result: first place on four global embodied intelligence benchmarks, including RoboTwin 2.0 and LIBERO-Plus. Seventy-two times faster than NVIDIA Cosmos 2.5 in inference. Open-source.
This is an earthquake. Embodied intelligence — the ability of a robot to understand and anticipate the physics of the real world — was considered the last bastion where American industrialists maintained a decisive lead. Kairos proves otherwise.
The significance goes beyond simple rankings. A Chinese, open, lightweight model manages to simulate the physical behavior of an environment better than closed architectures ten times heavier. The dynamics of the humanoid robot race have just shifted.
The essentials
- Kairos 3.0 is a real-time generative 4B-parameter world model, developed by ACE Robotics (SenseTime), open-source since June 15, 2026.
- It takes 1st place on 4 benchmarks of embodied intelligence: RoboTwin 2.0, LIBERO-Plus, and two other robotic manipulation evaluations.
- It is 72x faster than NVIDIA Cosmos 2.5 in inference, while being 10x lighter in parameters.
- Its architecture relies on a conditional video generation approach that predicts the future states of a physical scene based on robotic actions.
Recommended tools
| Tool | Main use | Price (June 2026, check on site) | Ideal for |
|---|---|---|---|
| Kairos 3.0 (GitHub) | Open-source world model for robotics | Free (Apache 2.0) | Researchers and robotics teams |
| Hostinger | Hosting to deploy robotic control interfaces | Starting from €2.99/month | Supervision app prototyping |
| Ollama | Local AI model execution | Free | Testing local control policies |
What a world model is — and why it's the holy grail of robotics
A world model is a model that simulates the physical behavior of the world. It takes as input a current state (image, depth map, object positions) and an action proposed by the robot, then predicts the next state of the scene.
Concretely, before moving its arm, the robot "thinks" within the model: "If I push this cup to the left at this speed, what happens?" The world model generates the future video sequence. If the cup falls, the robot adjusts its plan before even having moved.
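To make this "think before acting" loop concrete, here is a minimal Python sketch. It is not Kairos's API: `SceneState`, `predict_next_state`, `choose_safe_push`, and the table bounds are hypothetical names, and the "model" is a one-line toy rule standing in for learned video prediction. It only illustrates the interface (state plus action in, predicted state out) and how a robot can reject a push whose predicted outcome is a fallen cup.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical table extent along one axis, in metres.
TABLE_MIN_X, TABLE_MAX_X = 0.0, 1.0


@dataclass(frozen=True)
class SceneState:
    cup_x: float          # cup position along the table, in metres
    fallen: bool = False  # True once the cup has left the table


def predict_next_state(state: SceneState, push_dx: float) -> SceneState:
    """Toy stand-in for a world model: (state, action) -> predicted next state."""
    new_x = state.cup_x + push_dx
    fallen = state.fallen or not (TABLE_MIN_X <= new_x <= TABLE_MAX_X)
    return SceneState(cup_x=new_x, fallen=fallen)


def choose_safe_push(state: SceneState, candidates: Iterable[float]) -> Optional[float]:
    """Return the first push whose predicted outcome keeps the cup on the table."""
    for push_dx in candidates:
        if not predict_next_state(state, push_dx).fallen:
            return push_dx
    return None  # every candidate is predicted to knock the cup off


# The 0.5 m push to the left is rejected in "imagination"; the 0.1 m push is kept.
print(choose_safe_push(SceneState(cup_x=0.2), [-0.5, -0.1]))
```

In a real system the toy rule would be replaced by a learned predictor working on images, but the control logic around it keeps the same shape.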
This is fundamentally different from an LLM that predicts text. Here, we are predicting physics. And this is exactly what research on world simulators for robots has been exploring for years, as shown by work on evaluating robotic policies in Google's Veo simulators and on 4D geometric models like GEM-4D (studies published on arXiv in 2025-2026).
The stakes are colossal: a good world model eliminates the need for millions of hours of physical training. The robot trains in simulation, then transfers to the real world. This is what we call sim-to-real transfer, and it is the main bottleneck of all modern robotics.
The 4 benchmarks conquered — backed by the numbers
Kairos 3.0 doesn't just perform well. It dominates.
RoboTwin 2.0 is the reference benchmark for evaluating control policies in digital twin environments. It measures a model's ability to generalize manipulation behaviors from simulated data to real-world scenarios. Kairos takes the lead here with a significant margin over the runner-up, according to data published by Thailand Business News.
LIBERO-Plus evaluates long-horizon planning capabilities in household object manipulation. The robot must chain sequential tasks (opening a drawer, grabbing an object, placing it elsewhere). This is where the world model's temporal prediction capability is crucial. Kairos surpasses previous leaders, notably approaches that couple agentic LLMs such as GPT-5.5 (98.2 in the June 2025 agentic ranking) with external simulators.
The other two benchmarks, not explicitly named in the press releases but confirmed by TMTPost, cover industrial manipulation and unstructured environment navigation scenarios.
This quadruple victory is not a statistical fluke. It indicates that Kairos possesses a cross-domain generalization capability that previous models did not achieve individually.
Architecture: how 4 billion parameters are enough to beat giants
The key to Kairos lies in its architectural efficiency. Most competing world models — led by NVIDIA Cosmos 2.5 — use heavy diffusion approaches, with tens of billions of parameters, which require industrial GPUs to run in real time.
Kairos adopts a different approach. Its architecture relies on conditional generation optimized for the prediction of physical states rather than for video aesthetic quality. The model does not seek to generate "beautiful" images. It generates physically accurate images.
This distinction is fundamental. By eliminating the aesthetic pressure, ACE Robotics engineers were able to drastically reduce the size of the model while improving its physical fidelity. According to Bastille Post, Kairos's performance-to-parameter ratio is currently unparalleled.
The inference speed — 72x that of Cosmos 2.5 — transforms the model from an offline research tool into an embeddable component. A robot can literally "dream" its actions in closed loop at 30+ FPS, which was impossible with the previous generation of world models.
SenseTime vs NVIDIA Cosmos vs Figure Helix: the new world map
The battle of robotic world models is structuring itself around three poles.
NVIDIA Cosmos 2.5 represents the American industrial approach: heavy, closed model, integrated into the CUDA/Omniverse ecosystem. Powerful but expensive, slow in inference, dependent on NVIDIA hardware. This is the "proprietary and vertically integrated" approach.
Figure Helix, the AI system launched by Figure AI for its humanoid robots, represents the "robot-first" approach. Helix is designed to run on a specific body, with deep sensorimotor integration. Its advantage: it is optimized for a particular robot. Its limitation: it does not generalize to other platforms. This logic of a dedicated closed system is comparable to what is found among specialized search agents, as discussed in the OpenSeeker-v2 analysis of the industrial search agents' monopoly.
Kairos 3.0 (ACE Robotics/SenseTime) represents a third way: open-source, lightweight, hardware-independent, generalist. Any robotics team can integrate it, fine-tune it, deploy it. This is the "Android of robotics" strategy.
The parallel with open-source AI agents is illuminating. In the same way that ByteDance's DeerFlow proved that an open-source agent could rival proprietary systems over the long term, Kairos proves that open-source can dominate in embodied intelligence. The pattern repeats itself: commoditization through open-source ends up winning.
| Model | Parameters | Openness | Inference speed | Target |
|---|---|---|---|---|
| Kairos 3.0 | 4B | Open-source | 72x Cosmos 2.5 | Generalist |
| NVIDIA Cosmos 2.5 | ~40B (est.) | Closed | Baseline (1x) | NVIDIA ecosystem |
| Figure Helix | Undisclosed | Closed | Optimized embedded | Figure humanoids |
What this means for the global robot race
Edge AI is the next frontier. Everyone knows it, but no one expected the first truly dominant model to be open-source and Chinese.
The immediate consequence: robotics labs around the world will adopt Kairos as their base. Not because it's Chinese, but because it's better and free. Open-source creates a network effect: the more researchers use it, the faster it improves, and the wider the gap with closed alternatives becomes.
The strategic consequence: the United States is losing an advantage it considered structural. AI chip export controls did not stop SenseTime from producing a model that runs on accessible hardware. The message is clear: hardware restriction is no longer enough to contain software innovation.
The economic consequence: the cost of developing an intelligent robot has just plummeted. When the most critical component — the world model — is free and lightweight, the barrier to entry in robotics is no longer software-based. It is mechanical and manufacturing-based. And in that regard, China also has a head start.
Why inference speed changes everything
A slow world model is a research tool. A fast world model is an embedded brain.
The distinction is crucial. Before Kairos, world models were used offline: thousands of simulated trajectories were generated, a control policy was trained on them, and then this policy was deployed on the robot. The world model was never present at the moment of action.
With Kairos, the world model can run in a closed loop during execution. The robot perceives its environment, proposes an action, asks Kairos to predict the outcome, and only executes it if the prediction is satisfactory. This is AI-augmented "model-predictive control", in real time.
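The perceive, propose, predict, execute cycle described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Kairos's interface: `closed_loop_step`, `run`, and `LIMIT_MM` are hypothetical, the "robot" is a single integer position in millimetres, and the predictor is a trivial addition standing in for the world model.

```python
from typing import Callable, Iterable, List

LIMIT_MM = 1000  # hypothetical workspace limit the robot must not exceed


def closed_loop_step(
    position_mm: int,
    propose: Callable[[int], Iterable[int]],
    predict: Callable[[int, int], int],
    is_acceptable: Callable[[int], bool],
) -> int:
    """One perceive -> propose -> predict -> gate -> execute cycle."""
    for move_mm in propose(position_mm):
        predicted = predict(position_mm, move_mm)  # ask the model before moving
        if is_acceptable(predicted):
            return position_mm + move_mm           # "execute" on the toy robot
    return position_mm                             # no acceptable action: hold position


def run(steps: int = 4) -> List[int]:
    """Run the gated loop from position 0 and record the trajectory."""
    trajectory = [0]
    for _ in range(steps):
        trajectory.append(
            closed_loop_step(
                trajectory[-1],
                propose=lambda pos: (400, 200, 50),           # largest move first
                predict=lambda pos, move: pos + move,         # toy world model
                is_acceptable=lambda pred: pred <= LIMIT_MM,  # safety gate
            )
        )
    return trajectory


print(run())
```

The design point is that the gate sits between prediction and execution: a move is only committed once its predicted outcome has passed the check, and the robot holds position when nothing passes.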
This capability fundamentally changes robotic safety. A robot that can "see the future" before acting is a robot that does not make irreversible mistakes. In the context of deploying humanoid robots in human environments — hospitals, factories, homes — this is a weighty argument that goes beyond mere raw performance.
The 72x speed gain is not just a marketing figure. For a baseline that manages one prediction per second, it is the difference between "1 prediction per second" and "72 predictions per second". In robotic manipulation, where the dynamics of a falling object play out in milliseconds, it is the difference between missing and catching.
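A quick back-of-the-envelope calculation shows why the prediction rate matters for a falling object. The sketch below assumes free fall from rest with no air resistance (d = 0.5 * g * t^2) and uses the illustrative rates of 1 and 72 predictions per second; the function names are hypothetical and the rates are not measured Kairos or Cosmos figures.

```python
G = 9.81  # gravitational acceleration, in m/s^2


def latency_ms(predictions_per_second: float) -> float:
    """Time budget consumed by one prediction, in milliseconds."""
    return 1000.0 / predictions_per_second


def fall_distance_mm(seconds: float) -> float:
    """Distance fallen from rest after `seconds`, in millimetres (no air resistance)."""
    return 0.5 * G * seconds ** 2 * 1000.0


for rate in (1, 72):
    dt = 1.0 / rate  # duration of one prediction, in seconds
    print(
        f"{rate:>2} pred/s: {latency_ms(rate):7.1f} ms per prediction, "
        f"object falls {fall_distance_mm(dt):8.2f} mm meanwhile"
    )
```

At one prediction per second the object would have fallen about 4.9 m before the first answer arrives, so it is already on the floor. At 72 predictions per second each answer arrives after roughly 13.9 ms, during which the object has dropped a little under 1 mm.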
The link with agentic LLMs: why "general" benchmarks are no longer enough
The June 2025 agentic LLM rankings place GPT-5.5 first with 98.2, followed by Gemini 3 Pro Deep Think at 95.4. But these scores measure abstract reasoning capacity, not physical understanding.
An excellent agentic LLM can plan "open the fridge, grab the bottle, pour it". But without a world model, it doesn't know that the bottle will slip if it's wet, that the fridge has resistance after 30 cm of opening, or that the liquid will splash differently depending on its viscosity.
This is where Kairos complements (and in some cases replaces) agentic LLMs for robotics. Instead of using an LLM as a planner with a slow external physics simulator, you can use a lightweight LLM like Claude Sonnet 4.6 (81.4 in the agentic ranking) for natural language, coupled with Kairos for physics. The result is a faster, more reliable, less expensive system.
This hybrid architecture of "lightweight LLM + specialized world model" will likely become the industry standard by the end of 2026. Heavy agentic models like GPT-5.5 will keep their place for complex offline reasoning, but the real-time control loop will belong to dedicated world models.
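As a rough illustration of this division of labor, the Python sketch below separates a planner that turns an instruction into symbolic steps from a physics check that decides how each step is carried out. Everything here is hypothetical: `plan_with_llm` is a canned lookup standing in for an LLM call, `predict_success` is a toy rule standing in for a world-model query, and the force thresholds are invented for the example.

```python
from typing import List, Optional, Sequence, Tuple


def plan_with_llm(instruction: str) -> List[str]:
    """Hypothetical stand-in for an LLM planner: instruction -> symbolic steps."""
    canned = {"serve water": ["open_fridge", "grab_bottle", "pour"]}
    return canned.get(instruction, [])


def predict_success(step: str, force_n: float, bottle_wet: bool) -> bool:
    """Hypothetical stand-in for a world-model query.

    Toy rule: grabbing a wet bottle needs at least 15 N of grip, a dry one 8 N.
    Other steps are always predicted to succeed.
    """
    if step != "grab_bottle":
        return True
    return force_n >= (15.0 if bottle_wet else 8.0)


def execute_instruction(
    instruction: str,
    bottle_wet: bool,
    force_options: Sequence[float] = (5.0, 10.0, 20.0),
) -> List[Tuple[str, Optional[float]]]:
    """Plan at the symbolic level, then pick the gentlest force predicted to work."""
    log: List[Tuple[str, Optional[float]]] = []
    for step in plan_with_llm(instruction):
        chosen = next(
            (f for f in force_options if predict_success(step, f, bottle_wet)),
            None,
        )
        log.append((step, chosen))
        if chosen is None:  # nothing is predicted to succeed: stop the plan here
            break
    return log


print(execute_instruction("serve water", bottle_wet=True))
print(execute_instruction("serve water", bottle_wet=False))
```

The planner never sees the physics, and the physics check never parses language. In this toy version the same plan leads to a firmer grip when the bottle is wet, which is the kind of adjustment the fridge example above calls for.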
For those who want to explore this logic of specialized agents running locally, the guide on open-source AI agents with Ollama offers a practical starting point for experimenting with lightweight architectures.
The current limitations of Kairos — what the numbers don't tell you
Despite its impressive results, Kairos 3.0 has limitations that press releases gloss over.
First, benchmarks are simulated environments. The transition to the real world — the sim-to-real gap — remains the ultimate challenge of robotics. Being first on RoboTwin 2.0 does not guarantee being first in a real kitchen with reflections, unpredictable shadows, and deformable objects.
Second, Kairos is a generic world model. It is not optimized for a specific robotic morphology. Figure Helix, although closed and limited to Figure's hardware, benefits from deep sensorimotor integration that Kairos cannot match without significant fine-tuning. This tension between generalism and specialization is structural.
Third, the question of fine-tuning is not settled. Kairos is open-source, but the training data is not necessarily entirely so. A team that wants to adapt Kairos to a specific robotic arm will have to collect its own demonstration data, which remains costly and time-consuming.
Finally, the geopolitical context adds uncertainty. A Chinese open-source model could face usage restrictions in certain countries or by certain companies, regardless of its technical quality.
❌ Common mistakes
Mistake 1: Confusing world model with video generation
A world model is not a Sora or a Veo. Its success metric is not the beauty of the generated image, but the physical fidelity of the prediction. Evaluating Kairos as a video generation tool is to completely miss its point. Visual artifacts are acceptable as long as the physical dynamics are correct.
Mistake 2: Believing that open-source means "easy to deploy"
Kairos is open-source, not plug-and-play. Integrating it into a robotic control loop requires skills in robotics engineering, sensor calibration, and sim-to-real transfer. Downloading the model weights from GitHub does not make you a roboticist.
Mistake 3: Ignoring the reliance on demonstration data
A world model predicts the future based on learned patterns. If your deployment environment is not represented in the training data (unusual objects, atypical lighting, exotic physics), Kairos's predictions will be wrong. Quality data collection remains the bottleneck.
❓ Frequently Asked Questions
What is a world model compared to an LLM?
An LLM predicts text tokens. A world model predicts future physical states (usually in the form of video frames). The LLM reasons about "what to do", the world model simulates "what if". They are complements, not substitutes.
Can Kairos 3.0 run on an embedded robot?
In theory yes, thanks to its 4B parameters and its inference speed. In practice, it depends on the embedded hardware. An NVIDIA Jetson Orin-type GPU would probably be sufficient, but real-world deployment tests are still limited and not published in detail.
Why is China suddenly dominating in world models?
SenseTime has been accumulating computer vision expertise since 2014. Their multimodal database, combined with massive access to industrial simulation scenarios, creates a data advantage that compensates for hardware restrictions. Open-source is also a strategic choice to create a standard.
How does Kairos compare to recent academic research models?
Work such as GEM-4D (4D geometry for manipulation) and the evaluation of robotic policies in Google's Veo simulators explores similar paths. But these models remain academic and undeployed. Kairos has the merit of being open-source, documented, and comparatively benchmarked.
Should you choose between Kairos and an agentic LLM for robotics?
No. The most promising architecture is hybrid: an LLM for understanding natural language instructions and high-level planning, coupled with a world model like Kairos for physical simulation and low-level control. It is this complementarity that will define the next generation of robotic systems.
✅ Conclusion
Kairos 3.0 just proved that embodied intelligence doesn't need giant closed models to dominate. Four billion parameters, open-source, 72x faster than Cosmos 2.5, first on four benchmarks: the numbers speak for themselves. The question is no longer whether open-source can compete in robotics, but how fast it will become the default standard. If you are working on robotic systems or local AI agents, the guide on Ollama agents is the best starting point to understand this new hybrid architecture that is redefining the rules of the game.