XITU

Turing, XITU's proprietary Vision-Language-Action model, is a generalist AI system built for lifelong learning. As Turing acquires new skills through real-world experience, its performance keeps improving, enabling increasingly sophisticated and adaptive motion control.

Pre-Training

Turing bridges human demonstration and robot execution. Massive egocentric human data teaches complex interaction intent, while high-fidelity robot trajectories ensure precise motion control. Unified VQA signals inject rich visual-language context, resolving the core tension between precision and perception.
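One way to picture co-training on the three data sources named above is as weighted sampling from a mixture. The sketch below is illustrative only: the source names, the `MIXTURE` weights, and the `sample_sources` helper are assumptions, not XITU's published recipe.

```python
import random
from collections import Counter

# Hypothetical mixture weights; the real proportions are not stated in the source.
MIXTURE = {"human_egocentric": 0.5, "robot_trajectory": 0.3, "vqa": 0.2}

def sample_sources(batch_size, mixture, seed=0):
    """Pick which data source each example in a batch is drawn from."""
    rng = random.Random(seed)
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=batch_size)

batch = sample_sources(1000, MIXTURE)
counts = Counter(batch)  # how many examples each source contributed
```

Raising the weight of one source shifts how often its examples appear in a batch, which is the knob a mixture like this would expose.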

Dual-System Co-Training and Inference

Two Systems, One Mind. System 1 (Cognition) reasons—understanding tasks, planning strategies, making decisions. System 2 (Execution) acts—translating intent into instant, fluid motion through diffusion-based control. Together, they deliver cognition and execution in perfect harmony.
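The split can be sketched as two functions: one that turns an observation and an instruction into an intent, and one that turns the intent into an action by iterative denoising. This is a toy under stated assumptions: `cognition`, `execution`, the observation format, and the step schedule are hypothetical, and the "noise prediction" is a known offset standing in for a trained diffusion network.

```python
import random

def cognition(observation, instruction):
    """System 1 stand-in: map observation and instruction to a target pose (the intent)."""
    # Hypothetical rule: the intent is the goal position stored in the observation.
    return observation["goal"]

def execution(intent, steps=20, seed=0):
    """System 2 stand-in: refine a random action toward the intent over several denoising steps."""
    rng = random.Random(seed)
    action = [rng.gauss(0.0, 1.0) for _ in intent]  # start from pure noise
    for t in range(steps):
        # A trained network would predict the noise; here it is the known offset from the intent.
        predicted_noise = [a - g for a, g in zip(action, intent)]
        step_size = 1.0 / (steps - t)  # reaches 1.0 on the last step, removing the remaining offset
        action = [a - step_size * n for a, n in zip(action, predicted_noise)]
    return action

intent = cognition({"goal": [0.2, -0.4, 0.9]}, "pick up the cup")
action = execution(intent)
```

The point of the sketch is the interface: the execution side only ever sees the intent, so the two parts can be trained together while keeping separate roles.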

Mixture of Experts (MoE)

Intelligence on Demand: The router selects the right experts for each input, activating only what's needed, when it's needed. Because only the Top-K experts run per input, experts can specialize without compute growing in proportion to total parameter count, giving MoE its superpower: massive model capacity at a fraction of the compute a dense model of the same size would need.
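A minimal sketch of Top-K routing, assuming scalar inputs and toy experts, looks like the following. The experts, the router logits, and the function names are hypothetical; a real MoE layer would compute the logits with a learned router and run the experts on tensors.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])  # renormalize over the selected experts only
    return list(zip(top, weights))

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    routes = top_k_route(router_logits, k)
    output = sum(w * experts[i](x) for i, w in routes)
    return output, routes

# Hypothetical experts: four simple functions standing in for expert networks.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
output, routes = moe_forward(3.0, experts, router_logits=[0.1, 2.0, -1.0, 1.0], k=2)
active = [i for i, _ in routes]
```

With four experts and `k=2`, only two expert functions are called per input, which is where the compute saving comes from.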

On-policy DAgger

Heuristic DAgger for Recovery Data Expansion: By initializing the system in manually designed failure states, Heuristic DAgger collects recovery data efficiently and captures failure experiences proactively, without waiting for failures to occur naturally. This approach yields diverse training data at zero robot time cost.
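The collection loop can be sketched as follows, assuming a 1-D toy state (an offset from the target) in place of a real robot. The failure states, the expert, and the imperfect learner are all hypothetical; the sketch only shows the DAgger pattern of letting the learner drive the rollout while the expert labels every visited state.

```python
import random

def expert_action(state):
    """Hypothetical expert: step the offset back toward 0."""
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0

def learner_action(state, rng):
    """Hypothetical imperfect learner: follows the expert most of the time, else acts randomly."""
    if rng.random() < 0.7:
        return expert_action(state)
    return rng.choice([-1, 0, 1])

def collect_recovery_data(failure_states, horizon=5, seed=0):
    """Start each rollout in a designed failure state and record expert labels along the way."""
    rng = random.Random(seed)
    dataset = []
    for start in failure_states:
        state = start
        for _ in range(horizon):
            dataset.append((state, expert_action(state)))  # expert labels the visited state
            state += learner_action(state, rng)            # learner's own action drives the rollout
    return dataset

FAILURE_STATES = [-3, 4, 2]  # manually designed starting failures (illustrative values)
data = collect_recovery_data(FAILURE_STATES)
```

The resulting `(state, expert_action)` pairs would be added to the training set, so the policy sees recovery behavior from states it would otherwise reach only after a natural failure.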