UniManip: General-Purpose Zero-Shot Robotic Manipulation with Agentic Operational Graph

1 Nanyang Technological University, Singapore; 2 Tsinghua University, China

UniManip enables general-purpose, zero-shot manipulation by unifying semantic reasoning and physical grounding through a Bi-level Agentic Operational Graph (AOG).

Abstract

Achieving general-purpose robotic manipulation requires robots to seamlessly bridge high-level semantic intent with low-level physical interaction in unstructured environments. However, existing approaches falter in zero-shot generalization. To address this, we present UniManip, a framework grounded in a Bi-level Agentic Operational Graph (AOG) that unifies semantic reasoning and physical grounding.

By coupling a high-level Agentic Layer for task orchestration with a low-level Scene Layer for dynamic state representation, the system continuously aligns abstract planning with geometric constraints. UniManip operates as a dynamic agentic loop: it actively instantiates object-centric scene graphs, parameterizes collision-free trajectories, and exploits structured memory to autonomously diagnose and recover from failures. Our experiments demonstrate 22.5% and 25.0% higher success rates than state-of-the-art VLA and hierarchical baselines, respectively.
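To make the loop described above more concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation. Every name in it (SceneNode, SceneGraph, Memory, run_agentic_loop, the strategy labels, and the toy perceive and execute callbacks) is hypothetical and chosen only for illustration. The sketch assumes that a task has already been decomposed into ordered steps, that each step can be attempted with one of several candidate strategies, and that a failed attempt is recorded in memory so that the next attempt picks a different strategy. Perception, trajectory planning, and collision checking are stubbed out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SceneNode:
    """One object in the (hypothetical) low-level scene layer."""
    name: str
    position: Tuple[float, float, float]


@dataclass
class SceneGraph:
    """Object-centric scene state: nodes plus (subject, relation, object) edges."""
    nodes: Dict[str, SceneNode] = field(default_factory=dict)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)


@dataclass
class Memory:
    """Structured record of failed (step, strategy) attempts."""
    failures: List[Tuple[str, str]] = field(default_factory=list)

    def tried(self, step: str) -> List[str]:
        # Strategies that have already failed for this step.
        return [strategy for s, strategy in self.failures if s == step]


def run_agentic_loop(
    steps: List[str],
    strategies: List[str],
    perceive: Callable[[], SceneGraph],
    execute: Callable[[str, str, SceneGraph], bool],
    max_attempts: int = 3,
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Run each step, re-perceiving the scene and avoiding failed strategies."""
    memory = Memory()
    log: List[Tuple[str, str]] = []
    for step in steps:
        for _ in range(max_attempts):
            scene = perceive()  # refresh the scene graph before every attempt
            remaining = [s for s in strategies if s not in memory.tried(step)]
            if not remaining:
                return False, log  # nothing left to try for this step
            strategy = remaining[0]
            if execute(step, strategy, scene):
                log.append((step, strategy))
                break
            memory.failures.append((step, strategy))  # remember the failure
        else:
            return False, log  # attempts exhausted without success
    return True, log


def demo() -> Tuple[bool, List[Tuple[str, str]]]:
    """Toy run: the top approach fails for grasping, so the loop recovers."""
    scene = SceneGraph(
        nodes={
            "cup": SceneNode("cup", (0.4, 0.0, 0.1)),
            "tray": SceneNode("tray", (0.6, 0.2, 0.0)),
        },
        relations=[("cup", "left_of", "tray")],
    )

    def perceive() -> SceneGraph:
        return scene  # stub: a real system would rebuild this from sensors

    def execute(step: str, strategy: str, scene: SceneGraph) -> bool:
        # Stub: pretend only the top approach to the cup is infeasible.
        return not (step == "grasp cup" and strategy == "top_approach")

    return run_agentic_loop(
        ["grasp cup", "place cup on tray"],
        ["top_approach", "side_approach"],
        perceive,
        execute,
    )
```

In this toy run, calling demo() first fails to grasp the cup with the top approach, records that failure, and then succeeds with the side approach before completing the placement step. A real system would replace the stubs with perception, motion planning, and a richer failure diagnosis than a simple list of failed strategies.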

Citation

If you find this work useful for your research, please consider citing it using the following BibTeX entry:

@article{liu2026unimanip,
  title={UniManip: General-Purpose Zero-Shot Robotic Manipulation with Agentic Operational Graph},
  author={Liu, Haichao and Xue, Yuanjiang and Zhou, Yuheng and Deng, Haoyuan and Liang, Yinan and Xie, Lihua and Wang, Ziwei},
  journal={arXiv preprint arXiv:2602.13086},
  year={2026}
}