Achieving general-purpose robotic manipulation requires robots to seamlessly bridge high-level semantic intent with low-level physical interaction in unstructured environments. However, existing approaches struggle to generalize zero-shot to unseen tasks and environments. To address this, we present UniManip, a framework grounded in a Bi-level Agentic Operational Graph (AOG) that unifies semantic reasoning and physical grounding.
By coupling a high-level Agentic Layer for task orchestration with a low-level Scene Layer for dynamic state representation, the system continuously aligns abstract planning with geometric constraints. UniManip operates as a dynamic agentic loop: it actively instantiates object-centric scene graphs, parameterizes collision-free trajectories, and exploits structured memory to autonomously diagnose and recover from failures. In our experiments, UniManip achieves success rates 22.5% and 25.0% higher than state-of-the-art vision-language-action (VLA) and hierarchical baselines, respectively.
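To make the bi-level structure more concrete, the following toy sketch shows one way a high-level plan layer, a low-level scene graph, and a failure memory could interact in a closed loop. It is an illustration under stated assumptions, not UniManip's implementation: the names `SceneGraph`, `Subtask`, `Memory`, `run_loop`, and `toy_executor` are hypothetical, scene state is reduced to symbolic `(subject, predicate, object)` relations, "recovery" is plain retrying, and "diagnosis" is only a per-subtask failure count. Trajectory parameterization and collision checking are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical low-level layer: object nodes plus symbolic relations."""
    # Object name -> placeholder position; unused by the toy loop below.
    objects: dict[str, tuple[float, float, float]] = field(default_factory=dict)
    # Relations stored as (subject, predicate, object) triples.
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def holds(self, relation: tuple[str, str, str]) -> bool:
        return relation in self.relations


@dataclass
class Subtask:
    """Hypothetical high-level layer node: one plan step and its goal relation."""
    name: str
    goal: tuple[str, str, str]


@dataclass
class Memory:
    """Hypothetical structured memory: counts failed attempts per subtask."""
    failures: dict[str, int] = field(default_factory=dict)

    def record(self, subtask: str) -> None:
        self.failures[subtask] = self.failures.get(subtask, 0) + 1


def run_loop(plan, scene, execute, memory, max_retries=2):
    """Run each subtask, verify its goal in the scene graph, and retry on failure."""
    for step in plan:
        for attempt in range(max_retries + 1):
            execute(step, scene, attempt)
            if scene.holds(step.goal):
                break  # goal verified in the scene layer; move to next subtask
            memory.record(step.name)
        else:
            return False  # retries exhausted without satisfying the goal
    return True


def toy_executor(step, scene, attempt):
    # Simulated execution: the first grasp attempt fails, everything else succeeds.
    if step.name == "grasp_cup" and attempt == 0:
        return
    scene.relations.add(step.goal)


scene = SceneGraph(objects={"cup": (0.4, 0.1, 0.0), "tray": (0.6, -0.2, 0.0)})
plan = [Subtask("grasp_cup", ("gripper", "holds", "cup")),
        Subtask("place_cup", ("cup", "on", "tray"))]
memory = Memory()
success = run_loop(plan, scene, toy_executor, memory)
print(success, memory.failures)  # True {'grasp_cup': 1}
```

The key design point the sketch tries to convey is that the high-level layer never assumes an action succeeded: each subtask's outcome is checked against the low-level scene state, and failures are written to memory before the loop decides whether to retry or give up.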
If you find this work useful for your research, please consider citing it using the following BibTeX entry:
@article{liu2026unimanip,
title={UniManip: General-Purpose Zero-Shot Robotic Manipulation with Agentic Operational Graph},
author={Liu, Haichao and Xue, Yuanjiang and Zhou, Yuheng and Deng, Haoyuan and Liang, Yinan and Xie, Lihua and Wang, Ziwei},
journal={arXiv preprint arXiv:2602.13086},
year={2026}
}