Haoyi Zhu
Embodied AI
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
Abstract: In this paper, we introduce an innovative vector quantization based action tokenizer built upon the largest-scale action trajectory dataset to date, leveraging over 100 times more data than previous approaches.
Yating Wang, Haoyi Zhu 朱皓怡, Mingyu Liu, Jiange Yang, Hao-Shu Fang, Tong He
Website
GitHub
PDF
arXiv
DeepVerse: 4D Autoregressive Video Generation as a World Model
Abstract: World models serve as essential building blocks toward Artificial General Intelligence (AGI), enabling intelligent agents to predict future states and plan actions by simulating complex physical interactions. However, existing interactive models primarily predict visual observations, thereby neglecting crucial hidden states like geometric structures and spatial coherence.
Junyi Chen, Haoyi Zhu 朱皓怡, Xianglong He, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Zhoujie Fu, Jiangmiao Pang, Tong He
Website
GitHub
PDF
arXiv
CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning
Abstract: Learning latent motion from Internet videos is crucial for building generalist robots. However, existing discrete latent action methods suffer from information loss and struggle with complex and fine-grained dynamics. We propose CoMo, which aims to learn more informative continuous motion representations from diverse, internet-scale videos.
Jiange Yang, Yansong Shi, Haoyi Zhu 朱皓怡, Mingyu Liu, Kaijing Ma, Yating Wang, Gangshan Wu, Tong He, Limin Wang
PDF
arXiv
Aether: Geometric-Aware Unified World Modeling
A geometric-aware unified world model, capable of 4D reconstruction, action-conditioned prediction, and visual planning.
Haoyi Zhu 朱皓怡, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Tong He
Project
PDF
arXiv
Code
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Abstract: Learning from multiple domains is a primary factor that influences the generalization of a single unified robot system. In this paper, we aim to learn the trajectory prediction model by using broad out-of-domain data to improve its performance and generalization ability.
Jiange Yang, Haoyi Zhu 朱皓怡, Yating Wang, Gangshan Wu, Tong He, Limin Wang
PDF
arXiv
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
A novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI.
Haoyi Zhu 朱皓怡, Honghui Yang, Yating Wang, Jiange Yang, Limin Wang, Tong He
Project
PDF
arXiv
Twitter
Code
HuggingFace Model
RealWorld Code
YouTube Video
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
Extensive experiments show that point cloud observations are beneficial for robot learning.
Haoyi Zhu 朱皓怡, Yating Wang, Di Huang, Weicai Ye, Wanli Ouyang, Tong He
Project
GitHub
PDF
arXiv
RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot
Abstract: A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots. Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Junbo Wang, Haoyi Zhu 朱皓怡, Cewu Lu
Project
PDF
arXiv
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Building open-ended agents with internet-scale knowledge in Minecraft.
Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu 朱皓怡, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar
Project
PDF
arXiv
Twitter
Code
Database
Blog
Video