Haoyi Zhu
Computer Vision
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Abstract: The field of 4D world modeling - aiming to jointly capture spatial geometry and temporal dynamics - has witnessed remarkable progress in recent years, driven by advances in large-scale generative models and multimodal learning.
... (other authors), Haoyi Zhu 朱皓怡, ... (19 authors)
Website · arXiv · GitHub
WinT3R: Window-Based Streaming Reconstruction With Camera Token Pool
Abstract: We present WinT3R, a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps. Previous methods suffer from a trade-off between reconstruction quality and real-time performance.
Zizun Li, Jianjun Zhou, Yifan Wang, Haoyi Guo, Wenzheng Chang, Yang Zhou, Haoyi Zhu 朱皓怡, Junyi Chen, Chunhua Shen, Tong He
Website · arXiv · GitHub
π³: Scalable Permutation-Equivariant Visual Geometry Learning
Abstract: We introduce π³, a feed-forward neural network that offers a novel approach to visual geometry reconstruction, breaking the reliance on a conventional fixed reference view. Previous methods often anchor their reconstructions to a designated viewpoint, an inductive bias that can lead to instability and failures if the reference is suboptimal.
Yifan Wang, Jianjun Zhou, Haoyi Zhu 朱皓怡, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, Tong He
Website · GitHub · PDF · arXiv
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
Abstract: In this paper, we introduce an innovative vector-quantization-based action tokenizer built upon the largest-scale action trajectory dataset to date, leveraging over 100 times more data than previous approaches.
Yating Wang, Haoyi Zhu 朱皓怡, Mingyu Liu, Jiange Yang, Hao-Shu Fang, Tong He
Website · GitHub · PDF · arXiv
DeepVerse: 4D Autoregressive Video Generation as a World Model
Abstract: World models serve as essential building blocks toward Artificial General Intelligence (AGI), enabling intelligent agents to predict future states and plan actions by simulating complex physical interactions. However, existing interactive models primarily predict visual observations, thereby neglecting crucial hidden states like geometric structures and spatial coherence.
Junyi Chen, Haoyi Zhu 朱皓怡, Xianglong He, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Zhoujie Fu, Jiangmiao Pang, Tong He
Website · GitHub · PDF · arXiv
Aether: Geometric-Aware Unified World Modeling
A geometric-aware unified world model, capable of 4D reconstruction, action-conditioned prediction, and visual planning.
Haoyi Zhu 朱皓怡, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Tong He
Project · PDF · arXiv · Code
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
A novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI.
Haoyi Zhu 朱皓怡, Honghui Yang, Yating Wang, Jiange Yang, Limin Wang, Tong He
Project · PDF · arXiv · Twitter · Code · HuggingFace Model · RealWorld Code · YouTube Video
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
A general 3D pre-training approach establishing a pathway to 3D foundational models.
Haoyi Zhu 朱皓怡, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Tong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Wanli Ouyang
GitHub · PDF · arXiv
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Abstract: In the context of autonomous driving, the significance of effective feature learning is widely acknowledged. While conventional 3D self-supervised pre-training methods have shown widespread success, most methods follow the ideas originally designed for 2D images.
Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu 朱皓怡, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang
GitHub · PDF · arXiv
AlphaTracker: a multi-animal tracking and behavioral analysis tool
Abstract: Computer vision has emerged as a powerful tool to elevate behavioral research. This protocol describes a computer vision machine learning pipeline called AlphaTracker, which has minimal hardware requirements and produces reliable tracking of multiple unmarked animals, as well as behavioral clustering.
Zexin Chen, Ruihan Zhang, Hao-Shu Fang, Yu E. Zhang, Aneesh Bal, Haowen Zhou, Rachel R. Rock, Nancy Padilla-Coreano, Laurel R. Keyes, Haoyi Zhu 朱皓怡, Yong-Lu Li, Takaki Komiyama, Kay M. Tye, Cewu Lu
Paper · Code