Welcome! I am Yifan Liu, an undergraduate student at Tsinghua University, Beijing, China, majoring in Mathematics & Physics and Measurement Control Technology. My primary research interest lies in computer vision, with a focus on designing machine learning systems that exhibit human-like perception and cognitive abilities. My research experience includes 3D computer vision, Multimodal LLMs (VLMs), and Embodied AI.
🔥 News
- 2025.11: SandboxVLM was selected by the Awesome series. Thanks for the recognition!
- 2025.11: 🎉🎉 LLaVA-UHD v2 was accepted to AAAI 2026.
- 2025.06: Started working as a visiting research intern at Harvard University.
- 2025.06: 🎉🎉 My first first-author paper, HGG, was accepted to ICCV 2025.
- 2025.01: Started working as a remote research intern at Carnegie Mellon University.
- 2024.03: 🎉🎉 MirageRoom, the first paper I contributed to, was accepted to CVPR 2024 as a Highlight.
📝 Publications

Abstract 3D Perception for Spatial Intelligence in Vision-Language Models
Yifan Liu, Fangneng Zhan, Kaichen Zhou, Yilun Du, Paul P. Liang, Hanspeter Pfister
- arXiv preprint. Glad to be selected by the Awesome series!
- Drawing on how humans navigate 3D environments without reconstructing precise appearance, formulated a “3D Sandbox” representation that abstracts essential spatial information into symbolic boxes as VLM reasoning context.
- Achieved an 8.3% improvement over the baseline. Serves as a plug-and-play solution for proprietary and open-source VLMs to enhance 3D spatial reasoning.

(Under review) Paper about 3D VLMs
- Under review.
- Introduced a geometry-aware VLM encoding that leverages SLAM-derived 3D structure.
- Closed the gap between the 3D grounding ability of VLMs and expert models, while preserving strong 2D vision and language abilities.

(Under review) Paper about active vision and RL
- Under review.
- Is human reasoning the missing ingredient in reinforcement learning for VLMs?
- Obtained human-inspired reasoning data for navigation with a simulator-in-the-loop tree search, providing a warm start for reasoning abilities.
- Developed a multi-turn embodied RL framework to refine reasoning and movement abilities.

RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph
Yifan Liu, Fangneng Zhan, Wanhua Li, Haowen Sun, Katerina Fragkiadaki, Hanspeter Pfister
- arXiv preprint.
- Introduced a graph-based framework that integrates 3D pre-trained priors into deep learning estimation networks to address 3D ambiguity in hand-eye calibration tasks.
- Designed a dual-branch (2D-3D) topology with cross-branch consistency.

Learning Efficient and Generalizable Human Representation with Human Gaussian Model
Yifan Liu*, Shengjun Zhang*, Chensheng Dai, Yang Chen, Hao Liu, Chen Li, Yueqi Duan
- ICCV 2025
- Introduced feed-forward 3D Gaussians into human avatar generation at application-level speeds.
- Developed a dual-graph representation that enables coherent information flow across frames and local regions, addressing the temporal redundancy problem in Gaussian prediction from videos.

LLaVA-UHD v2
Yipeng Zhang*, Yifan Liu*, Zonghao Guo, Yidan Zhang, Xuesong Yang, Xiaoying Zhang, Chi Chen, Jun Song, Bo Zheng, Yuan Yao, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun
- AAAI 2026.
- Developed the construction and compression of an Inverse Feature Pyramid, mitigating the visual detail degradation problem in VLMs. The method achieved a 3.7% performance gain across benchmarks.
- Adopted by the open-source industrial model MiniCPM-V, which has garnered over 22.2k stars on GitHub.

MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection
Haowen Sun, Yueqi Duan, Juncheng Yan, Yifan Liu, Jiwen Lu
- CVPR 2024 (Highlight)
- Proposed replacing real-world straight-line projection with a projection based on the mirage phenomenon from physics, addressing occlusion during rendering.
- Enabled a SoTA 2D-3D hybrid perception network for indoor point cloud segmentation.
📖 Education
- 2022.08 - (now), Undergraduate Student at Tsinghua University, Beijing, China. Majoring in Mathematics & Physics and Measurement Control Technology (double major). Strong foundation in mathematics, physics and sensors.
- 2019.09 - 2022.07, High school student at No.2 High School of East China Normal University. Marvelous experience! Member of the Shanghai Physics Olympiad Team; won a national 🥈 in the 38th Chinese Physics Olympiad.
💻 Internships
- Visiting Undergraduate Research Intern at VCG, Harvard University. Working on vision for robotics and 3D for embodied AI with Prof. Hanspeter Pfister.
- Remote Research Intern at Computer Vision, Carnegie Mellon University. Working on 3D language grounding and vision for Embodied AI with Prof. Katerina Fragkiadaki.
- Research Intern at THUNLP, Tsinghua University. Working on the basic visual architecture of Multimodal LLMs with Prof. Zhiyuan Liu.
- Research Intern at i-Vision Group, Tsinghua University. Working on point cloud perception, Gaussian reconstruction, and 3D Gaussian avatars with Prof. Yueqi Duan.