Welcome! I am Yifan Liu, an undergraduate student majoring in Mathematics & Physics and Measurement Control Technology at Tsinghua University, Beijing, China. My primary research interest lies in computer vision, with a focus on designing machine learning systems that exhibit human-like perception and cognitive abilities. My research experience includes 3D computer vision, Multimodal LLMs (VLMs), and Embodied AI.

🔥 News

  • 2025.11:  SandboxVLM was selected by the Awesome series. Thanks for the recognition!
  • 2025.11:  🎉🎉 LLaVA-UHD v2 is accepted by AAAI26.
  • 2025.06:  Started working as a visiting research intern at Harvard University.
  • 2025.06:  🎉🎉 My first first-author paper HGG is accepted by ICCV25.
  • 2025.01:  Started working as a remote research intern at Carnegie Mellon University.
  • 2024.03:  🎉🎉 MirageRoom, the first paper I contributed to, is accepted as a Highlight by CVPR24.

📝 Publications


Abstract 3D Perception for Spatial Intelligence in Vision-Language Models

Yifan Liu, Fangneng Zhan, Kaichen Zhou, Yilun Du, Paul P. Liang, Hanspeter Pfister

Paper

  • arXiv preprint. Glad to be selected by the Awesome series!
  • Drawing on how humans navigate 3D environments without reconstructing precise appearance, formulated a “3D Sandbox” representation that abstracts essential spatial information into symbolic boxes as VLM reasoning context.
  • Achieved an 8.3% improvement over the baseline. Serves as a plug-and-play solution for proprietary and open-source VLMs to enhance 3D spatial reasoning.

(Under review) Paper about 3D VLMs

  • Under review.
  • Introduced a geometry-aware VLM encoding that leverages SLAM-derived 3D structure.
  • Closed the gap between the 3D grounding ability of VLMs and expert models, while preserving strong 2D vision and language abilities.

(Under review) Paper about active vision and RL

  • Under review.
  • Is human reasoning the missing ingredient in reinforcement learning for VLMs?
  • Obtained human-inspired reasoning data for navigation with a simulator-in-the-loop tree search, providing a warm start for reasoning abilities.
  • Proposed a multi-turn embodied RL framework to refine reasoning and movement abilities.

RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph

Yifan Liu, Fangneng Zhan, Wanhua Li, Haowen Sun, Katerina Fragkiadaki, Hanspeter Pfister

Paper

  • arXiv preprint.
  • Introduced a graph-based framework that integrates 3D pre-trained priors into deep learning estimation networks to address 3D ambiguity in hand-eye calibration tasks.
  • Designed a dual-branch (2D-3D) topology with cross-branch consistency.
ICCV25

Learning Efficient and Generalizable Human Representation with Human Gaussian Model

Yifan Liu*, Shengjun Zhang*, Chensheng Dai, Yang Chen, Hao Liu, Chen Li, Yueqi Duan

Paper Project Page

  • ICCV 2025
  • Introduced feed-forward 3D Gaussians into human avatar generation at application-level speeds.
  • Developed a dual-graph representation that enables coherent information flow across frames and local regions, addressing the temporal redundancy problem in Gaussian prediction from videos.
AAAI26

LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer

Yipeng Zhang*, Yifan Liu*, Zonghao Guo, Yidan Zhang, Xuesong Yang, Xiaoying Zhang, Chi Chen, Jun Song, Bo Zheng, Yuan Yao, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

Paper Code

  • AAAI 2026.
  • Developed the construction and compression of an Inverse Feature Pyramid, mitigating the visual detail degradation problem in VLMs. Our method achieved a 3.7% performance gain across benchmarks.
  • Adopted by the open-source industrial model MiniCPM-V, which has garnered over 22.2k stars on GitHub.
CVPR24 Highlight

MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection

Haowen Sun, Yueqi Duan, Juncheng Yan, Yifan Liu, Jiwen Lu

Paper

  • CVPR 2024 (Highlight)
  • Proposed the use of the mirage phenomenon from physics to replace real-world straight-line projection to address occlusion during rendering.
  • Enabled a SoTA 2D-3D hybrid perception network for indoor point cloud segmentation.

📖 Education

  • 2022.08 - (now), Undergraduate Student at Tsinghua University, Beijing, China. Majoring in Mathematics & Physics and Measurement Control Technology (double major). Strong foundation in mathematics, physics and sensors.
  • 2019.09 - 2022.07, High school student at No.2 High School of East China Normal University. Marvelous experience! Member of the Shanghai Physics Olympiad Team; won a national 🥈 in the 38th Chinese Physics Olympiad.

💻 Internships

  • Visiting Undergraduate Research Intern at VCG, Harvard University. Working on vision for robotics and 3D for embodied AI with Prof. Hanspeter Pfister.
  • Remote Research Intern at Computer Vision, Carnegie Mellon University. Working on 3D language grounding and vision for Embodied AI with Prof. Katerina Fragkiadaki.
  • Research Intern at THUNLP, Tsinghua University. Working on the basic visual architecture of Multimodal LLMs with Prof. Zhiyuan Liu.
  • Research Intern at i-Vision Group, Tsinghua University. Working on point cloud perception, Gaussian reconstruction, and 3D Gaussian avatars with Prof. Yueqi Duan.