I am currently a Senior Researcher at ByteDance. My mission is to capture and understand human-centric scenes in the real world, and to digitalize humans, objects, and events for immersive VR/AR applications. Prior to this, I joined Tencent as a Senior Researcher through its Special Recruitment Talents Program (技术大咖). Before entering industry, I graduated from the Department of Automation at Tsinghua University, where I had the honor of being supervised by Qionghai Dai and Lu Fang, and collaborated closely with Yebin Liu and Lan Xu.
HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors
Xiaozheng Zheng, Chao Wen, Zhaohu Li, Weiyi Zhang, Zhuo Su, Xu Chang, Yang Zhao, Zheng Lv, Xiaoyuan Zhang, Yongjie Zhang, Guidong Wang, Lan Xu
3DV 2025
We propose a 3D head avatar creation method that generalizes from few-shot in-the-wild data. By using 3D head priors learned from a large-scale dataset and a Gaussian Splatting-based network, our approach achieves high-fidelity rendering and robust animation.
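For readers less familiar with Gaussian-prior avatars, the sketch below illustrates the general idea of fitting per-identity coefficients over a learned basis of per-Gaussian attributes. All names, shapes, and the blending scheme are assumptions for illustration, not HeadGAP's actual implementation.

```python
# Toy sketch (not HeadGAP's implementation): personalize a Gaussian-splat head
# by blending a learned prior basis of per-Gaussian position offsets.
import torch

N, K = 10_000, 32                       # number of Gaussians, basis size (assumed)
template = torch.randn(N, 3)            # stand-in for a template head's Gaussian centers
basis = 0.01 * torch.randn(K, N, 3)     # stand-in for a learned identity basis

def personalize(coeffs: torch.Tensor) -> torch.Tensor:
    """Blend the basis with identity coefficients (K,) into Gaussian centers (N, 3)."""
    return template + torch.einsum("k,knc->nc", coeffs, basis)

# Few-shot fitting would optimize `coeffs` (plus small per-Gaussian residuals)
# against photometric losses on the input images.
coeffs = torch.zeros(K, requires_grad=True)
centers = personalize(coeffs)           # differentiable w.r.t. coeffs
```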
SMGDiff: Soccer Motion Generation using Diffusion Probabilistic Models
We introduce SMGDiff, a novel two-stage framework for generating real-time and user-controllable soccer motions. Our key idea is to integrate real-time character control with a powerful diffusion-based generative model, ensuring high-quality and diverse output motion.
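As a rough illustration of the diffusion side of such a pipeline, the snippet below shows a standard DDPM reverse-sampling loop conditioned on a control signal. The step count, tensor shapes, and the placeholder denoiser are assumptions; SMGDiff's actual two-stage, real-time design is not reproduced here.

```python
# Standard DDPM-style sampling sketch for control-conditioned motion generation
# (illustrative only; the denoiser below is a placeholder, not a trained model).
import torch

T = 50                                            # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def denoiser(x_t, t, control):
    """Placeholder for a trained network that predicts the noise added at step t."""
    return torch.zeros_like(x_t)

def sample_motion(control, frames=60, joints=24):
    x = torch.randn(frames, joints, 3)            # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = denoiser(x, t, control)
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:                                  # add noise except at the final step
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x                                       # a denoised motion clip
```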
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
Yuheng Jiang, Zhehao Shen, Penghao Wang, Zhuo Su, Yu Hong, Yingliang Zhang, Jingyi Yu, Lan Xu
CVPR 2024
We present an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage. Our core intuition is to marry the 3D Gaussian representation with non-rigid tracking.
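The snippet below is a minimal sketch of the general recipe of driving canonical Gaussian centers with a non-rigid deformation graph (per-node rigid transforms blended by skinning weights). The function and its shapes are illustrative assumptions, not HiFi4G's dual-graph mechanism or compression scheme.

```python
# Toy sketch: warping 3D Gaussian centers with an embedded deformation graph.
import torch

def warp_gaussians(centers, node_pos, node_R, node_t, weights, knn_idx):
    """
    centers:  (N, 3) canonical Gaussian means
    node_pos: (M, 3) graph node positions
    node_R:   (M, 3, 3) per-node rotations; node_t: (M, 3) per-node translations
    weights:  (N, K) skinning weights over K nearest nodes; knn_idx: (N, K) node indices
    """
    rel = centers[:, None, :] - node_pos[knn_idx]                 # (N, K, 3)
    warped = torch.einsum("nkij,nkj->nki", node_R[knn_idx], rel) \
             + node_pos[knn_idx] + node_t[knn_idx]                # (N, K, 3)
    return (weights[..., None] * warped).sum(dim=1)               # (N, 3) warped centers
```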
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
Xiaozheng Zheng, Chao Wen, Zhuo Su, Zeran Xu, Zhaohu Li, Yang Zhao, Zhou Xue
CVPR 2024
OHTA is a novel approach capable of creating implicit animatable hand avatars using just a single image. It facilitates 1) text-to-avatar conversion, 2) hand texture and geometry editing, and 3) interpolation and sampling within the latent space.
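To make point 3) concrete, here is a tiny, hypothetical example of interpolating between two avatar latent codes; the 64-dimensional code size and the use of spherical interpolation are assumptions for illustration, not OHTA's API.

```python
# Hypothetical illustration of latent-space interpolation between two hand-avatar codes.
import torch

def slerp(z0: torch.Tensor, z1: torch.Tensor, t: float) -> torch.Tensor:
    """Spherical interpolation, often preferable to lerp for Gaussian-like latents."""
    z0n, z1n = z0 / z0.norm(), z1 / z1.norm()
    omega = torch.acos(torch.clamp((z0n * z1n).sum(), -1.0, 1.0))
    return (torch.sin((1 - t) * omega) * z0 + torch.sin(t * omega) * z1) / torch.sin(omega)

z0, z1 = torch.randn(64), torch.randn(64)            # codes of two fitted hand avatars
intermediates = [slerp(z0, z1, t) for t in (0.25, 0.5, 0.75)]
```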
Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling
Xiaozheng Zheng*, Zhuo Su*, Chao Wen, Zhou Xue, Xiaojie Jin
ICCV 2023
We propose a two-stage framework that recovers accurate and smooth full-body motion from only the three tracking signals of the head and hands. We first explicitly model joint-level features and then use them as spatiotemporal transformer tokens to capture joint-level correlations.
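A minimal sketch of the "joints as tokens" idea: treat each body joint as one transformer token so attention can capture cross-joint correlations. The shapes, layer sizes, and output head below are assumptions, not the paper's architecture.

```python
# Minimal sketch: per-joint features as transformer tokens (shapes/sizes assumed).
import torch
import torch.nn as nn

J, D = 22, 256                                   # body joints, token dimension
tokens = torch.randn(1, J, D)                    # joint-level features lifted from
                                                 # the sparse head/hand tracking signals
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True),
    num_layers=4,
)
contextual = encoder(tokens)                     # (1, J, D): each joint attends to all joints
rot6d = nn.Linear(D, 6)(contextual)              # e.g. per-joint 6D rotation regression
```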
Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream
Yuheng Jiang, Kaixin Yao, Zhuo Su, Zhehao Shen, Haimin Luo, Lan Xu
CVPR 2023
We propose a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera. It bridges traditional non-rigid tracking with recent instant radiance field techniques via a multi-thread tracking-rendering mechanism.
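The multi-thread tracking-rendering idea can be illustrated with a generic producer/consumer pattern: one thread performs non-rigid tracking while another updates and renders the radiance field from the latest tracked motion. This is a sketch of the pattern only, not Instant-NVR's code.

```python
# Generic producer/consumer sketch of a tracking thread feeding a rendering thread.
import queue
import threading

motions = queue.Queue(maxsize=4)            # small buffer between the two threads

def tracking_thread():
    for frame_id in range(100):             # stand-in for the incoming RGBD stream
        motions.put({"frame": frame_id})    # stand-in for non-rigid tracking output
    motions.put(None)                       # sentinel: stream ended

def rendering_thread():
    while (motion := motions.get()) is not None:
        pass                                # update the instant radiance field and render

threading.Thread(target=tracking_thread, daemon=True).start()
rendering_thread()
```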
Robust Volumetric Performance Reconstruction under Human-object Interactions from Monocular RGBD Stream
Zhuo Su, Lan Xu, Dawei Zhong, Zhong Li, Fan Deng, Shuxue Quan, Lu Fang
TPAMI 2022
We propose a robust volumetric performance reconstruction system for human-object interaction scenarios using only a single RGBD sensor, which combines various data-driven visual and interaction cues to handle the complex interaction patterns and severe occlusions.
NeuralHOFusion: Neural Volumetric Rendering Under Human-Object Interactions
Yuheng Jiang, Suyi Jiang, Guoxing Sun, Zhuo Su, Kaiwen Guo, Minye Wu, Jingyi Yu, Lan Xu
CVPR 2022
We propose a robust neural volumetric rendering method for human-object interaction scenarios using 6 RGBD cameras, achieving layer-wise and photorealistic reconstruction of human performance in novel views.
Learning Variational Motion Prior for Video-based Motion Capture
Xin Chen*, Zhuo Su*, Lingbo Yang*, Pei Cheng, Lan Xu, Gang Yu
arXiv 2022
We propose a novel variational motion prior (VMP) learning approach for video-based motion capture. Specifically, VMP is implemented as a transformer-based variational autoencoder pretrained on large-scale 3D motion data, providing an expressive latent space for human motion at the sequence level.
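Below is a toy sketch of a transformer-based VAE over pose sequences, meant only to show the general shape of such a sequence-level motion prior; the dimensions, temporal pooling, and simplified per-clip decoder are assumptions and not the paper's architecture.

```python
# Toy transformer-VAE sketch for a sequence-level motion prior (all sizes assumed).
import torch
import torch.nn as nn

class MotionVAE(nn.Module):
    def __init__(self, d_pose=72, d_model=256, d_latent=64):
        super().__init__()
        self.embed = nn.Linear(d_pose, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_mu = nn.Linear(d_model, d_latent)
        self.to_logvar = nn.Linear(d_model, d_latent)
        # A real decoder would also be sequence-to-sequence; kept per-clip here for brevity.
        self.decoder = nn.Sequential(nn.Linear(d_latent, d_model), nn.ReLU(),
                                     nn.Linear(d_model, d_pose))

    def forward(self, motion):                         # motion: (B, T, d_pose)
        h = self.encoder(self.embed(motion)).mean(1)   # temporal pooling -> (B, d_model)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        recon = self.decoder(z)                        # (B, d_pose): one decoded pose per clip
        return recon, mu, logvar

vae = MotionVAE()
recon, mu, logvar = vae(torch.randn(2, 60, 72))        # batch of 2 clips, 60 frames each
```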
RobustFusion: Human Volumetric Capture with Data-driven Visual Cues using a RGBD Camera
Zhuo Su, Lan Xu, Zerong Zheng, Tao Yu, Yebin Liu, Lu Fang
ECCV 2020 (Spotlight)
We introduce a robust human volumetric capture approach that combines various data-driven visual cues using a single Kinect, significantly outperforming existing state-of-the-art approaches.
UnstructuredFusion: Realtime 4D Geometry and Texture Reconstruction using Commercial RGBD Cameras
Lan Xu, Zhuo Su, Lei Han, Tao Yu, Yebin Liu, Lu Fang
TPAMI 2019
We propose UnstructuredFusion, which allows realtime, high-quality, complete reconstruction of 4D textured models of human performance via only three commercial RGBD cameras.
Human 3D Reconstruction and Generation
The construction of realistic 3D human avatars is crucial for VR/AR applications. The talk covered 3D human modeling, from traditional volumetric capture to neural rendering, and from per-scene optimization to generalizable prior model training and generative methods. I shared my explorations in this field, hoping to inspire related research.
Dec 26, 2024, ByteDance, Online Live Stream | ByteTech Technical Sharing Seminar