Zhuo Su 苏卓

I am a senior researcher/engineer at ByteDance. Previously, I was a senior researcher at Tencent (hired through the talent recruitment program "技术大咖", "Tech Guru"). Before that, I earned my Master's degree from the Department of Automation, Tsinghua University, supervised by Prof. Qionghai Dai and Prof. Lu Fang; during that time, I also worked closely with Prof. Yebin Liu and Prof. Lan Xu.

My mission is to capture and understand dynamic human-centric scenes in the real world, and to digitize humans, objects, and events for immersive applications in virtual and augmented reality. My research centers on computer vision and graphics, especially human 3D generation, avatar creation, 4D reconstruction, neural rendering, motion capture, and related areas.

I am looking for full-time teammates and research interns. If you are interested in the topics above, please feel free to drop me an email.


Email: suzhuo13@gmail.com | suzhuo@bytedance.com

Background | Research | Awards | Skills | Services | Google Scholar

Background


Research

3D Generation | Avatar Creation | 4D Reconstruction | Neural Rendering | Motion Capture

    1. 3D Generation

    HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors
    Panwang Pan*, Zhuo Su* (Project Lead), Chenguo Lin*, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu (*Equal Contribution)
    Conference on Neural Information Processing Systems (NeurIPS), 2024.

    We propose HumanSplat, a method that predicts the 3D Gaussian Splatting properties of a human from a single input image in a generalizable way. It utilizes a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors to effectively integrate geometric priors and semantic features.
    [Paper] [Project page]

    Joint2Human: High-quality 3D Human Generation via Compact Spherical Embedding of 3D Joints
    Muxin Zhang, Qiao Feng, Zhuo Su, Chao Wen, Zhou Xue, Kun Li
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

    We introduce Joint2Human, a novel method that leverages 2D diffusion models to generate detailed 3D human geometry directly, ensuring both global structure and local details.
    [Paper] [Project page]

    2. Avatar Creation

    OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
    Xiaozheng Zheng, Chao Wen, Zhuo Su, Zeran Xu, Zhaohu Li, Yang Zhao, Zhou Xue
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

    OHTA is a novel approach capable of creating implicit animatable hand avatars using just a single image. It facilitates 1) text-to-avatar conversion, 2) hand texture and geometry editing, and 3) interpolation and sampling within the latent space.
    [Paper] [Project page]

    HeadGAP: Few-shot 3D Head Avatar via Generalizable GAussian Priors
    Xiaozheng Zheng, Chao Wen, Zhaohu Li, Weiyi Zhang, Zhuo Su, Xu Chang, Yang Zhao, Zheng Lv, Xiaoyuan Zhang, Yongjie Zhang, Guidong Wang, Lan Xu
    International Conference on 3D Vision (3DV), 2025.

    We propose a 3D head avatar creation method that generalizes from few-shot in-the-wild data. By using 3D head priors from a large-scale dataset and a Gaussian Splatting-based network, our approach achieves high-fidelity rendering and robust animation.
    [Paper] [Project page]

    3. 4D Reconstruction

    Robust Volumetric Performance Reconstruction under Human-object Interactions from Monocular RGBD Stream
    Zhuo Su, Lan Xu, Dawei Zhong, Zhong Li, Fan Deng, Shuxue Quan, Lu Fang
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.

    We propose a robust volumetric performance reconstruction system for human-object interaction scenarios using only a single RGBD sensor, which combines various data-driven visual and interaction cues to handle the complex interaction patterns and severe occlusions.
    [Paper] [Project page]

    RobustFusion: Human Volumetric Capture with Data-driven Visual Cues using a RGBD Camera
    Zhuo Su, Lan Xu, Zerong Zheng, Tao Yu, Yebin Liu, Lu Fang
    European Conference on Computer Vision (ECCV), 2020, Spotlight.

    We introduce a robust human volumetric capture approach that combines various data-driven visual cues using a single Kinect, significantly outperforming existing state-of-the-art approaches.
    [Paper] [Project page]

    UnstructuredFusion: Realtime 4D Geometry and Texture Reconstruction using Commercial RGBD Cameras
    Lan Xu, Zhuo Su, Lei Han, Tao Yu, Yebin Liu, Lu Fang
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019.

    We propose UnstructuredFusion, which allows realtime, high-quality, complete reconstruction of 4D textured models of human performance via only three commercial RGBD cameras.
    [Paper] [Project page]

    4. Neural Rendering

    HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
    Yuheng Jiang, Zhehao Shen, Penghao Wang, Zhuo Su, Yu Hong, Yingliang Zhang, Jingyi Yu, Lan Xu
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

    We present an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage. Our core intuition is to marry the 3D Gaussian representation with non-rigid tracking.
    [Paper] [Project page]

    Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream
    Yuheng Jiang, Kaixin Yao, Zhuo Su, Zhehao Shen, Haimin Luo, Lan Xu
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

    We propose a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera. It bridges traditional non-rigid tracking with recent instant radiance field techniques via a multi-thread tracking-rendering mechanism.
    [Paper] [Project page]

    NeuralHOFusion: Neural Volumetric Rendering Under Human-Object Interactions
    Yuheng Jiang, Suyi Jiang, Guoxing Sun, Zhuo Su, Kaiwen Guo, Minye Wu, Jingyi Yu, Lan Xu
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

    We propose a robust neural volumetric rendering method for human-object interaction scenarios using 6 RGBD cameras, which achieves layer-wise and photorealistic reconstruction results of human performance in novel views.
    [Paper] [Project page]

    5. Motion Capture

    EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs
    Zhen Fan*, Peng Dai*, Zhuo Su*, Xu Gao, Zheng Lv, Jiarui Zhang, Tianyuan Du, Guidong Wang, Yang Zhang (*Equal Contribution)
    arXiv, 2024.

    We introduce EMHI, a dataset combining stereo images from headsets and IMU data for egocentric human motion capture in VR. It includes 28.5 hours of data from 58 subjects. We also propose MEPoser, a method that effectively uses this multimodal data for improved pose estimation.
    [Paper] [Project page: Coming soon]

    HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
    Peng Dai, Yang Zhang, Tao Liu, Zhen Fan, Tianyuan Du, Zhuo Su, Xiaozheng Zheng, Zeming Li
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

    We propose HMD-Poser, the first unified approach to recover full-body motions using scalable sparse observations from an HMD and body-worn IMUs. In particular, it supports a variety of input configurations, such as HMD, HMD + 2 IMUs, HMD + 3 IMUs, etc.
    [Paper] [Project page]

    Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling
    Xiaozheng Zheng*, Zhuo Su*, Chao Wen, Zhou Xue, Xiaojie Jin (*Equal Contribution)
    IEEE/CVF International Conference on Computer Vision (ICCV), 2023.

    We propose a two-stage framework that obtains accurate and smooth full-body motions from only three tracking signals (head and hands). We first explicitly model joint-level features and then use them as spatiotemporal transformer tokens to capture joint-level correlations.
    [Paper] [Project page]

    Learning Variational Motion Prior for Video-based Motion Capture
    Xin Chen*, Zhuo Su*, Lingbo Yang*, Pei Cheng, Lan Xu, Gang Yu (*Equal Contribution)
    arXiv, 2022.

    We propose a novel variational motion prior (VMP) learning approach for video-based motion capture. Specifically, VMP is implemented as a transformer-based variational autoencoder pretrained on large-scale 3D motion data, providing an expressive latent space for human motion at the sequence level.
    [Paper]


Patents & Early Publications

  • Lu Fang, Zhuo Su, Lei Han, Qionghai Dai, “Depth camera calibration method and device, electronic equipment and storage medium”, CN:201810179738:A
  • Lu Fang, Lei Han, Zhuo Su, Qionghai Dai, “A three-dimensional rebuilding method and device based on a depth camera, an apparatus and a storage medium”, CN:201810179264:A
  • Lu Fang, Zhuo Su, Lan Xu, “Dynamic three-dimensional reconstruction method, device, equipment, medium and system”, CN:201910110062:A
  • Lu Fang, Zhuo Su, Lan Xu, “Texture real-time determination method, device and equipment for dynamic scene and medium”, CN:201910110044:A
  • Lu Fang, Zhuo Su, Lan Xu, Jianwei Wen, Chao Yuan, “Dynamic human body three-dimensional reconstruction method, device, equipment and medium”, CN:202010838902:A
  • Lu Fang, Zhuo Su, Lan Xu, Jianwei Wen, Chao Yuan, “Dynamic human body three-dimensional model completion method and device, equipment and medium”, CN:202010838890:A
  • Zhuo Su, Xiaozhe Wang, Wen Fei, Changfu Zhou, “Multi-feature information landmark detection method for precise landing of unmanned aerial vehicle”, CN:201710197369:A
  • Wen Fei, Zhuo Su* (*corresponding author), Changfu Zhou, “Artificial landmark design and detection using hierarchy information for UAV localization and landing”, Chinese Control and Decision Conference (CCDC), 2017. [Paper]
  • Haina Wu, Zhuo Su, Kai Luo, Qi Wang, Xianzhong Cheng, “Exploration and Research on the Movement of Magnus Glider”, Physical Experiment of College, 2015(5): 2

Awards

  • Outstanding Graduate of Beijing, Beijing, 2021
  • Outstanding Graduate of Department of Automation, Tsinghua University, 2021
  • Excellent Bachelor Thesis Award, Northeastern University, 2018
  • Outstanding Graduate of Liaoning Province, Liaoning Province, 2018
  • National Scholarship, Ministry of Education, 2018
  • Excellence Award for National Undergraduate Innovation Program, Northeastern University, 2017
  • City's Excellent Undergraduate, Shenyang City, 2017
  • Mayor's Scholarship, Shenyang City, 2017
  • Top Ten Outstanding Undergraduates (十佳本科生, 10 selected university-wide), Northeastern University, 2017
  • Honorable Mention of American Mathematical Contest in Modeling, COMAP, 2017
  • Second Prize of National Undergraduate Mathematical Contest in Modeling, CSIAM, 2016
  • First Prize of Provincial Undergraduate Mathematical Contest in Modeling, Liaoning Province, 2016
  • 2x Second Prize of Electronic Design Contest, Education Department of Liaoning Province, 2015-2016
  • 4x First Class Scholarships, Northeastern University, 2015-2018

Skills

     C/C++ (OpenCV, OpenGL, CUDA, Eigen, ...), Python (PyTorch), MATLAB, LaTeX, ...

Services

     Reviewer for CVPR, NeurIPS, ICLR, TVCG, IEEE VR, 3DV, AISTATS, ...