
I am a Ph.D. candidate in the Computer Graphics Group at The University of Hong Kong, supervised by Prof. Wenping Wang and Prof. Taku Komura. I received my B.Eng. degree with honors from Shandong University, advised by Prof. Shiqing Xin.
I am currently a visiting scholar in the Department of Computer and Information Science at the University of Pennsylvania, working with Prof. Lingjie Liu in the Graphics Lab and GRASP Lab.
I am a member of the AnySyn3D research group, with a mission to enhance the speed, affordability, and quality of 3D AIGC 🌎. I am also with the Efficient Computing Interest Group.

🌟Expected graduation in 2025, open to postdoc and research scientist positions.🌟

Research Interests (Visualization): Character Animation, Geometric Modeling and Processing, Computer Graphics, Human Behavior Analysis (Capture, Modeling and Simulation).

Shandong University     The University of Hong Kong University of Pennsylvania


News

  • Feb. 2024: One paper accepted to CVPR 2024.

  • Aug. 2023: One paper accepted to SIGGRAPH Asia 2023.

  • Jul. 2023: One paper accepted to ICCV 2023.

  • Mar. 2023: One paper accepted to SIGGRAPH 2023. We won the SIGGRAPH 2023 Best Paper Award.

  • Mar. 2023: One paper accepted to PNAS Nexus 2023. Press release by EurekAlert!.

  • Aug. 2022: One paper accepted to SIGGRAPH Asia 2022.

  • Feb. 2022: One paper accepted to EUROGRAPHICS 2022.

Collaboration

I host remote interns (2024-2025) focusing on: (i) Motion Generation (both kinematics-based and physics-based); (ii) Motion Capture from Videos and Images; (iii) Shape Generation (both implicit and explicit). Feel free to reach out via email with your CV.


Selected Publications

* Equal contribution. cs: coming soon.

Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Xiaoxiao Long*, Yuanchen Guo*, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, Wenping Wang.
CVPR 2024.

  • project page
  • paper
  • code
  • Hugging Face Demo
  • abstract
    In this work, we introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, certain works directly produce 3D information via fast network inferences, but their results are often of low quality and lack geometric details. To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. To ensure consistency, we employ a multi-view cross-domain attention mechanism that facilitates information exchange across views and modalities. Lastly, we introduce a geometry-aware normal fusion algorithm that extracts high-quality surfaces from the multi-view 2D representations. Our extensive evaluations demonstrate that our method achieves high-quality reconstruction results, robust generalization, and reasonably good efficiency compared to prior works.

TLControl: Trajectory and Language Control for Human Motion Synthesis
Weilin Wan, Zhiyang Dou, Taku Komura, Wenping Wang, Dinesh Jayaraman, Lingjie Liu.
arXiv 2023.

  • project page
  • paper
  • video
  • code
  • abstract
    Controllable human motion synthesis is essential for applications in AR/VR, gaming, movies, and embodied AI. Existing methods often focus solely on either language or full trajectory control, lacking precision in synthesizing motions aligned with user-specified trajectories, especially for multi-joint control. To address these issues, we present TLControl, a new method for realistic human motion synthesis, incorporating both low-level trajectory and high-level language semantics controls. Specifically, we first train a VQ-VAE to learn a compact latent motion space organized by body parts. We then propose a Masked Trajectories Transformer to make coarse initial predictions of full trajectories of joints based on the learned latent motion space, with user-specified partial trajectories and text descriptions as conditioning. Finally, we introduce an efficient test-time optimization to refine these coarse predictions for accurate trajectory control. Experiments demonstrate that TLControl outperforms the state-of-the-art in trajectory accuracy and time efficiency, making it practical for interactive and high-quality animation generation.

EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation
Wenyang Zhou, Zhiyang Dou†, Zeyu Cao, Zhouyingcheng Liao, Jingbo Wang, Wenjia Wang, Yuan Liu, Taku Komura, Wenping Wang, Lingjie Liu.
arXiv 2023.
† Project Lead.
  • project page
  • paper
  • video
  • code
  • abstract
    We introduce the Efficient Motion Diffusion Model (EMDM) for fast and high-quality human motion generation. Although previous motion diffusion models have shown impressive results, they struggle to achieve fast generation while maintaining high-quality human motions. Motion latent diffusion has been proposed for efficient motion generation. However, effectively learning a latent space can be non-trivial in such a two-stage manner. Meanwhile, accelerating motion sampling by increasing the step size, e.g., DDIM, typically leads to a decline in motion quality because complex data distributions are poorly approximated when the step size is naively increased. In this paper, we propose EMDM, which allows far fewer sampling steps for fast motion generation by modeling the complex denoising distribution during multiple sampling steps. Specifically, we develop a Conditional Denoising Diffusion GAN to capture multimodal data distributions conditioned on both control signals, i.e., textual description and denoising time step. By modeling the complex data distribution, a larger sampling step size and fewer steps are achieved during motion synthesis, significantly accelerating the generation process. To effectively capture the human dynamics and reduce undesired artifacts, we employ motion geometric loss during network training, which improves the motion quality and training efficiency. As a result, EMDM achieves a remarkable speed-up at the generation stage while maintaining high-quality motion generation in terms of fidelity and diversity.

Disentangled Clothed Avatar Generation from Text Descriptions
Jionghao Wang, Yuan Liu, Zhiyang Dou, Zhengming Yu, Yongqing Liang, Xin Li, Wenping Wang, Rong Xie, Li Song.
arXiv 2023.

  • project page
  • paper (cs)
  • code
  • abstract
    In this paper, we introduce a novel text-to-avatar generation method that separately generates the human body and the clothes and allows high-quality animation on the generated avatar. While recent advancements in text-to-avatar generation have yielded diverse human avatars from text prompts, these methods typically combine all elements (clothes, hair, and body) into a single 3D representation. Such an entangled approach poses challenges for downstream tasks like editing or animation. To overcome these limitations, we propose a novel disentangled 3D avatar representation named Sequentially Offset-SMPL (SO-SMPL), building upon the SMPL model. SO-SMPL represents the human body and clothes with two separate meshes, but associates them with offsets to ensure the physical alignment between the body and the clothes. Then, we design a Score Distillation Sampling (SDS)-based distillation framework to generate the proposed SO-SMPL representation from text prompts. In comparison with existing text-to-avatar methods, our approach not only achieves higher texture and geometry quality and better semantic alignment with text prompts, but also significantly improves the visual quality of character animation, virtual try-on, and avatar editing.

Surf-D: High-Quality Surface Generation for Arbitrary Topologies using Diffusion Models
Zhengming Yu*, Zhiyang Dou*, Xiaoxiao Long, Cheng Lin, Zekun Li, Yuan Liu, Norman Müller, Taku Komura, Marc Habermann, Christian Theobalt, Xin Li, Wenping Wang.
arXiv 2023.
  • project page
  • paper
  • code
  • abstract
    In this paper, we present Surf-D, a novel method for generating high-quality 3D shapes as Surfaces with arbitrary topologies using Diffusion models. Specifically, we adopt Unsigned Distance Field (UDF) as the surface representation, as it excels in handling arbitrary topologies, enabling the generation of complex shapes. While prior methods explored shape generation with different representations, they suffer from limited topologies and geometry details. Moreover, it's non-trivial to directly extend prior diffusion models to UDF because they lack spatial continuity due to the discrete volume structure. However, UDF requires accurate gradients for mesh extraction and learning. To tackle these issues, we first leverage a point-based auto-encoder to learn a compact latent space, which supports gradient querying for any input point through differentiation to effectively capture intricate geometry at a high resolution. Since the learning difficulty for various shapes can differ, a curriculum learning strategy is employed to efficiently embed various surfaces, enhancing the whole embedding process. With the pretrained shape latent space, we employ a latent diffusion model to acquire the distribution of various shapes. We conduct extensive experiments in unconditional generation, category conditional generation, 3D reconstruction from images, and text-to-shape tasks, in which our approach demonstrates superior performance in shape generation across multiple modalities. Our code will be publicly available upon paper publication.

C·ASE: Learning Conditional Adversarial Skill Embeddings for Physics-based Characters
Zhiyang Dou, Xuelin Chen, Qingnan Fan, Taku Komura, Wenping Wang.
SIGGRAPH Asia 2023.

  • project page
  • paper
  • video
  • code (cs)
  • abstract
    We present C·ASE, an efficient and effective framework that learns Conditional Adversarial Skill Embeddings for physics-based characters. C·ASE enables the physically simulated character to learn a diverse repertoire of skills while providing controllability in the form of direct manipulation of the skills to be performed. This is achieved by dividing the heterogeneous skill motions into distinct subsets containing homogeneous samples for training a low-level conditional model to learn the conditional behavior distribution. The skill-conditioned imitation learning naturally offers explicit control over the character’s skills after training. The training course incorporates the focal skill sampling, skeletal residual forces, and element-wise feature masking to balance diverse skills of varying complexities, mitigate dynamics mismatch to master agile motions and capture more general behavior characteristics, respectively. Once trained, the conditional model can produce highly diverse and realistic skills, outperforming state-of-the-art models, and can be repurposed in various downstream tasks. In particular, the explicit skill control handle allows a high-level policy or a user to direct the character with desired skill specifications, which we demonstrate is advantageous for interactive character animation.

TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer
Zhiyang Dou*, Qingxuan Wu*, Cheng Lin, Zeyu Cao, Qiangqiang Wu, Weilin Wan, Taku Komura, Wenping Wang.
ICCV 2023.
 
  • project page
  • paper
  • code
  • abstract
    In this paper, we introduce a set of simple yet effective TOken REduction (TORE) strategies for Transformer-based Human Mesh Recovery from monocular images. Current SOTA performance is achieved by Transformer-based structures. However, they suffer from high model complexity and computation cost caused by redundant tokens. We propose token reduction strategies based on two important aspects, i.e., the 3D geometry structure and 2D image feature, where we hierarchically recover the mesh geometry with priors from body structure and conduct token clustering to pass fewer but more discriminative image feature tokens to the Transformer. Our method massively reduces the number of tokens involved in high-complexity interactions in the Transformer. This leads to a significantly reduced computational cost while still achieving competitive or even higher accuracy in shape recovery. Extensive experiments across a wide range of benchmarks validate the superior effectiveness of the proposed method. We further demonstrate the generalizability of our method on hand mesh recovery. Our code will be publicly available once the paper is published.

Globally Consistent Normal Orientation for Point Clouds by Regularizing the Winding-Number Field
Rui Xu, Zhiyang Dou, Ningna Wang, Shiqing Xin, Shuangmin Chen, Mingyan Jiang, Xiaohu Guo, Wenping Wang, Changhe Tu.
ACM Transactions on Graphics. SIGGRAPH 2023.

SIGGRAPH 2023 Best Paper Award; See more here.

  • project page
  • paper
  • video
  • code
  • abstract
    Estimating normals with globally consistent orientations for a raw point cloud has many downstream geometry processing applications. Despite tremendous efforts in the past decades, it remains challenging to deal with an unoriented point cloud with various imperfections, particularly in the presence of data sparsity coupled with nearby gaps or thin-walled structures. In this paper, we propose a smooth objective function to characterize the requirements of an acceptable winding-number field, which allows one to find the globally consistent normal orientations starting from a set of completely random normals. By taking the vertices of the Voronoi diagram of the point cloud as examination points, we consider the following three requirements: (1) the winding number is either 0 or 1, (2) the occurrences of 1 and the occurrences of 0 are balanced around the point cloud, and (3) the normals align with the outside Voronoi poles as much as possible. Extensive experimental results show that our method outperforms the existing approaches, especially in handling sparse and noisy point clouds, as well as shapes with complex geometry/topology.

RFEPS: Reconstructing Feature-line Equipped Polygonal Surface
Rui Xu, Zixiong Wang, Zhiyang Dou, Chen Zong, Shiqing Xin, Mingyan Jiang, Tao Ju, Changhe Tu.
ACM Transactions on Graphics. SIGGRAPH Asia 2022.

  • project page
  • paper
  • video
  • code
  • abstract
    Feature lines are important geometric cues in characterizing the structure of a CAD model. Despite great progress in both explicit reconstruction and implicit reconstruction, it remains a challenging task to reconstruct a polygonal surface equipped with feature lines, especially when the input point cloud is noisy and lacks faithful normal vectors. In this paper, we develop a multistage algorithm, named RFEPS, to address this challenge. The key steps include (1) denoising the point cloud based on the assumption of local planarity, (2) identifying the feature-line zone by optimization of discrete optimal transport, (3) augmenting the point set so that sufficiently many additional points are generated on potential geometry edges, and (4) generating a polygonal surface that interpolates the augmented point set based on restricted power diagram. We demonstrate through extensive experiments that RFEPS, benefiting from the edge-point augmentation and the feature-preserving explicit reconstruction, outperforms state-of-the-art methods in terms of the reconstruction quality, especially in terms of the ability to reconstruct missing feature lines.

Coverage Axis: Inner Point Selection for 3D Shape Skeletonization
Zhiyang Dou, Cheng Lin, Rui Xu, Lei Yang, Shiqing Xin, Taku Komura, Wenping Wang.
Computer Graphics Forum. EUROGRAPHICS 2022.

Fast-Forward Attendees Award, 2nd Place.

  • project page
  • paper
  • code
  • suppl.
  • abstract
    In this paper, we present a simple yet effective formulation called Coverage Axis for 3D shape skeletonization. Inspired by the set cover problem, our key idea is to cover all the surface points using as few inside medial balls as possible. This formulation inherently induces a compact and expressive approximation of the Medial Axis Transform (MAT) of a given shape. Different from previous methods that rely on local approximation error, our method allows a global consideration of the overall shape structure, leading to an efficient high-level abstraction and superior robustness to noise. Another appealing aspect of our method is its capability to handle more generalized input such as point clouds and poor-quality meshes. Extensive comparisons and evaluations demonstrate the remarkable effectiveness of our method for generating compact and expressive skeletal representation to approximate the MAT.

Analysis of SARS-CoV-2 Transmission in a University Classroom based on Real Human Close Contact Behaviors
Nan Zhang, Xueze Yang, Boni Su, Zhiyang Dou#.
Science of the Total Environment (STOTEN) 2024.
# Corresponding Author.

Popularization of High-Speed Railway Reduces the Infection Risk via Close Contact Route during Journey
Nan Zhang, Xiyue Liu, Shuyi Gao, Boni Su, Zhiyang Dou#.
Sustainable Cities and Society (SCS) 2023.
# Corresponding Author.
  • paper
  • abstract
    The risk of COVID-19 infection has increased with the prolonged travel durations and frequent close interactions brought about by the popularization of railway transportation. This study utilized depth detection devices to analyze the close contact behaviors of passengers in high-speed trains (HST), traditional trains (TT), waiting areas in waiting rooms (WWR), and ticket check areas in waiting rooms (CWR). A multi-route COVID-19 transmission model was developed to assess the risk of virus exposure in these scenarios under various non-pharmaceutical interventions. A total of 163,740 seconds of data was collected. The close contact ratios in HST, TT, WWR, and CWR were 5.8%, 64.0%, 7.7%, and 49.0%, respectively. The average interpersonal distance between passengers was 0.85 m, 0.92 m, 1.25 m, and 0.88 m, respectively. The probability of face-to-face contact was 9.5%, 70.0%, 64.2%, and 5.8% across each environment, respectively. When all passengers wear N95 respirators or surgical masks, personal virus exposure via close contact can be reduced by 94.1% or 51.9%, respectively. The virus exposure in TT is dozens of times that in HST. In China, if all current railway traffic were replaced by HST, the total virus exposure of passengers could be reduced by roughly 50%.

Analysis of SARS-CoV-2 Transmission in Airports based on Real Human Close Contact Behaviors
Xueze Yang*, Zhiyang Dou*, Yuqing Ding, Boni Su, Hua Qian, Nan Zhang.
Journal of Building Engineering (JOBE) 2023.

  • paper
  • abstract
    The COVID-19 pandemic has significantly impacted people's daily lives for over three years. Airports, with their dense population and frequent close contact, pose a higher risk of respiratory infectious diseases compared to many other indoor environments. However, limited availability of data on close contact behavior has resulted in a gap in indoor exposure analysis. In this study, 11 participants conducted depth sensor measurements and video data collection across nine areas of a northern airport (airport A) and a southern airport (airport B) in China. The data, comprising more than 44 hours of close contact behaviors, including interpersonal distance, relative facial orientation, and the relative position of individuals, were analyzed using a semi-supervised machine learning method. Based on this analysis, a close contact transmission model for COVID-19 was developed, which considers the aforementioned close contact behaviors to assess the risk of exposure and the efficacy of interventions. The average close contact ratio across the nine airport areas is 25.4% (ranging from 6.1% to 55.0%), with passengers having the highest frequency of close contact in manual check-in areas. During close contacts, the average interpersonal distance in airports is 1.2 meters (ranging from 1.1 to 1.4 meters), being shortest in boarding areas. Face-to-face close contact is highest in charging areas, with a percentage of 46.9%. If people maintain a distance of over 1.0 meter in all areas, the total virus exposure could be reduced by 6.9% to 22.0% compared to the actual situation. Dining areas have the highest virus exposure risk for both short-range inhalation and mucosal deposition, followed by manual check-in areas. This study provides data support for scientific epidemic prevention and control in airports from the viewpoint of close contact behaviors.

Student close contact behavior and COVID-19 transmission in China’s classrooms
Yong Guo*, Zhiyang Dou*, Nan Zhang, Xiyue Liu, Boni Su, Yuguo Li, Yinping Zhang.
PNAS Nexus 2023.

This research has been featured in a press release by EurekAlert!.

  • project page
  • paper
  • press release
  • abstract
    Classrooms are high-risk indoor environments, so analysis of SARS-CoV-2 transmission in classrooms is important for determining optimal interventions. Due to the absence of human behavior data, it is challenging to accurately determine virus exposure in classrooms. A wearable device for close contact behavior detection was developed, and we recorded more than 250 thousand data points of close contact behaviors of students from Grades 1 through 12. Combined with a survey on students’ behaviors, we analyzed virus transmission in classrooms. Close contact rates for students were 37%±11% during classes and 48%±13% during breaks. Students in lower grades had higher close contact rates and virus transmission potential. The long-range airborne transmission route is dominant, accounting for 90%±3.6% and 75%±7.7% with and without mask wearing, respectively. During breaks, the short-range airborne route became more important, contributing 48%±3.1% in Grades 1 to 9 (without wearing masks). Ventilation alone cannot always meet the demands of COVID-19 control; 30 m³/h/person is suggested as the threshold outdoor air ventilation rate in classrooms. This study provides scientific support for COVID-19 prevention and control in classrooms, and our proposed human behavior detection and analysis methods offer a powerful tool to understand virus transmission characteristics and can be employed in various indoor environments.

Close Contact Behaviors of University and School Students in 10 Typical Indoor Environments
Nan Zhang, Li Liu, Zhiyang Dou, Xiyue Liu, Xueze Yang, Doudou Miao, Yong Guo, Silan Gu, Yuguo Li, Hua Qian, Jianjian Wei.
Journal of Hazardous Materials (JHM) 2023.
  • paper
  • abstract
    Close contact, including both short-range airborne and large droplet, is recognized as the main route of SARS-CoV-2 transmission in indoor environments; however, exposure risk via this route is difficult to quantify due to a lack of data showing close contact behaviors of people in typical indoor environments. A digital wearable device was developed to capture human close contact behaviors automatically based on semi-supervised learning. We collected a total of 337,056 seconds of indoor close contacts from 194 and a half hours of depth video recordings in 10 typical indoor environments. The relationship between SARS-CoV-2 exposure and close contact behaviors was evaluated based on dispersion characteristics of virus-laden droplets. People in restaurants had the highest close contact ratio (63.8%) and probability of face-to-face pattern (77.6%) during close contacts, while people in shopping centers had the highest speak fraction (46.6%). University students had higher exposure potential in dormitories than school students in homes, but less exposure potential in classrooms and graduate student offices than school students in classrooms. Aerosol exposure in volume for both short-range inhalation and direct deposition on facial mucosa was highest in restaurants. Classrooms are the main indoor environment for SARS-CoV-2 transmission for school students. The obtained results based on real human close contact behaviors can be used for infection risk assessment and to deploy effective interventions against close contact transmission of COVID-19 and other respiratory infections.

Close Contact Behavior-based COVID-19 Transmission and Interventions in a Subway System
Xiyue Liu*, Zhiyang Dou*, Lei Wang, Boni Su, Tianyi Jin, Yong Guo, Jianjian Wei, Nan Zhang.
Journal of Hazardous Materials (JHM) 2022.
  • project page
  • paper
  • abstract
    During the COVID-19 pandemic, analysis of virus exposure and intervention efficiency in public transport based on real passengers’ close contact behaviors is critical to curb infectious disease transmission. A monitoring device was developed to gather a total of 145,821 close contact data in subways based on semi-supervised learning. A virus transmission model considering both short- and long-range inhalation and deposition was established to calculate the virus exposure. During rush hour, short-range inhalation exposure is 3.2 times higher than deposition exposure and 7.5 times higher than long-range inhalation exposure of all passengers in the subway. The close contact rate was 56.1% and the average interpersonal distance was 0.8 m. Face-to-back was the main pattern during close contact. Compared with a random distribution, if all passengers stand facing in the same direction, personal virus exposure through inhalation (deposition) can be reduced by 74.1% (98.5%). If the talk rate was decreased from 20% to 5%, the inhalation (deposition) exposure can be reduced by 69.3% (73.8%). In addition, we found that virus exposure could be reduced by 82.0% if all passengers wear surgical masks. This study provides scientific support for COVID-19 prevention and control in subways based on real human close contact behaviors.

Top-Down Shape Abstraction Based on Greedy Pole Selection
Zhiyang Dou, Shiqing Xin, Rui Xu, Jian Xu, Yuanfeng Zhou, Shuangmin Chen, Wenping Wang, Xiuyang Zhao, Changhe Tu.
IEEE Transactions on Visualization and Computer Graphics. TVCG 2020.

  • paper
  • abstract
    Motivated by the fact that the medial axis transform is able to encode nearly the complete shape, we propose to use as few medial balls as possible to approximate the original volume enclosed by the boundary surface. We progressively select new medial balls, in a top-down style, to enlarge the region spanned by the existing medial balls. The key spirit of the selection strategy is to encourage large medial balls while imposing given geometric constraints. We further propose a speedup technique based on a provable observation that the intersection of medial balls implies the adjacency of power cells (in the sense of the power crust). We further elaborate the selection rules in combination with two closely related applications. One application is to develop an easy-to-use ball-stick modeling system that helps non-professional users to quickly build a shape with only balls and wires, but any penetration between two medial balls must be suppressed. The other application is to generate porous structures with convex, compact (with a high isoperimetric quotient) and shape-aware pores where two adjacent spherical pores may have penetration as long as the mechanical rigidity can be well preserved.


Research Experience and Arrangements

IRC The University of Hong Kong Tencent AI Lab     BJUT     Tencentgame     Upenn


Services

  • Reviewer: SIGGRAPH; SIGGRAPH ASIA; TVCG; ICCV; CVPR; GM; CAD (CADJ); TIP; GMP; CGI; ICONIP; FSDM; MLIS; Scientific (BrainSTEM@HKU).

  • Teaching Assistant:

  • Invited Talks: