Publications
* Equal contribution; # Corresponding Authors; cs: coming soon.
Reproducing realistic collective behaviors presents a captivating yet formidable challenge. Traditional rule-based methods rely on hand-crafted principles, limiting motion diversity and realism in generated collective behaviors. Recent imitation learning methods learn from data but often require ground truth motion trajectories and struggle with authenticity, especially in high-density groups with erratic movements. In this paper, we present a scalable approach, Collective Behavior Imitation Learning (CBIL), for learning fish schooling behavior directly from videos, without relying on captured motion trajectories. Our method first leverages Video Representation Learning, where a Masked Video AutoEncoder (MVAE) extracts implicit states from video inputs in a self-supervised manner. The MVAE effectively maps 2D observations to implicit states that are compact and expressive for the subsequent imitation learning stage. Then, we propose a novel adversarial imitation learning method to effectively capture complex movements of the schools of fish, allowing for efficient imitation of the distribution of motion patterns measured in the latent space. It also incorporates bio-inspired rewards alongside priors to regularize and stabilize training. Once trained, CBIL can be used for various animation tasks with the learned collective motion priors. We further show its effectiveness across different species. Finally, we demonstrate the application of our system in detecting abnormal fish behavior from in-the-wild videos.
In this paper, we propose ProTracker, a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. The key idea of our method is to incorporate probabilistic integration to refine multiple predictions from both optical flow and semantic features for robust short-term and long-term tracking. Specifically, we integrate optical flow estimations in a probabilistic manner, producing smooth and accurate trajectories by maximizing the likelihood of each prediction. To effectively re-localize challenging points that disappear and reappear due to occlusion, we further incorporate long-term feature correspondence into our flow predictions for continuous trajectory generation. Extensive experiments show that ProTracker achieves state-of-the-art performance among unsupervised and self-supervised approaches, and even outperforms supervised methods on several benchmarks. Our code and model will be publicly available upon publication.
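As a minimal sketch of what integrating several position predictions "in a probabilistic manner" can look like, the Python snippet below fuses independent estimates of one point by inverse-variance weighting, which is the maximum-likelihood estimate under an isotropic Gaussian error model. The abstract does not state the actual probabilistic model, so the Gaussian assumption, the function name `fuse_predictions`, and the sample numbers are all hypothetical.

```python
from typing import List, Tuple

Prediction = Tuple[Tuple[float, float], float]  # ((x, y), sigma)

def fuse_predictions(preds: List[Prediction]) -> Tuple[Tuple[float, float], float]:
    """Fuse independent Gaussian position estimates of a single point.

    Assumption (not from the abstract): each prediction has isotropic
    Gaussian noise with standard deviation sigma. Under that assumption the
    maximum-likelihood position is the inverse-variance weighted mean.
    """
    weights = [1.0 / (sigma * sigma) for _, sigma in preds]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, preds)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, preds)) / total
    fused_sigma = (1.0 / total) ** 0.5  # fused estimate is tighter than any input
    return (x, y), fused_sigma

# Two hypothetical predictions of the same point: one confident, one less so.
fused_pos, fused_sigma = fuse_predictions([((10.0, 20.0), 1.0), ((12.0, 22.0), 2.0)])
```

The fused position is pulled toward the more confident prediction, and the fused uncertainty is smaller than either input's.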
Modeling temporal characteristics and the non-stationary dynamics of body movement plays a significant role in predicting future human motions. However, it is challenging to capture these features due to the subtle transitions involved in complex human motions. This paper introduces MotionWavelet, a human motion prediction framework that utilizes Wavelet Transformation and studies human motion patterns in the spatial-frequency domain. In MotionWavelet, a Wavelet Diffusion Model (WDM) learns a Wavelet Manifold by applying Wavelet Transformation to the motion data, thereby encoding the intricate spatial and temporal motion patterns. Once the Wavelet Manifold is built, WDM trains a diffusion model to generate human motions from Wavelet latent vectors. In addition to the WDM, MotionWavelet also presents a Wavelet Space Shaping Guidance mechanism to refine the denoising process to improve conformity with the manifold structure. WDM also develops Temporal Attention-Based Guidance to enhance prediction accuracy. Extensive experiments validate the effectiveness of MotionWavelet, demonstrating improved prediction accuracy and enhanced generalization across various benchmarks. Our code and models will be released upon acceptance.
Recent developments in monocular depth estimation methods enable high-quality depth estimation of single-view images but fail to estimate consistent video depth across different frames. Recent works address this problem by applying a video diffusion model to generate video depth conditioned on the input video, which is expensive to train and can only produce scale-invariant depth values without camera poses. In this paper, we propose a novel video-depth estimation method called Align3R to estimate temporally consistent depth maps for a dynamic video. Our key idea is to utilize the recent DUSt3R model to align estimated monocular depth maps of different timesteps. First, we fine-tune the DUSt3R model with additional estimated monocular depth as inputs for the dynamic scenes. Then, we apply optimization to reconstruct both depth maps and camera poses. Extensive experiments demonstrate that Align3R estimates consistent video depth and camera poses for a monocular video, outperforming baseline methods.
Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand-face interaction recovery, Decaf, introduces a global fitting optimization guided by contact and deformation estimation networks trained on studio-collected data with 3D annotations. However, Decaf suffers from a time-consuming optimization process and limited generalization capability due to its reliance on 3D annotations of hand-face interaction data. To address these issues, we present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image. DICE estimates the poses of hands and faces, contacts, and deformations simultaneously using a Transformer-based architecture. It features disentangling the regression of local deformation fields and global mesh vertex locations into two network branches, enhancing deformation and contact estimation for precise and robust hand-face mesh recovery. To improve generalizability, we propose a weakly-supervised training approach that augments the training set using in-the-wild images without 3D ground-truth annotations, employing the depths of 2D keypoints estimated by off-the-shelf models and adversarial priors of poses for supervision. Our experiments demonstrate that DICE achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility. Additionally, our method operates at an interactive rate (20 fps) on an Nvidia 4090 GPU, whereas Decaf requires more than 15 seconds for a single image. Our code will be publicly available upon publication.
In this work, we propose a novel method named Automated Process Labeling via Confidence Variation (AutoCV) to enhance the reasoning capabilities of large language models (LLMs) by automatically annotating the reasoning steps. Our approach begins by training a verification model on the correctness of final answers, enabling it to generate automatic process annotations. This verification model assigns a confidence score to each reasoning step, indicating the probability of arriving at the correct final answer from that point onward. We detect relative changes in the verification model's confidence scores across reasoning steps to automatically annotate the reasoning process. This alleviates the need for numerous manual annotations or the high computational costs associated with model-induced annotation approaches. We experimentally validate that the confidence variations learned by the verification model trained on final-answer correctness can effectively identify errors in the reasoning steps. Subsequently, we demonstrate that the process annotations generated by AutoCV can improve the accuracy of the verification model in selecting the correct answer from multiple outputs generated by LLMs. Notably, we achieve substantial improvements across five datasets in mathematics and commonsense reasoning. The source code of AutoCV is available at this https URL.
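The idea of annotating steps from relative confidence changes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the exact labeling rule, so the function name `label_steps`, the threshold of -0.5, and the example scores below are hypothetical.

```python
from typing import List

def label_steps(confidences: List[float], threshold: float = -0.5) -> List[int]:
    """Label each reasoning step as 1 (kept) or 0 (flagged as a likely error).

    confidences[0] is the verifier's confidence before any step, and
    confidences[t] is its confidence after step t. A step is flagged when the
    relative change in confidence falls below `threshold` (a hypothetical
    value chosen for illustration, not taken from the paper).
    """
    labels = []
    for prev, curr in zip(confidences, confidences[1:]):
        # Relative change in confidence; guard against division by zero.
        rel_change = (curr - prev) / max(prev, 1e-8)
        labels.append(0 if rel_change < threshold else 1)
    return labels

# Hypothetical verifier scores: confidence collapses after the second step.
scores = [0.80, 0.82, 0.30, 0.28]
step_labels = label_steps(scores)
```

With these scores only the second step is flagged: the later small decline is measured against an already low confidence, so its relative change stays above the threshold.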
In this paper, we present Surf-D, a novel method for generating high-quality 3D shapes as Surfaces with arbitrary topologies using Diffusion models. Specifically, we adopt Unsigned Distance Field (UDF) as the surface representation, as it excels in handling arbitrary topologies, enabling the generation of complex shapes. While prior methods have explored shape generation with different representations, they suffer from limited topologies and geometric details. Moreover, it is non-trivial to directly extend prior diffusion models to UDF because they lack spatial continuity due to the discrete volume structure. However, UDF requires accurate gradients for mesh extraction and learning. To tackle these issues, we first leverage a point-based auto-encoder to learn a compact latent space, which supports gradient querying for any input point through differentiation to effectively capture intricate geometry at a high resolution. Since the learning difficulty for various shapes can differ, a curriculum learning strategy is employed to efficiently embed various surfaces, enhancing the whole embedding process. With the pretrained shape latent space, we employ a latent diffusion model to acquire the distribution of various shapes. We conduct extensive experiments on unconditional generation, category-conditional generation, 3D reconstruction from images, and text-to-shape tasks, and our approach demonstrates superior performance in shape generation across multiple modalities. Our code will be publicly available upon paper publication.
We introduce Efficient Motion Diffusion Model (EMDM) for fast and high-quality human motion generation. Although previous motion diffusion models have shown impressive results, they struggle to achieve fast generation while maintaining high-quality human motions. Motion latent diffusion has been proposed for efficient motion generation. However, effectively learning a latent space can be non-trivial in such a two-stage manner. Meanwhile, accelerating motion sampling by increasing the step size, e.g., DDIM, typically leads to a decline in motion quality because complex data distributions cannot be approximated well when the step size is naively increased. In this paper, we propose EMDM, which allows for far fewer sampling steps for fast motion generation by modeling the complex denoising distribution during multiple sampling steps. Specifically, we develop a Conditional Denoising Diffusion GAN to capture multimodal data distributions conditioned on both control signals, i.e., textual description and denoising time step. By modeling the complex data distribution, a larger sampling step size and fewer steps are achieved during motion synthesis, significantly accelerating the generation process. To effectively capture the human dynamics and reduce undesired artifacts, we employ motion geometric loss during network training, which improves the motion quality and training efficiency. As a result, EMDM achieves a remarkable speed-up at the generation stage while maintaining high-quality motion generation in terms of fidelity and diversity.
Controllable human motion synthesis is essential for applications in AR/VR, gaming, movies, and embodied AI. Existing methods often focus solely on either language or full trajectory control, lacking precision in synthesizing motions aligned with user-specified trajectories, especially for multi-joint control. To address these issues, we present TLControl, a new method for realistic human motion synthesis, incorporating both low-level trajectory and high-level language semantics controls. Specifically, we first train a VQ-VAE to learn a compact latent motion space organized by body parts. We then propose a Masked Trajectories Transformer to make coarse initial predictions of full trajectories of joints based on the learned latent motion space, with user-specified partial trajectories and text descriptions as conditioning. Finally, we introduce an efficient test-time optimization to refine these coarse predictions for accurate trajectory control. Experiments demonstrate that TLControl outperforms the state-of-the-art in trajectory accuracy and time efficiency, making it practical for interactive and high-quality animation generation.
In this paper, we introduce a novel text-to-avatar generation method that separately generates the human body and the clothes and allows high-quality animation on the generated avatar. While recent advancements in text-to-avatar generation have yielded diverse human avatars from text prompts, these methods typically combine all elements (clothes, hair, and body) into a single 3D representation. Such an entangled approach poses challenges for downstream tasks like editing or animation. To overcome these limitations, we propose a novel disentangled 3D avatar representation named Sequentially Offset-SMPL (SO-SMPL), building upon the SMPL model. SO-SMPL represents the human body and clothes with two separate meshes, but associates them with offsets to ensure the physical alignment between the body and the clothes. Then, we design a Score Distillation Sampling (SDS)-based distillation framework to generate the proposed SO-SMPL representation from text prompts. In comparison with existing text-to-avatar methods, our approach not only achieves higher texture and geometry quality and better semantic alignment with text prompts, but also significantly improves the visual quality of character animation, virtual try-on, and avatar editing.
3D single object tracking (SOT) is an essential task in autonomous driving and robotics. However, learning robust 3D SOT trackers remains challenging due to the limited category-specific point cloud data and the inherent sparsity and incompleteness of LiDAR scans. To tackle these issues, we propose a unified 3D SOT framework that leverages 3D generative pre-training and learns robust 3D matching abilities from 2D pre-trained foundation trackers. Our framework features a target-matching architecture consistent with widely used 2D trackers, facilitating the transfer of 2D matching knowledge. Specifically, we first propose a lightweight Target-Aware Projection (TAP) module, allowing the pre-trained 2D tracker to work well on the projected point clouds without further fine-tuning. We then propose a novel IoU-guided matching-distillation framework that utilizes the powerful 2D pre-trained trackers to guide 3D matching learning in the 3D tracker, i.e., the 3D template-to-search matching should be consistent with its corresponding 2D template-to-search matching obtained from 2D pre-trained trackers. Our designs are applied to two mainstream 3D SOT frameworks: memory-less Siamese and contextual memory-based approaches, which are respectively named SiamDisst and MemDisst. Extensive experiments show that SiamDisst and MemDisst achieve state-of-the-art performance on the KITTI, Waymo Open Dataset, and nuScenes benchmarks, while running above real-time speed at 25 and 90 FPS, respectively, on a single RTX3090 GPU. The code will be made publicly available.
We introduce Coverage Axis++, a novel and efficient approach to 3D shape skeletonization. The current state-of-the-art approaches for this task often rely on the watertightness of the input or suffer from substantial computational costs, thereby limiting their practicality. To address this challenge, Coverage Axis++ proposes a heuristic algorithm to select skeletal points, offering a high-accuracy approximation of the Medial Axis Transform (MAT) while significantly mitigating computational intensity for various shape representations. We introduce a simple yet effective strategy that considers both shape coverage and uniformity to derive skeletal points. The selection procedure enforces consistency with the shape structure while favoring the dominant medial balls, which thus introduces a compact underlying shape representation in terms of MAT. As a result, Coverage Axis++ allows for skeletonization for various shape representations (e.g., water-tight meshes, triangle soups, point clouds), specification of the number of skeletal points, few hyperparameters, and highly efficient computation with improved reconstruction accuracy. Extensive experiments across a wide range of 3D shapes validate the efficiency and effectiveness of Coverage Axis++. The code will be publicly available once the paper is published.
Recently, the emergence of diffusion models has opened up new opportunities for single-view reconstruction. However, all existing methods represent the target object as a closed mesh devoid of any structural information, thus neglecting the part-based structure of the reconstructed shape, which is crucial for many downstream applications. Moreover, the generated meshes usually suffer from heavy noise, non-smooth surfaces, and blurry textures, making it challenging to obtain satisfactory part segments using 3D segmentation techniques. In this paper, we present Part123, a novel framework for part-aware 3D reconstruction from a single-view image. We first use diffusion models to generate multiview-consistent images from a given image, and then leverage the Segment Anything Model (SAM), which demonstrates powerful generalization ability on arbitrary objects, to generate multiview segmentation masks. To effectively incorporate 2D part-based information into 3D reconstruction and handle inconsistency, we introduce contrastive learning into a neural rendering framework to learn a part-aware feature space based on the multiview segmentation masks. A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models. Experiments show that our method can generate 3D models with high-quality segmented parts on various objects. Compared to existing unstructured reconstruction methods, the part-aware 3D models from our method benefit some important applications, including feature-preserving reconstruction, primitive fitting, and 3D shape editing.
In this work, we introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, certain works directly produce 3D information via fast network inferences, but their results are often of low quality and lack geometric details. To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. To ensure consistency, we employ a multi-view cross-domain attention mechanism that facilitates information exchange across views and modalities. Lastly, we introduce a geometry-aware normal fusion algorithm that extracts high-quality surfaces from the multi-view 2D representations. Our extensive evaluations demonstrate that our method achieves high-quality reconstruction results, robust generalization, and reasonably good efficiency compared to prior works.
We present C·ASE, an efficient and effective framework that learns Conditional Adversarial Skill Embeddings for physics-based characters. C·ASE enables the physically simulated character to learn a diverse repertoire of skills while providing controllability in the form of direct manipulation of the skills to be performed. This is achieved by dividing the heterogeneous skill motions into distinct subsets containing homogeneous samples for training a low-level conditional model to learn the conditional behavior distribution. The skill-conditioned imitation learning naturally offers explicit control over the character’s skills after training. The training course incorporates the focal skill sampling, skeletal residual forces, and element-wise feature masking to balance diverse skills of varying complexities, mitigate dynamics mismatch to master agile motions and capture more general behavior characteristics, respectively. Once trained, the conditional model can produce highly diverse and realistic skills, outperforming state-of-the-art models, and can be repurposed in various downstream tasks. In particular, the explicit skill control handle allows a high-level policy or a user to direct the character with desired skill specifications, which we demonstrate is advantageous for interactive character animation.
In this paper, we introduce a set of effective TOken REduction (TORE) strategies for Transformer-based Human Mesh Recovery from monocular images. Current SOTA performance is achieved by Transformer-based structures. However, they suffer from high model complexity and computation cost caused by redundant tokens. We propose token reduction strategies based on two important aspects, i.e., the 3D geometry structure and 2D image feature, where we hierarchically recover the mesh geometry with priors from body structure and conduct token clustering to pass fewer but more discriminative image feature tokens to the Transformer. As a result, our method vastly reduces the number of tokens involved in high-complexity interactions in the Transformer, achieving competitive accuracy of shape recovery at a significantly reduced computational cost. We conduct extensive experiments across a wide range of benchmarks to validate the proposed method and further demonstrate the generalizability of our method on hand mesh recovery. Our code will be publicly available once the paper is published.
SIGGRAPH 2023 Best Paper Award; See more here.
Estimating normals with globally consistent orientations for a raw point cloud has many downstream geometry processing applications. Despite tremendous efforts in the past decades, it remains challenging to deal with an unoriented point cloud with various imperfections, particularly in the presence of data sparsity coupled with nearby gaps or thin-walled structures. In this paper, we propose a smooth objective function to characterize the requirements of an acceptable winding-number field, which allows one to find the globally consistent normal orientations starting from a set of completely random normals. By taking the vertices of the Voronoi diagram of the point cloud as examination points, we consider the following three requirements: (1) the winding number is either 0 or 1, (2) the occurrences of 1 and the occurrences of 0 are balanced around the point cloud, and (3) the normals align with the outside Voronoi poles as much as possible. Extensive experimental results show that our method outperforms the existing approaches, especially in handling sparse and noisy point clouds, as well as shapes with complex geometry/topology.
Feature lines are important geometric cues in characterizing the structure of a CAD model. Despite great progress in both explicit reconstruction and implicit reconstruction, it remains a challenging task to reconstruct a polygonal surface equipped with feature lines, especially when the input point cloud is noisy and lacks faithful normal vectors. In this paper, we develop a multistage algorithm, named RFEPS, to address this challenge. The key steps include (1) denoising the point cloud based on the assumption of local planarity, (2) identifying the feature-line zone by optimization of discrete optimal transport, (3) augmenting the point set so that sufficiently many additional points are generated on potential geometry edges, and (4) generating a polygonal surface that interpolates the augmented point set based on restricted power diagram. We demonstrate through extensive experiments that RFEPS, benefiting from the edge-point augmentation and the feature-preserving explicit reconstruction, outperforms state-of-the-art methods in terms of the reconstruction quality, especially in terms of the ability to reconstruct missing feature lines.
Top Cited Article in CGF 2022-2023. [Link]
Fast-Forward Attendees Award at EG22, 2nd Place.
In this paper, we present a simple yet effective formulation called Coverage Axis for 3D shape skeletonization. Inspired by the set cover problem, our key idea is to cover all the surface points using as few inside medial balls as possible. This formulation inherently induces a compact and expressive approximation of the Medial Axis Transform (MAT) of a given shape. Different from previous methods that rely on local approximation error, our method allows a global consideration of the overall shape structure, leading to an efficient high-level abstraction and superior robustness to noise. Another appealing aspect of our method is its capability to handle more generalized input such as point clouds and poor-quality meshes. Extensive comparisons and evaluations demonstrate the remarkable effectiveness of our method for generating compact and expressive skeletal representation to approximate the MAT.
Outbreaks of respiratory infectious diseases have often been reported in fitness centers, likely attributed to high population density, extensive shared surfaces, and elevated metabolic equivalent (MET) levels. This study analyzed the behaviors of 30 gym attendees to establish a connection between exercise intensity and virus exposure. Close interactions among participants were tracked using self-developed wearable devices that utilized computer vision technologies, while surface-contact behaviors were recorded using video cameras. A multi-route transmission model for respiratory infectious diseases was subsequently created, integrating the observed behaviors. The Omicron variant of COVID-19 served as a case study to evaluate infection risk via various transmission routes and to assess the efficacy of interventions. The METs during physical activity were about 3.5 times higher than those recorded at rest. The average interpersonal distance during close interactions in the gym was measured at 0.82 m, with 36.7% of interactions occurring face-to-face. On average, the participants made contact with surfaces 770.3 times per hour, with 517.5 of these contacts involving public surfaces. The hourly infection rate was calculated at 18.5%, with long-range airborne transmission and close contact accounting for 70.1% and 28.5% of the cases, respectively. To mitigate transmission risk, several intervention scenarios were modeled. These included (1) 100% mask-wearing with N95 masks and occupancy reduced to 62% (25 m²/person); (2) 100% mask-wearing with surgical masks and occupancy reduced to 26% (59.6 m²/person); (3) no mask-wearing, with occupancy reduced to 18% (86.1 m²/person). All scenarios fulfilled the criteria for achieving an R below 1, indicating that under these conditions, gyms could be reopened safely.
Background: Dental outpatient departments, characterized by close proximity and unmasked patients, present a considerable risk of respiratory infections for health care workers (HCWs). However, the lack of comprehensive data on close contact (< 1.5 m) between HCWs and patients poses a significant obstacle to the development of targeted control strategies. Methods: An observational study was conducted at a hospital in Shenzhen, China, utilizing depth cameras with machine learning to capture close-contact behaviors of patients with HCWs. Additionally, questionnaires were administered to collect patient demographics. Results: The study included 200 patients, 10 dental practitioners, and 10 nurses. Patients had significantly higher close-contact rates with dental practitioners (97.5%) compared with nurses (72.8%, P < .001). The reason for the visit significantly influenced patient-practitioner (P = .018) and patient-nurse (P = .007) close-contact time, with the highest values observed in prosthodontics and orthodontics patients. Furthermore, patient age also significantly impacted the close-contact rate with nurses (P = .024), with the highest rate observed in patients below 14 years old at 85% [interquartile range: 70-93]. Conclusions: Dental outpatient departments exhibit high HCW-patient close-contact rates, influenced by visit purpose and patient age. Enhanced infection control measures are warranted, particularly for prosthodontics and orthodontics patients or those below 14 years old.
The risk of COVID-19 infection has increased due to the prolonged duration of travel and frequent close interactions that come with the popularization of railway transportation. This study utilized depth detection devices to analyze the close contact behaviors of passengers in high-speed trains (HST), traditional trains (TT), waiting areas in waiting rooms (WWR), and ticket check areas in waiting rooms (CWR). A multi-route COVID-19 transmission model was developed to assess the risk of virus exposure in these scenarios under various non-pharmaceutical interventions. A total of 163,740 seconds of data was collected. The close contact ratios in HST, TT, WWR, and CWR were 5.8%, 64.0%, 7.7%, and 49.0%, respectively. The average interpersonal distance between passengers was 0.85 m, 0.92 m, 1.25 m, and 0.88 m, respectively. The probability of face-to-face contact was 9.5%, 70.0%, 64.2%, and 5.8% across each environment, respectively. When all passengers wore N95 respirators or surgical masks, personal virus exposure via close contact could be reduced by 94.1% or 51.9%, respectively. The virus exposure in TT is dozens of times that in HST. In China, if all current railway traffic were replaced by HST, the total virus exposure of passengers could be reduced by roughly 50%.
The COVID-19 pandemic has significantly impacted people's daily lives for over three years. Airports, with their dense population and frequent close contact, pose a higher risk of respiratory infectious diseases compared to many other indoor environments. However, limited availability of data on close contact behavior has resulted in a gap in indoor exposure analysis. In this study, 11 participants conducted depth sensor measurements and video data collection across nine areas of a northern airport (airport A) and a southern airport (airport B) in China. The data, comprising more than 44 hours of close contact behaviors, including interpersonal distance, relative facial orientation, and the relative position of individuals, were analyzed using a semi-supervised machine learning method. Based on this analysis, a close contact transmission model for COVID-19 was developed, which considers the aforementioned close contact behaviors to assess the risk of exposure and the efficacy of interventions. The average close contact ratio across the nine airport areas is 25.4% (ranging from 6.1% to 55.0%), with passengers having the highest frequency of close contact in manual check-in areas. During close contacts, the average interpersonal distance in airports is 1.2 meters (ranging from 1.1 to 1.4 meters), being shortest in boarding areas. Face-to-face close contact is highest in charging areas, with a percentage of 46.9%. If people maintain a distance of over 1.0 meter in all areas, the total virus exposure could be reduced by 6.9% to 22.0% compared to the actual situation. Dining areas have the highest virus exposure risk for both short-range inhalation and mucosal deposition, followed by manual check-in areas. This study provides data support for scientific epidemic prevention and control in airports from the viewpoint of close contact behaviors.
This research has been featured in a press release by EurekAlert!
Classrooms are high-risk indoor environments, so analysis of SARS-CoV-2 transmission in classrooms is important for determining optimal interventions. Due to the absence of human behavior data, it is challenging to accurately determine virus exposure in classrooms. A wearable device for close contact behavior detection was developed, and we recorded more than 250,000 data points of close contact behaviors of students from Grades 1 through 12. Combined with a survey on students’ behaviors, we analyzed virus transmission in classrooms. Close contact rates for students were 37%±11% during classes and 48%±13% during breaks. Students in lower grades had higher close contact rates and virus transmission potential. The long-range airborne transmission route is dominant, accounting for 90%±3.6% and 75%±7.7% with and without mask wearing, respectively. During breaks, the short-range airborne route became more important, contributing 48%±3.1% in Grades 1 to 9 (without wearing masks). Ventilation alone cannot always meet the demands of COVID-19 control; 30 m³/h/person is suggested as the threshold outdoor air ventilation rate in classrooms. This study provides scientific support for COVID-19 prevention and control in classrooms, and our proposed human behavior detection and analysis methods offer a powerful tool to understand virus transmission characteristics and can be employed in various indoor environments.
Close contact, including both the short-range airborne and large droplet routes, is recognized as the main route of SARS-CoV-2 transmission in indoor environments; however, exposure risk via this route is difficult to quantify due to a lack of data on the close contact behaviors of people in typical indoor environments. A digital wearable device was developed to capture human close contact behaviors automatically based on semi-supervised learning. We collected a total of 337,056 seconds of indoor close contacts from 194.5 hours of depth video recordings in 10 typical indoor environments. The relationship between SARS-CoV-2 exposure and close contact behaviors was evaluated based on the dispersion characteristics of virus-laden droplets. People in restaurants had the highest close contact ratio (63.8%) and probability of the face-to-face pattern (77.6%) during close contacts, while people in shopping centers had the highest speaking fraction (46.6%). University students had higher exposure potential in dormitories than school students in homes, but lower exposure potential in classrooms and graduate student offices than school students in classrooms. Aerosol exposure by volume, for both short-range inhalation and direct deposition on facial mucosa, was highest in restaurants. The classroom is the main indoor environment for SARS-CoV-2 transmission among school students. The results, based on real human close contact behaviors, can be used for infection risk assessment and to deploy effective interventions against close contact transmission of COVID-19 and other respiratory infections.
During the COVID-19 pandemic, analysis of virus exposure and intervention efficiency in public transport based on real passengers’ close contact behaviors is critical to curbing infectious disease transmission. A monitoring device was developed to gather a total of 145,821 close contact data points in subways based on semi-supervised learning. A virus transmission model considering both short- and long-range inhalation and deposition was established to calculate the virus exposure. During rush hour, short-range inhalation exposure is 3.2 times higher than deposition exposure and 7.5 times higher than long-range inhalation exposure for all passengers in the subway. The close contact rate was 56.1% and the average interpersonal distance was 0.8 m. Face-to-back was the main pattern during close contact. Compared with a random distribution, if all passengers stand facing in the same direction, personal virus exposure through inhalation (deposition) can be reduced by 74.1% (98.5%). If the talk rate is decreased from 20% to 5%, the inhalation (deposition) exposure can be reduced by 69.3% (73.8%). In addition, we found that virus exposure could be reduced by 82.0% if all passengers wear surgical masks. This study provides scientific support for COVID-19 prevention and control in subways based on real human close contact behaviors.
Motivated by the fact that the medial axis transform is able to encode nearly the complete shape, we propose to use as few medial balls as possible to approximate the volume enclosed by the boundary surface. We progressively select new medial balls, in a top-down style, to enlarge the region spanned by the existing medial balls. The key idea of the selection strategy is to encourage large medial balls while imposing given geometric constraints. We also propose a speedup technique based on a provable observation that the intersection of medial balls implies the adjacency of power cells (in the sense of the power crust). We then elaborate the selection rules in combination with two closely related applications. One application is an easy-to-use ball-stick modeling system that helps non-professional users quickly build a shape with only balls and wires, in which any penetration between two medial balls must be suppressed. The other application is the generation of porous structures with convex, compact (with a high isoperimetric quotient) and shape-aware pores, where two adjacent spherical pores may penetrate each other as long as the mechanical rigidity is well preserved.
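The progressive selection described above can be illustrated with a simple greedy coverage loop. This is only a sketch under stated assumptions, not the paper's algorithm: it replaces the power-cell machinery with a plain greedy set-cover over sample points, and the function name, the `(center, radius)` input format, and the `allow_overlap` flag (standing in for the "no penetration" constraint of the ball-stick application) are all hypothetical.

```python
import math


def greedy_select_balls(candidates, samples, max_balls, allow_overlap=True):
    """Greedily pick balls that cover the most still-uncovered sample points.

    candidates:    list of (center, radius), center being a 3-tuple
                   (hypothetical format for precomputed medial balls).
    samples:       list of 3-tuples sampled inside the shape.
    allow_overlap: if False, skip any ball that penetrates a chosen ball
                   (tangent balls are still allowed).
    Returns (indices of chosen balls, number of samples left uncovered).
    """
    uncovered = set(range(len(samples)))
    # Precompute which sample points each candidate ball contains.
    cover = [
        {i for i, p in enumerate(samples) if math.dist(p, c) <= r}
        for c, r in candidates
    ]
    chosen = []
    while uncovered and len(chosen) < max_balls:
        best, best_gain = None, 0
        for k, (c, r) in enumerate(candidates):
            if k in chosen:
                continue
            if not allow_overlap and any(
                math.dist(c, candidates[j][0]) < r + candidates[j][1]
                for j in chosen
            ):
                continue  # geometric constraint: no penetration allowed
            gain = len(cover[k] & uncovered)  # newly covered samples
            if gain > best_gain:
                best, best_gain = k, gain
        if best is None:
            break  # no admissible ball adds coverage
        chosen.append(best)
        uncovered -= cover[best]
    return chosen, len(uncovered)
```

Because the gain counts newly covered samples, larger balls tend to be picked first, which loosely mirrors the preference for large medial balls. Setting `allow_overlap=False` corresponds to the ball-stick setting, and leaving it `True` to the porous-structure setting where adjacent pores may intersect.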