MotionWavelet: Human Motion Prediction via
Wavelet Manifold Learning


Yuming Feng 1,†   Zhiyang Dou 2,3,†,‡   Ling-Hao Chen 4   Yuan Liu 5,6   Tianyu Li 7   Jingbo Wang 8   Zeyu Cao 9  
Wenping Wang 10   Taku Komura 2   Lingjie Liu 3,‡  
1ICL   2HKU   3UPenn   4THU   5HKUST   6NTU  
7Georgia Tech   8Shanghai AI Lab   9Cambridge   10TAMU  
† denotes equal contribution; ‡ denotes corresponding authors.
arXiv 2024.

Abstract


Modeling the temporal characteristics and non-stationary dynamics of body movement plays a significant role in predicting future human motions. However, these features are challenging to capture due to the subtle transitions involved in complex human motions. This paper introduces MotionWavelet, a human motion prediction framework that applies the Wavelet Transform to study human motion patterns in the spatial-frequency domain. In MotionWavelet, a Wavelet Diffusion Model (WDM) learns a Wavelet Manifold by applying the Wavelet Transform to the motion data, thereby encoding intricate spatial and temporal motion patterns. Once the Wavelet Manifold is built, WDM trains a diffusion model to generate human motions from Wavelet latent vectors. In addition to the WDM, MotionWavelet presents a Wavelet Space Shaping Guidance mechanism that refines the denoising process to improve conformity with the manifold structure, as well as Temporal Attention-Based Guidance to enhance prediction accuracy. Extensive experiments validate the effectiveness of MotionWavelet, demonstrating improved prediction accuracy and enhanced generalization across various benchmarks. Our code and models will be released upon acceptance.
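To make the wavelet-manifold idea concrete, below is a minimal sketch (not the released code) of mapping a motion sequence into wavelet coefficients and back with PyWavelets. The wavelet family ('haar'), the decomposition level, and the [frames, joints × 3] layout are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch: wavelet-domain representation of a motion sequence.
# The wavelet ('haar'), level, and motion layout are assumptions for this demo.
import numpy as np
import pywt

def motion_to_wavelet(motion, wavelet="haar", level=2):
    """motion: [T, D] array of T frames and D pose features (e.g., J joints * 3)."""
    # Decompose each feature channel along the time axis.
    coeffs = pywt.wavedec(motion, wavelet=wavelet, level=level, axis=0)
    # Stack approximation and detail coefficients into one wavelet latent.
    return np.concatenate(coeffs, axis=0), [c.shape[0] for c in coeffs]

def wavelet_to_motion(latent, lengths, wavelet="haar"):
    """Inverse transform: split the latent into coefficient bands and apply the iDWT."""
    bands = np.split(latent, np.cumsum(lengths)[:-1], axis=0)
    return pywt.waverec(bands, wavelet=wavelet, axis=0)

# Example: a random 100-frame sequence with 22 joints in 3D.
motion = np.random.randn(100, 22 * 3).astype(np.float32)
latent, lengths = motion_to_wavelet(motion)
recon = wavelet_to_motion(latent, lengths)
print(latent.shape, np.allclose(recon[:100], motion, atol=1e-5))
```

Stacking the approximation and detail coefficients in this way yields a wavelet latent vector of the kind the diffusion model operates on.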


Qualitative comparisons. The upper part shows predictions on Human3.6M and the lower part on HumanEva-I. The first row in each part shows the ground-truth motion; predictions closer to the ground truth indicate better performance.



Quantitative comparison between our approach and state-of-the-art methods on the HumanEva-I and Human3.6M datasets. Our method consistently demonstrates superior accuracy while maintaining commendable diversity metrics. Bold values indicate the best performance, while underlined values indicate the second best.


Framework


System overview. Our method first converts motion from the spatial domain to the Wavelet manifold and then performs Wavelet Manifold Diffusion conditioned on a few history frames, where a denoiser $\epsilon_\theta$ is trained on the diffusion process $q(\mathbf{y}^{(t)} \mid \mathbf{y}^{(t-1)})$. During inference, the Wavelet Manifold Diffusion model predicts the latent $\mathbf{y}^{(0)}$ from the conditioning inputs and then applies the inverse Wavelet Transform (iDWT) to map it back to motion space efficiently.
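The following is a minimal DDPM-style sketch of diffusion in the wavelet latent space described above. It assumes a standard linear noise schedule and uses a placeholder MLP for the denoiser $\epsilon_\theta$; the paper's actual network architecture, history-frame conditioning, and guidance mechanisms are not reproduced here.

```python
# Simplified sketch of diffusion over wavelet latents (not the released implementation).
# `Denoiser` is a placeholder for epsilon_theta; architecture and schedule are assumptions.
import torch
import torch.nn as nn

T_STEPS = 1000
betas = torch.linspace(1e-4, 0.02, T_STEPS)        # beta_t for q(y^(t) | y^(t-1))
alphas_bar = torch.cumprod(1.0 - betas, dim=0)      # \bar{alpha}_t

class Denoiser(nn.Module):
    """Placeholder epsilon_theta(y^(t), t, c): predicts the injected noise."""
    def __init__(self, dim, cond_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + cond_dim + 1, hidden),
                                 nn.SiLU(), nn.Linear(hidden, dim))
    def forward(self, y_t, t, cond):
        t_emb = t.float().unsqueeze(-1) / T_STEPS
        return self.net(torch.cat([y_t, cond, t_emb], dim=-1))

def training_loss(model, y0, cond):
    """y0: clean wavelet latent; cond: encoding of the observed history frames."""
    t = torch.randint(0, T_STEPS, (y0.shape[0],))
    eps = torch.randn_like(y0)
    ab = alphas_bar[t].unsqueeze(-1)
    y_t = ab.sqrt() * y0 + (1 - ab).sqrt() * eps     # sample from q(y^(t) | y^(0))
    return ((model(y_t, t, cond) - eps) ** 2).mean()

@torch.no_grad()
def sample(model, cond, dim):
    """Reverse process: start from Gaussian noise and denoise down to y^(0)."""
    y = torch.randn(cond.shape[0], dim)
    for t in reversed(range(T_STEPS)):
        t_b = torch.full((cond.shape[0],), t, dtype=torch.long)
        eps = model(y, t_b, cond)
        alpha, ab = 1.0 - betas[t], alphas_bar[t]
        y = (y - (1 - alpha) / (1 - ab).sqrt() * eps) / alpha.sqrt()
        if t > 0:
            y = y + betas[t].sqrt() * torch.randn_like(y)
    return y  # wavelet latent y^(0); map back to motion with the iDWT
```

In this sketch, `sample` returns the wavelet latent $\mathbf{y}^{(0)}$, which would then be converted back to joint trajectories with the iDWT as in the overview above.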


More Qualitative Results


More qualitative results of MotionWavelet, where the green-purple skeletons represent the observed motions and the red-black skeletons represent the predicted motions. We visualize 10 predicted samples. Our method produces high-fidelity and diverse motion predictions.



More qualitative results of MotionWavelet, where the green-purple skeletons represent the observed motions, the blue-purple skeletons represent the ground-truth motions, and the red-black skeletons represent the predicted motions. We visualize 10 predicted samples without overlay.


Controllable Human Motion Prediction


Joint-level Control


Visualizations showcasing the joint-level controlled motion prediction results of MotionWavelet. The green-purple skeletons represent the observed joint motions, while the red-black skeletons represent the predicted joint motions. The controlled joints are highlighted in yellow for clarity.


Motion Switch Control


Controllable Motion Prediction: Motion Switching. Visualizations showcasing the motion transfer results of MotionWavelet. The green-purple skeletons represent the observed motions, the red-black skeletons represent the predicted motions, and the blue-yellow skeletons represent the target motions.




Check out our paper for more details.

Citation

@article{feng2024motionwavelet,
  title={MotionWavelet: Human Motion Prediction via Wavelet Manifold Learning},
  author={Feng, Yuming and Dou, Zhiyang and Chen, Ling-Hao and Liu, Yuan and Li, Tianyu and Wang, Jingbo and Cao, Zeyu and Wang, Wenping and Komura, Taku and Liu, Lingjie},
  journal={arXiv preprint},
  year={2024}
}
