PCHMR: Empowering and Benchmarking Human Mesh Recovery in Privacy-Constrained Real-World Settings
A benchmark and local personalization pipeline for human mesh recovery when raw user images remain on device.
1 University of Cambridge 2 University of Oxford 3 The University of Hong Kong 4 HKUST 5 Texas A&M University
* Equal contribution. † Corresponding authors.
2025
Abstract
Fine-tuning directly on user-side data is an effective way to improve the performance of widely adopted data-driven human mesh recovery (HMR) models. However, existing HMR methods often assume that user-side images can be aggregated to a central training server, which creates a privacy risk for sensitive human data. PCHMR studies how HMR can be evaluated and improved when raw images remain local. We benchmark state-of-the-art HMR models under federated and secure-aggregation variants, covering client scale, heterogeneous data distributions, and user-side personalization. We also introduce DePoser, a depth-aware local annotation and fine-tuning pipeline that uses foundation-model pose and depth priors to support local adaptation. The benchmark clarifies how representative HMR backbones behave when centralized data access is removed, and provides a foundation for future privacy-constrained HMR deployment.
Overview
Centralized HMR versus raw-data-local HMR training.
Method
Local annotation, collaborative training, and on-device inference.
Training Paradigms
PCHMR simulates a client-server HMR training setup with Flower. Each client trains locally for one epoch and the server aggregates model updates. Experiments cover 10, 100, and 1000 clients, with 10 clients sampled per round, as well as IID and heterogeneous partitions.
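The round structure described above (sample a subset of clients, train locally, aggregate on the server) can be sketched in plain Python. This is a minimal illustration and does not use the Flower API. The weighted-average aggregation rule, the toy `local_train` step, and all function names are assumptions introduced here, since the page does not state which aggregation strategy is used.

```python
import random

def fedavg(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

def local_train(global_weights, client_data, lr=0.1):
    """Hypothetical stand-in for one local epoch: a single pass over the
    client's samples, pulling the weights toward each sample."""
    w = list(global_weights)
    for x in client_data:
        w = [wi - lr * (wi - xi) for wi, xi in zip(w, x)]
    return w

def run_round(global_weights, clients, clients_per_round, rng):
    """One round: sample clients, train locally, aggregate on the server."""
    sampled = rng.sample(sorted(clients), clients_per_round)
    updates = [local_train(global_weights, clients[c]) for c in sampled]
    sizes = [len(clients[c]) for c in sampled]
    return fedavg(updates, sizes)

# Toy setup: 100 simulated clients, 10 sampled per round (as in one PCHMR setting).
rng = random.Random(0)
clients = {
    cid: [[rng.gauss(1.0, 0.1), rng.gauss(-2.0, 0.1)] for _ in range(5)]
    for cid in range(100)
}
weights = [0.0, 0.0]
for _ in range(20):
    weights = run_round(weights, clients, clients_per_round=10, rng=rng)
```

Only the parameter vectors returned by `local_train` reach `fedavg`; the per-client data lists never leave the `clients` dictionary, which mirrors the raw-data-local property of the benchmark.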
Privacy setting comparison. PCHMR treats centralized HMR as a reference baseline and evaluates raw-data-local federated training plus a secure-aggregation variant.
| Paradigm | What leaves the client? | Server visibility | Use in PCHMR |
|---|---|---|---|
| Centralized HMR | Raw images and mesh labels | Training data and model | Reference baseline |
| Federated HMR | Model updates | Individual updates and aggregate model | Main benchmark setting |
| Secure aggregation variant | Masked model updates | Aggregated update only | Privacy-strengthened comparison |
PCHMR reduces raw-data centralization. Vanilla federated learning does not rule out all update-level or released-model leakage; secure aggregation addresses individual-update visibility under the stated honest-but-curious server assumption.
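To illustrate why the server sees only the aggregated update in the secure-aggregation row, the sketch below uses pairwise additive masks that cancel in the sum. It is a simplified illustration under stated assumptions, not the protocol used in PCHMR. Updates are assumed to be quantized to integers modulo `MODULUS`, the per-pair seed stands in for a key agreed between two clients, and client dropout is not handled.

```python
import random

MODULUS = 2**32  # assumption: updates are quantized to integers mod 2^32

def pairwise_masks(client_ids, dim, seed):
    """For each pair (i, j) with i before j, draw a shared random vector;
    client i adds it and client j subtracts it, so all masks sum to zero."""
    masks = {c: [0] * dim for c in client_ids}
    for a in range(len(client_ids)):
        for b in range(a + 1, len(client_ids)):
            i, j = client_ids[a], client_ids[b]
            # Stand-in for a secret shared only by clients i and j.
            pair_rng = random.Random(f"{seed}-{i}-{j}")
            shared = [pair_rng.randrange(MODULUS) for _ in range(dim)]
            masks[i] = [(m + s) % MODULUS for m, s in zip(masks[i], shared)]
            masks[j] = [(m - s) % MODULUS for m, s in zip(masks[j], shared)]
    return masks

def mask_update(update, mask):
    """What a client sends: its update plus its mask, modulo MODULUS."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]

def aggregate(masked_updates):
    """What the server computes: the masks cancel, leaving the true sum."""
    dim = len(masked_updates[0])
    return [sum(u[k] for u in masked_updates) % MODULUS for k in range(dim)]

updates = {0: [3, 10], 1: [5, 20], 2: [7, 30]}
ids = sorted(updates)
masks = pairwise_masks(ids, dim=2, seed="round-1")
masked = [mask_update(updates[c], masks[c]) for c in ids]
total = aggregate(masked)  # equals the element-wise sum of the raw updates
```

Each entry of `masked` looks random on its own, while `total` matches the unmasked sum. This is the sense in which an honest-but-curious server learns the aggregate but not individual updates.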
Benchmark Results
Client scale, heterogeneous partitions, and qualitative reconstructions.
Effect of Client Scale
FastMETRO-S (FM-S) and TORE-S remain competitive when the training data is distributed across many simulated clients. In the most difficult 1000-client setting, MPJPE rises by 2.84 mm for FM-S and 2.16 mm for TORE-S relative to the centralized baseline, and PA-MPJPE rises by 3.91 mm and 2.05 mm, respectively.
Client-scale evaluation. Results are reported in millimeters. Lower is better.
| Clients (total / per round) | Model | MPJPE | PA-MPJPE | Raw data local? |
|---|---|---|---|---|
| Centralized | FM-S | 57.98 | 40.62 | No |
| 10 / 10 | FM-S | 56.85 | 40.52 | Yes |
| 100 / 10 | FM-S | 59.31 | 41.48 | Yes |
| 1000 / 10 | FM-S | 60.82 | 44.53 | Yes |
| Centralized | TORE-S | 63.88 | 41.99 | No |
| 10 / 10 | TORE-S | 61.27 | 41.60 | Yes |
| 100 / 10 | TORE-S | 62.92 | 43.11 | Yes |
| 1000 / 10 | TORE-S | 66.04 | 44.04 | Yes |
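The gap between each distributed setting and the centralized baseline can be read off the table directly. The short script below does that arithmetic. The numbers are copied from the table above, while the dictionary layout and the helper name `gap_to_centralized` are illustrative choices.

```python
# (MPJPE, PA-MPJPE) in millimeters, copied from the client-scale table.
results = {
    "FM-S": {
        "Centralized": (57.98, 40.62),
        "10": (56.85, 40.52),
        "100": (59.31, 41.48),
        "1000": (60.82, 44.53),
    },
    "TORE-S": {
        "Centralized": (63.88, 41.99),
        "10": (61.27, 41.60),
        "100": (62.92, 43.11),
        "1000": (66.04, 44.04),
    },
}

def gap_to_centralized(model, setting):
    """Return (MPJPE gap, PA-MPJPE gap) in mm; positive means worse than centralized."""
    ref_mpjpe, ref_pa = results[model]["Centralized"]
    mpjpe, pa = results[model][setting]
    return round(mpjpe - ref_mpjpe, 2), round(pa - ref_pa, 2)

for model in results:
    for setting in ("10", "100", "1000"):
        d_mpjpe, d_pa = gap_to_centralized(model, setting)
        print(f"{model:7s} {setting:>4s} clients: MPJPE {d_mpjpe:+.2f} mm, PA-MPJPE {d_pa:+.2f} mm")
```

Negative gaps in the 10-client rows indicate that those runs score slightly better than the centralized reference on these metrics.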
Natural Partition on 3DPW
Natural scene partitions approximate real deployment, where each user's data is shaped by identity, clothing, environment, camera pose, and local data volume.
3DPW fine-tuning. The natural partition assigns each scene to a client. Results are reported in millimeters. Lower is better.
| Setting | Model | MPJPE | PA-MPJPE | MPVPE | Raw data local? |
|---|---|---|---|---|---|
| Centralized | FM-S | 84.92 | 54.69 | 97.60 | No |
| PCHMR | FM-S | 84.66 | 54.78 | 97.42 | Yes |
| PCHMR (Natural Part.) | FM-S | 86.83 | 55.61 | 99.66 | Yes |
| Centralized | TORE-S | 87.97 | 55.35 | 101.88 | No |
| PCHMR | TORE-S | 87.55 | 54.08 | 101.86 | Yes |
| PCHMR (Natural Part.) | TORE-S | 88.00 | 55.07 | 102.50 | Yes |
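The difference between a natural (per-scene) partition and an IID partition can be sketched as follows. The scene names and frame indices are hypothetical placeholders, not actual 3DPW sequence identifiers, and both helper functions are illustrative rather than the benchmark's own partitioning code.

```python
import random
from collections import defaultdict

def natural_partition(samples):
    """One client per scene: each client holds all frames of a single scene,
    so client sizes and content follow the scenes themselves."""
    clients = defaultdict(list)
    for scene, frame in samples:
        clients[scene].append(frame)
    return dict(clients)

def iid_partition(samples, num_clients, seed=0):
    """Shuffle all samples and deal them out evenly, so every client
    sees a similar mix of scenes."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return {c: shuffled[c::num_clients] for c in range(num_clients)}

# Hypothetical (scene, frame index) pairs.
samples = [
    ("scene_a", 0), ("scene_a", 1), ("scene_a", 2),
    ("scene_b", 0),
    ("scene_c", 0), ("scene_c", 1),
]

natural = natural_partition(samples)
iid = iid_partition(samples, num_clients=3)
```

In the natural partition, client sizes are uneven (3, 1, and 2 frames here) and each client sees a single scene. The IID split yields equal-sized clients with mixed scenes.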
Local Annotation and Personalization
Depth-aware pseudo-labels for user-side adaptation.
DePoser annotates user-side images locally with pose and depth priors. The resulting pseudo-labels enable local personalization of a collaboratively trained HMR model without centralizing private images.
Personalization Results
Local fine-tuning accuracy. Results are measured against DePoser-generated pseudo-labels and therefore reflect consistency with the local pseudo-ground-truth distribution.
| Dataset | Before / After | Model | MPVPE | MPJPE | PA-MPJPE |
|---|---|---|---|---|---|
| VR-runner | Before PCHMR | FM-S | 109.50 | 114.51 | 60.07 |
| VR-runner | After PCHMR | FM-S | 93.23 | 58.24 | 43.16 |
| VR-game-1 | Before PCHMR | FM-S | 176.24 | 172.57 | 78.21 |
| VR-game-1 | After PCHMR | FM-S | 65.57 | 62.30 | 43.82 |
| VR-game-2 | Before PCHMR | FM-S | 150.64 | 161.82 | 84.36 |
| VR-game-2 | After PCHMR | FM-S | 82.53 | 69.49 | 53.57 |
| Oculus | Before PCHMR | FM-S | 103.92 | 102.22 | 61.62 |
| Oculus | After PCHMR | FM-S | 56.76 | 57.92 | 39.94 |
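As a reminder of what the MPJPE and MPVPE columns measure, the sketch below computes a mean per-point Euclidean error between a prediction and a pseudo-label. Root alignment before comparison is a common convention and is assumed here; the page does not state the exact evaluation protocol. PA-MPJPE additionally applies a Procrustes (similarity) alignment, which is omitted for brevity. The coordinates are made-up toy values.

```python
import math

def root_align(points, root_index=0):
    """Translate all points so the chosen root (e.g. the pelvis) sits at the origin."""
    root = points[root_index]
    return [tuple(c - r for c, r in zip(p, root)) for p in points]

def mean_per_point_error(pred, target):
    """Mean Euclidean distance between corresponding 3D points.
    Applied to joints this gives MPJPE; applied to mesh vertices, MPVPE."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of points")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)

# Toy example in millimeters: two joints, prediction offset from the pseudo-label.
pred_joints = [(10.0, 10.0, 10.0), (13.0, 14.0, 10.0)]
pseudo_joints = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]

mpjpe = mean_per_point_error(root_align(pred_joints), root_align(pseudo_joints))
```

Because the target here is a pseudo-label, a lower value indicates closer agreement with the local annotator's output rather than with independently measured ground truth.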
Local annotator comparison. On a sub-sampled 3DPW set, DePoser lowers error relative to SMPLify-X on all three metrics (for example, MPJPE drops from 181.07 to 151.19) by adding a depth-aware local annotation objective.
| Local annotator | MPVPE | MPJPE | PA-MPJPE |
|---|---|---|---|
| SMPLify-X | 198.66 | 181.07 | 87.17 |
| DePoser | 162.60 | 151.19 | 71.14 |
Citation
@article{cao2025pchmr,
title={PCHMR: Empowering and Benchmarking Human Mesh Recovery in Privacy-Constrained Real-World Settings},
author={Cao, Zeyu and Wu, Qingxuan and Dou, Zhiyang and Xu, Rui and Liu, Yuan and Fernandez-Marques, Javier and Lane, Nicholas D. and Komura, Taku and Wang, Wenping},
year={2025}
}