PCHMR: Empowering and Benchmarking Human Mesh Recovery in Privacy-Constrained Real-World Settings

A benchmark and local personalization pipeline for human mesh recovery when raw user images remain on device.

Zeyu Cao*1, Qingxuan Wu*2, Zhiyang Dou†3, Rui Xu3, Yuan Liu4, Javier Fernandez-Marques1, Nicholas D. Lane1, Taku Komura3, Wenping Wang†5

1 University of Cambridge, 2 University of Oxford, 3 The University of Hong Kong, 4 HKUST, 5 Texas A&M University

* Equal contribution. † Corresponding authors.

2025

Abstract

Fine-tuning directly on user-side data is an effective way to improve the performance of widely adopted data-driven human mesh recovery (HMR) models. However, existing HMR methods often assume that user-side images can be aggregated to a central training server, which creates a privacy risk for sensitive human data. PCHMR studies how HMR can be evaluated and improved when raw images remain local. We benchmark state-of-the-art HMR models under federated and secure-aggregation variants, covering client scale, heterogeneous data distributions, and user-side personalization. We also introduce DePoser, a depth-aware local annotation and fine-tuning pipeline that uses foundation-model pose and depth priors to support local adaptation. The benchmark clarifies how representative HMR backbones behave when centralized data access is removed, and provides a foundation for future privacy-constrained HMR deployment.

Overview

Centralized HMR versus raw-data-local HMR training.

Raw data stays local. Clients keep images and mesh labels in their private domain while exchanging model-side information.
Federated HMR benchmark. The benchmark covers client scale, optimizers, heterogeneous client data, and natural scene partitions.
Local personalization. DePoser provides local pseudo-labels for adapting HMR models to user-side distributions.

[Figure: Centralized HMR compared with privacy-constrained HMR]
Privacy-constrained HMR. Centralized HMR uploads sensitive images for training. PCHMR keeps raw user data local, exchanges only training updates, and benchmarks collaborative training, heterogeneous client distributions, and local personalization.

Method

Local annotation, collaborative training, and on-device inference.

[Figure: PCHMR privacy-constrained training pipeline]
Privacy-constrained HMR pipeline. User-side data is labeled locally and used for privacy-constrained collaborative training. The improved model is then returned to the device for local real-time inference.

Training Paradigms

PCHMR simulates a client-server HMR training setup with Flower. Each client trains locally for one epoch and the server aggregates model updates. Experiments cover 10, 100, and 1000 clients, with 10 clients sampled per round, as well as IID and heterogeneous partitions.
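
The round structure described above (sample a subset of clients, train locally, aggregate on the server) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PCHMR or Flower implementation: the toy linear model, the helper names `local_update` and `fedavg_round`, the learning rate, and the choice of federated averaging weighted by local sample count are all hypothetical stand-ins.

```python
import random

def local_update(weights, client_data, lr=0.1):
    """Stand-in for local training: one gradient step on a toy least-squares
    objective (hypothetical; a real client would train an HMR model for an epoch)."""
    grad = [0.0] * len(weights)
    for x, y in client_data:
        pred = sum(w * xi for w, xi in zip(weights, x))
        for i, xi in enumerate(x):
            grad[i] += 2 * (pred - y) * xi / len(client_data)
    return [w - lr * g for w, g in zip(weights, grad)]

def fedavg_round(global_weights, clients, clients_per_round, rng):
    """Sample clients, train locally, and average weighted by local sample count."""
    sampled = rng.sample(sorted(clients), clients_per_round)
    total = sum(len(clients[c]) for c in sampled)
    new_weights = [0.0] * len(global_weights)
    for c in sampled:
        local = local_update(list(global_weights), clients[c])
        share = len(clients[c]) / total
        for i, w in enumerate(local):
            new_weights[i] += share * w
    return new_weights, sampled

rng = random.Random(0)

# 100 simulated clients, each holding five (x, y) pairs drawn from y = 2*x0 - x1.
clients = {}
for cid in range(100):
    data = []
    for _ in range(5):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, 2 * x[0] - x[1]))
    clients[cid] = data

# Only model weights cross the client boundary; the (x, y) pairs never do.
weights = [0.0, 0.0]
for _ in range(300):
    weights, sampled = fedavg_round(weights, clients, 10, rng)
```

Because only 10 of the 100 clients contribute in each round, the global model sees each client's data rarely. That sparsity is what the client-scale experiments stress as the total number of clients grows.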

Privacy setting comparison. PCHMR treats centralized HMR as a reference baseline and evaluates raw-data-local federated training plus a secure-aggregation variant.

Paradigm | What leaves the client? | Server visibility | Use in PCHMR
Centralized HMR | Raw images and mesh labels | Training data and model | Reference baseline
Federated HMR | Model updates | Individual updates and aggregate model | Main benchmark setting
Secure aggregation variant | Masked model updates | Aggregated update only | Privacy-strengthened comparison

PCHMR reduces raw-data centralization. Vanilla federated learning does not rule out all update-level or released-model leakage; secure aggregation addresses individual-update visibility under the stated honest-but-curious server assumption.
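
To illustrate why a server can recover the aggregate without seeing any individual update, the following sketch uses pairwise additive masking, a common building block of secure aggregation. It is an illustrative assumption, not the protocol used in PCHMR: the modulus, the integer-quantized updates, the helper names, and the seeded generator standing in for a pairwise key agreement are all hypothetical, and client dropout is not handled.

```python
import random

MODULUS = 2 ** 32  # all arithmetic is done modulo a fixed integer (assumed)

def pairwise_masks(client_ids, dim, seed=0):
    """Return one mask vector per client such that all masks sum to 0 mod MODULUS.

    For each pair (i, j) with i before j, client i adds a shared random vector
    and client j subtracts it. A real protocol would derive the shared vector
    from a key agreement between the two clients; a seeded RNG stands in here.
    """
    masks = {c: [0] * dim for c in client_ids}
    for a, i in enumerate(client_ids):
        for j in client_ids[a + 1:]:
            pair_rng = random.Random(f"{seed}-{i}-{j}")
            shared = [pair_rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks

def mask_update(update, mask):
    """What a client actually sends: its update plus its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]

def aggregate(masked_updates):
    """Server-side sum; the pairwise masks cancel, leaving the true total."""
    dim = len(masked_updates[0])
    return [sum(m[k] for m in masked_updates) % MODULUS for k in range(dim)]

# Integer (quantized) updates from three clients.
updates = {"a": [5, 1, 7], "b": [2, 2, 2], "c": [10, 0, 3]}
ids = sorted(updates)
masks = pairwise_masks(ids, dim=3)
masked = [mask_update(updates[c], masks[c]) for c in ids]
total = aggregate(masked)
print(total)  # [17, 3, 12]
```

Each masked vector on its own looks random to the server, while the sum equals the sum of the unmasked updates. This matches the "Aggregated update only" visibility in the table above, and it does nothing about leakage from the aggregate or the released model.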

Benchmark Results

Client scale, heterogeneous partitions, and qualitative reconstructions.

Effect of Client Scale

FastMETRO-S and TORE-S remain competitive when the training data is distributed across many simulated clients. In the most difficult setting, with 1000 clients, MPJPE increases by 2.84 mm for FastMETRO-S and 2.16 mm for TORE-S relative to the centralized baseline.

Client-scale evaluation. Results are reported in millimeters. Lower is better. FM-S denotes FastMETRO-S.

Clients (total / per round) | Model | MPJPE | PA-MPJPE | Raw data local?
Centralized | FM-S | 57.98 | 40.62 | No
10 / 10 | FM-S | 56.85 | 40.52 | Yes
100 / 10 | FM-S | 59.31 | 41.48 | Yes
1000 / 10 | FM-S | 60.82 | 44.53 | Yes
Centralized | TORE-S | 63.88 | 41.99 | No
10 / 10 | TORE-S | 61.27 | 41.60 | Yes
100 / 10 | TORE-S | 62.92 | 43.11 | Yes
1000 / 10 | TORE-S | 66.04 | 44.04 | Yes
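
For readers unfamiliar with the metric, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints. The sketch below is a minimal pure-Python version. The root-alignment step and the choice of joint 0 as the root are assumptions about the evaluation convention, and the joint coordinates are made up for illustration.

```python
import math

def root_align(joints, root_index=0):
    """Translate joints so the root joint sits at the origin (assumed convention)."""
    root = joints[root_index]
    return [tuple(c - r for c, r in zip(j, root)) for j in joints]

def mpjpe(pred, gt):
    """Average Euclidean distance between matching joints, in the input units."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of joints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

# Three joints in millimeters; the prediction is shifted by 10 mm along x
# and has errors of 30 mm and 40 mm on the second and third joints.
gt = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
pred = [(10.0, 0.0, 0.0), (10.0, 100.0, 30.0), (10.0, 240.0, 0.0)]

error_mm = mpjpe(root_align(pred), root_align(gt))
print(round(error_mm, 2))  # 23.33
```

PA-MPJPE applies the same distance after a Procrustes (similarity) alignment of the prediction to the ground truth, which is why it is never larger than the unaligned error in practice. MPVPE applies the same distance to mesh vertices instead of joints.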
[Figure: FastMETRO benchmark on Human3.6M]
Human3.6M. FastMETRO-S benchmark trends under privacy-constrained training settings.
[Figure: FastMETRO benchmark on 3DPW]
3DPW. FastMETRO-S benchmark trends, including distributed fine-tuning and natural partitions.
[Figure: LDA data distribution evaluation]
Heterogeneous data. LDA-controlled client partitions on Human3.6M. Smaller α values create more skewed client data.
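
A Dirichlet-controlled partition of this kind can be sketched as follows. This is a minimal illustration, not the benchmark's partitioning code: it assumes each sample carries a discrete label (for example an action or subject identifier, which is a hypothetical choice here), and the function names and seed are invented for the example.

```python
import random
from collections import defaultdict

def dirichlet(alpha, k, rng):
    """Sample a k-dimensional symmetric Dirichlet(alpha) vector via gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws) or 1.0  # guard against underflow for very small alpha
    return [d / total for d in draws]

def lda_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients.

    For each label, the share of that label's samples given to each client
    is drawn from Dirichlet(alpha). Small alpha concentrates a label on a
    few clients; large alpha spreads it almost evenly.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        shares = dirichlet(alpha, num_clients, rng)
        # Turn the shares into cut points over this label's samples.
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += shares[c]
            if c == num_clients - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            clients[c].extend(indices[start:end])
            start = end
    return clients

# 500 samples with 5 hypothetical labels, split across 10 clients.
labels = [i % 5 for i in range(500)]
skewed = lda_partition(labels, num_clients=10, alpha=0.1)
balanced = lda_partition(labels, num_clients=10, alpha=100.0)
print([len(c) for c in skewed])
print([len(c) for c in balanced])
```

Comparing the two printed size lists shows the effect of α: the small-α split typically leaves some clients with very few samples or a single dominant label, while the large-α split is close to IID.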

Natural Partition on 3DPW

Natural scene partitions approximate real deployment, where each user's data is shaped by identity, clothing, environment, camera pose, and local data volume.
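
In contrast to a sampled partition, a natural partition needs no randomness: samples are grouped by where they were captured. The sketch below assumes each sample record carries a scene identifier. The record layout, file names, and scene names are hypothetical and are not taken from 3DPW.

```python
from collections import defaultdict

def natural_partition(samples):
    """Group sample indices by scene so that each scene becomes one client."""
    clients = defaultdict(list)
    for idx, sample in enumerate(samples):
        clients[sample["scene"]].append(idx)
    return dict(clients)

# Hypothetical sample records; client sizes follow the scenes and are uneven.
samples = [
    {"image": "frame_000.jpg", "scene": "scene_a"},
    {"image": "frame_001.jpg", "scene": "scene_a"},
    {"image": "frame_002.jpg", "scene": "scene_b"},
    {"image": "frame_003.jpg", "scene": "scene_c"},
    {"image": "frame_004.jpg", "scene": "scene_a"},
]

parts = natural_partition(samples)
print(parts)  # {'scene_a': [0, 1, 4], 'scene_b': [2], 'scene_c': [3]}
```

The resulting clients differ in both size and content, which is the kind of heterogeneity a per-user deployment would exhibit.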

3DPW fine-tuning. The natural partition assigns each scene to a client.

Setting | Model | MPJPE | PA-MPJPE | MPVPE | Raw data local?
Centralized | FM-S | 84.92 | 54.69 | 97.60 | No
PCHMR | FM-S | 84.66 | 54.78 | 97.42 | Yes
PCHMR (Natural Part.) | FM-S | 86.83 | 55.61 | 99.66 | Yes
Centralized | TORE-S | 87.97 | 55.35 | 101.88 | No
PCHMR | TORE-S | 87.55 | 54.08 | 101.86 | Yes
PCHMR (Natural Part.) | TORE-S | 88.00 | 55.07 | 102.50 | Yes
[Figure: Qualitative HMR results on Human3.6M and 3DPW]
Qualitative reconstructions. Results on Human3.6M and 3DPW, trained with 100 clients and 10 randomly sampled clients per round using FastMETRO and TORE backbones.

Local Annotation and Personalization

Depth-aware pseudo-labels for user-side adaptation.

DePoser annotates user-side images locally with pose and depth priors. The resulting pseudo-labels enable local personalization of a collaboratively trained HMR model without centralizing private images.
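
The value of a depth prior in fitting can be seen with a toy objective. The sketch below is not the DePoser objective, which is not specified here. It is a hypothetical loss that adds a depth term to a 2D reprojection term, assuming a pinhole camera, camera-space joints, and a depth prior expressed in the same metric units as the joints. The function names, camera parameters, and weighting are all invented for illustration.

```python
def project(point, focal, center):
    """Pinhole projection of a camera-space 3D point to pixel coordinates."""
    x, y, z = point
    return (focal * x / z + center[0], focal * y / z + center[1])

def depth_aware_loss(joints_3d, keypoints_2d, depths, focal, center, depth_weight=1.0):
    """Hypothetical objective: mean squared 2D reprojection error plus a
    weighted mean absolute difference between joint depth and a depth prior."""
    reproj, depth_term = 0.0, 0.0
    for joint, kp, d in zip(joints_3d, keypoints_2d, depths):
        u, v = project(joint, focal, center)
        reproj += (u - kp[0]) ** 2 + (v - kp[1]) ** 2
        depth_term += abs(joint[2] - d)
    n = len(joints_3d)
    return reproj / n + depth_weight * depth_term / n

focal, center = 1000.0, (500.0, 500.0)
keypoints_2d = [(750.0, 500.0)]  # detected 2D keypoint (made up)
depths = [2.0]                   # depth prior at that keypoint (made up)

# Two candidate joints on the same camera ray: identical 2D projection,
# different depth. The 2D term alone cannot tell them apart.
near = [(0.5, 0.0, 2.0)]
far = [(1.0, 0.0, 4.0)]

loss_near = depth_aware_loss(near, keypoints_2d, depths, focal, center)
loss_far = depth_aware_loss(far, keypoints_2d, depths, focal, center)
print(loss_near, loss_far)  # 0.0 2.0
```

Both candidates reproject exactly onto the keypoint, so only the depth term separates them. This is the ambiguity that a purely 2D fitting objective leaves open.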

[Figure: Local personalization on VR data]
VR personalization. FastMETRO-S before and after PCHMR fine-tuning on locally annotated VR data. The pseudo-ground truth is generated by DePoser.
[Figure: DePoser compared with SMPLify-X]
Local annotation. DePoser improves foot placement and hand positioning compared with SMPLify-X by using Sapiens pose and depth priors.
[Figure: FastMETRO before and after privacy-constrained fine-tuning]
Privacy-constrained fine-tuning. Local adaptation improves reconstruction quality on in-the-wild examples by aligning the model to the local pseudo-labeled distribution.

Personalization Results

Local fine-tuning accuracy. Results are measured against DePoser-generated pseudo-labels and therefore reflect consistency with the local pseudo-ground-truth distribution.

Dataset | Before / After | Model | MPVPE | MPJPE | PA-MPJPE
VR-runner | Before PCHMR | FM-S | 109.50 | 114.51 | 60.07
VR-runner | After PCHMR | FM-S | 93.23 | 58.24 | 43.16
VR-game-1 | Before PCHMR | FM-S | 176.24 | 172.57 | 78.21
VR-game-1 | After PCHMR | FM-S | 65.57 | 62.30 | 43.82
VR-game-2 | Before PCHMR | FM-S | 150.64 | 161.82 | 84.36
VR-game-2 | After PCHMR | FM-S | 82.53 | 69.49 | 53.57
Oculus | Before PCHMR | FM-S | 103.92 | 102.22 | 61.62
Oculus | After PCHMR | FM-S | 56.76 | 57.92 | 39.94

Local annotator comparison. DePoser improves reconstruction accuracy on a sub-sampled 3DPW set by adding a depth-aware local annotation objective.

Local annotator | MPVPE | MPJPE | PA-MPJPE
SMPLify-X | 198.66 | 181.07 | 87.17
DePoser | 162.60 | 151.19 | 71.14

Citation

@article{cao2025pchmr,
  title={PCHMR: Empowering and Benchmarking Human Mesh Recovery in Privacy-Constrained Real-World Settings},
  author={Cao, Zeyu and Wu, Qingxuan and Dou, Zhiyang and Xu, Rui and Liu, Yuan and Fernandez-Marques, Javier and Lane, Nicholas D. and Komura, Taku and Wang, Wenping},
  year={2025}
}