PCHMR: Empowering and Benchmarking Human Mesh Recovery in Privacy-Constrained Real-World Settings

Cao, Zeyu; Wu, Qingxuan; Dou, Zhiyang; Xu, Rui; Liu, Yuan; Fernandez-Marques, Javier; Lane, Nicholas D.; Komura, Taku; Wang, Wenping

A benchmark and local personalization pipeline for human mesh recovery when raw user images remain on device.

Zeyu Cao¹*, Qingxuan Wu²*, Zhiyang Dou³†, Rui Xu³, Yuan Liu⁴, Javier Fernandez-Marques¹, Nicholas D. Lane¹, Taku Komura³, Wenping Wang⁵†

¹ University of Cambridge ² University of Oxford ³ The University of Hong Kong ⁴ HKUST ⁵ Texas A&M University

* Equal contribution. † Corresponding authors.

2025

Abstract

Fine-tuning directly on user-side data is an effective way to improve the performance of widely adopted data-driven human mesh recovery (HMR) models. However, existing HMR methods often assume that user-side images can be aggregated to a central training server, which creates a privacy risk for sensitive human data. PCHMR studies how HMR can be evaluated and improved when raw images remain local. We benchmark state-of-the-art HMR models under federated and secure-aggregation variants, covering client scale, heterogeneous data distributions, and user-side personalization. We also introduce DePoser, a depth-aware local annotation and fine-tuning pipeline that uses foundation-model pose and depth priors to support local adaptation. The benchmark clarifies how representative HMR backbones behave when centralized data access is removed, and provides a foundation for future privacy-constrained HMR deployment.

Overview

Centralized HMR versus raw-data-local HMR training.

        
Raw data stays local
Clients keep images and mesh labels in their private domain and exchange only model updates with the server.

Federated HMR benchmark
The benchmark covers client scale, optimizers, heterogeneous client data, and natural scene partitions.

Local personalization
DePoser generates pseudo-labels on the device, which are used to adapt HMR models to user-side distributions.

Centralized HMR compared with privacy-constrained HMR — **Privacy-constrained HMR.** Centralized HMR uploads sensitive images for training. PCHMR keeps raw user data local, exchanges only training updates, and benchmarks collaborative training, heterogeneous client distributions, and local personalization.

Method

Local annotation, collaborative training, and on-device inference.

PCHMR privacy-constrained training pipeline — **Privacy-constrained HMR pipeline.** User-side data is labeled locally, used for privacy-constrained collaborative training, and returned as an improved model for local real-time inference.

Training Paradigms

PCHMR simulates a client-server HMR training setup with the Flower federated learning framework. In each round, every sampled client trains locally for one epoch, and the server aggregates the resulting model updates. Experiments cover 10, 100, and 1000 total clients, with 10 clients sampled per round, under both IID and heterogeneous partitions.
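As a rough illustration of this round structure, the sketch below runs sample-then-average training in plain NumPy. It is not the PCHMR code and does not use Flower's API: the sample-count-weighted averaging rule (FedAvg), the toy linear model standing in for an HMR network, and the names `local_epoch` and `fedavg_round` are all assumptions made for the example. Only the one local epoch and the 10 clients sampled per round come from the setup described here.

```python
import numpy as np

def local_epoch(weights, x, y, lr=0.05):
    """One pass of per-sample SGD on a toy linear regressor (stand-in for an HMR model)."""
    w = weights.copy()
    for xi, yi in zip(x, y):
        grad = 2.0 * (xi @ w - yi) * xi  # gradient of the squared error
        w -= lr * grad
    return w

def fedavg_round(global_w, clients, rng, clients_per_round=10):
    """Sample clients, train each for one local epoch, average weighted by sample count."""
    num_sampled = min(clients_per_round, len(clients))
    chosen = rng.choice(len(clients), size=num_sampled, replace=False)
    updates, sizes = [], []
    for idx in chosen:
        x, y = clients[idx]
        updates.append(local_epoch(global_w, x, y))
        sizes.append(len(x))
    return np.average(np.stack(updates), axis=0, weights=np.asarray(sizes, dtype=float))

# Toy data: 100 simulated clients, each holding 20 noiseless samples.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(100):
    x = rng.normal(size=(20, 3))
    clients.append((x, x @ true_w))

global_w = np.zeros(3)
for _ in range(30):
    global_w = fedavg_round(global_w, clients, rng)
```

On this noiseless toy problem `global_w` should approach `true_w` within a few dozen rounds. Raw `(x, y)` pairs are only read inside `local_epoch`, which mirrors the raw-data-local constraint.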

Privacy setting comparison. PCHMR treats centralized HMR as a reference baseline and evaluates raw-data-local federated training plus a secure-aggregation variant.

Paradigm	What leaves the client?	Server visibility	Use in PCHMR
Centralized HMR	Raw images and mesh labels	Training data and model	Reference baseline
Federated HMR	Model updates	Individual updates and aggregate model	Main benchmark setting
Secure aggregation variant	Masked model updates	Aggregated update only	Privacy-strengthened comparison

PCHMR keeps raw data off the central server. Vanilla federated learning does not, however, rule out leakage through individual updates or through the released model. Secure aggregation hides individual updates from the server, assuming the server is honest but curious.
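To show why a server can recover the aggregate without seeing any single update, here is a minimal sketch of pairwise additive masking, the idea behind many secure-aggregation protocols. The page does not say which protocol PCHMR uses, so the masking scheme, the modulus, the integer-valued updates, and all function names are assumptions. A real protocol would derive each shared mask from a pairwise key agreement and handle client dropout, which this sketch omits.

```python
import random

MODULUS = 2**32  # all arithmetic is modulo this value (assumed for the sketch)

def pairwise_masks(num_clients, dim, seed=0):
    """Build one mask per client; the masks of all clients sum to zero mod MODULUS."""
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            # Clients i and j share this vector: i adds it, j subtracts it.
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks

def mask_update(update, mask):
    """What a client would send: its update plus its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]

def aggregate(masked_updates):
    """What the server computes: the element-wise sum, in which the masks cancel."""
    dim = len(masked_updates[0])
    return [sum(vec[k] for vec in masked_updates) % MODULUS for k in range(dim)]

# Quantized toy updates from three clients.
updates = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]
masks = pairwise_masks(len(updates), dim=3)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
total = aggregate(masked)  # equals the element-wise sum of the unmasked updates
```

Each entry of `masked` looks uniformly random on its own, while `total` equals the plain sum `[6, 12, 18]`.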

Benchmark Results

Client scale, heterogeneous partitions, and qualitative reconstructions.

Effect of Client Scale

FastMETRO-S and TORE-S remain competitive when the training data is distributed across many simulated clients. In the most difficult 1000-client setting, MPJPE rises by 2.84 mm for FastMETRO-S (57.98 to 60.82) and by 2.16 mm for TORE-S (63.88 to 66.04) relative to the centralized baseline.

Client-scale evaluation. Errors are reported in millimeters; lower is better. FM-S denotes FastMETRO-S.

Clients (total / per round)	Model	MPJPE	PA-MPJPE	Raw data local?
Centralized	FM-S	57.98	40.62	No
10 / 10	FM-S	56.85	40.52	Yes
100 / 10	FM-S	59.31	41.48	Yes
1000 / 10	FM-S	60.82	44.53	Yes
Centralized	TORE-S	63.88	41.99	No
10 / 10	TORE-S	61.27	41.60	Yes
100 / 10	TORE-S	62.92	43.11	Yes
1000 / 10	TORE-S	66.04	44.04	Yes
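For readers unfamiliar with the two metrics in this table, the sketch below computes them from their standard definitions: MPJPE is the mean Euclidean distance between predicted and reference joints, and PA-MPJPE is the same error after a similarity (Procrustes) alignment of the prediction to the reference. The function names and the 14-joint toy input are assumptions. The benchmark's own evaluation code may differ in details such as joint sets and root alignment.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error for (J, 3) arrays, in the units of the input."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Similarity-align pred (J, 3) to gt (J, 3) using scale, rotation, and translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    diag = np.array([1.0, 1.0, d])
    rot = u @ np.diag(diag) @ vt
    scale = (s * diag).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction to the reference."""
    return mpjpe(procrustes_align(pred, gt), gt)

# Toy check: a scaled, rotated, shifted copy has nonzero MPJPE but near-zero PA-MPJPE.
rng = np.random.default_rng(0)
gt = rng.normal(size=(14, 3))
c, s_ = np.cos(0.7), np.sin(0.7)
rotation = np.array([[c, -s_, 0.0], [s_, c, 0.0], [0.0, 0.0, 1.0]])
pred = 1.5 * gt @ rotation + np.array([0.1, -0.2, 0.3])
raw_error = mpjpe(pred, gt)
aligned_error = pa_mpjpe(pred, gt)
```

Passing joint coordinates in millimeters yields errors in millimeters, matching the table. Applying `mpjpe` to mesh vertices instead of joints gives the per-vertex variant (MPVPE).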

FastMETRO benchmark on Human3.6M — **Human3.6M.** FastMETRO-S benchmark trends under privacy-constrained training settings.

FastMETRO benchmark on 3DPW — **3DPW.** FastMETRO-S benchmark trends, including distributed fine-tuning and natural partitions.

LDA data distribution evaluation — **Heterogeneous data.** LDA-controlled client partitions on Human3.6M. Smaller α values create more skewed client data.
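One common way to build such α-controlled partitions is to draw, for each category, a Dirichlet(α) vector of client proportions and split that category's samples accordingly. The sketch below follows that construction. Which attribute PCHMR treats as the category on Human3.6M is not stated here, so `labels` is a hypothetical categorical attribute (for example an action class), and the function name is likewise an assumption.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients using per-category Dirichlet(alpha) proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn proportions into split points over this category's samples.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

# 5 hypothetical categories with 200 samples each.
labels = np.repeat(np.arange(5), 200)
skewed = dirichlet_partition(labels, num_clients=10, alpha=0.1)
balanced = dirichlet_partition(labels, num_clients=10, alpha=100.0)
```

With a small α most of a category's samples tend to land on a few clients, while a large α spreads each category almost evenly, which matches the skew described in the caption.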

Natural Partition on 3DPW

Natural scene partitions approximate real deployment, where each user's data is shaped by identity, clothing, environment, camera pose, and local data volume.

3DPW fine-tuning. The natural partition assigns each scene to a client. Errors are in millimeters; lower is better.

Setting	Model	MPJPE	PA-MPJPE	MPVPE	Raw data local?
Centralized	FM-S	84.92	54.69	97.60	No
PCHMR	FM-S	84.66	54.78	97.42	Yes
PCHMR (Natural Part.)	FM-S	86.83	55.61	99.66	Yes
Centralized	TORE-S	87.97	55.35	101.88	No
PCHMR	TORE-S	87.55	54.08	101.86	Yes
PCHMR (Natural Part.)	TORE-S	88.00	55.07	102.50	Yes
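The natural partition amounts to grouping samples by scene identifier, so that client sizes follow the data rather than a sampling rule. A minimal sketch, assuming each sample record carries a `scene` field; the field name and the scene identifiers are hypothetical and not taken from the 3DPW release:

```python
from collections import defaultdict

def natural_partition(samples):
    """Group sample indices by scene so that each scene becomes one client."""
    clients = defaultdict(list)
    for index, sample in enumerate(samples):
        clients[sample["scene"]].append(index)
    return dict(clients)

# Hypothetical sample records; real records would also hold images and mesh labels.
samples = [
    {"scene": "scene_a", "frame": 0},
    {"scene": "scene_a", "frame": 1},
    {"scene": "scene_b", "frame": 0},
    {"scene": "scene_a", "frame": 2},
]
partition = natural_partition(samples)
```

Here `scene_a` becomes a client with three samples and `scene_b` a client with one, illustrating the uneven local data volume mentioned above.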

Qualitative HMR results on Human3.6M and 3DPW — **Qualitative reconstructions.** Results on Human3.6M and 3DPW, trained with 100 clients and 10 randomly sampled clients per round using FastMETRO and TORE backbones.

Local Annotation and Personalization

Depth-aware pseudo-labels for user-side adaptation.

DePoser annotates user-side images locally with pose and depth priors. The resulting pseudo-labels enable local personalization of a collaboratively trained HMR model without centralizing private images.
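To make the role of a depth prior concrete, the sketch below combines a 2D reprojection term with a root-relative depth term. This is not DePoser's objective, which is not specified on this page. The loss form, the weighting, the use of joint 0 as root, and every name in the code are assumptions chosen only to show why 2D keypoints alone leave depth ambiguous.

```python
import numpy as np

def project(joints_3d, focal, center):
    """Pinhole projection of (J, 3) camera-space joints to (J, 2) pixel coordinates."""
    return focal * joints_3d[:, :2] / joints_3d[:, 2:3] + center

def depth_aware_loss(joints_3d, keypoints_2d, keypoint_conf, depth_prior,
                     focal, center, depth_weight=1.0):
    """Confidence-weighted reprojection error plus a root-relative depth term."""
    reproj = np.linalg.norm(project(joints_3d, focal, center) - keypoints_2d, axis=1)
    reproj_term = (keypoint_conf * reproj ** 2).mean()
    # Compare depths relative to joint 0 so a global depth offset does not matter.
    rel_pred = joints_3d[:, 2] - joints_3d[0, 2]
    rel_prior = depth_prior - depth_prior[0]
    depth_term = (keypoint_conf * (rel_pred - rel_prior) ** 2).mean()
    return float(reproj_term + depth_weight * depth_term)

# Toy setup: three joints, 2D keypoints and depth prior taken from the true pose.
joints = np.array([[0.0, 0.0, 3.0], [0.2, -0.1, 3.2], [-0.3, 0.4, 2.9]])
focal, center = 1000.0, np.array([512.0, 512.0])
keypoints = project(joints, focal, center)
conf = np.ones(3)
depth = joints[:, 2].copy()

# Slide joint 1 along its camera ray: the 2D projection is unchanged, the depth is not.
moved = joints.copy()
moved[1] *= 1.2
loss_2d_only = depth_aware_loss(moved, keypoints, conf, depth, focal, center, depth_weight=0.0)
loss_with_depth = depth_aware_loss(moved, keypoints, conf, depth, focal, center)
```

The reprojection-only loss stays at zero for the displaced joint, while the depth term penalizes it. That is the kind of ambiguity a depth prior can resolve.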

Local personalization on VR data — **VR personalization.** FastMETRO-S before and after PCHMR fine-tuning on locally annotated VR data. The pseudo-ground truth is generated by DePoser.

DePoser compared with SMPLify-X — **Local annotation.** DePoser improves foot placement and hand positioning compared with SMPLify-X by using Sapiens pose and depth priors.

FastMETRO before and after privacy-constrained fine-tuning — **Privacy-constrained fine-tuning.** Local adaptation improves reconstruction quality on in-the-wild examples by aligning the model to the local pseudo-labeled distribution.

Personalization Results

Local fine-tuning accuracy. Errors are in millimeters; lower is better. Results are measured against DePoser-generated pseudo-labels, so they reflect consistency with the local pseudo-ground truth rather than error against true annotations.

Dataset	Before / After	Model	MPVPE	MPJPE	PA-MPJPE
VR-runner	Before PCHMR	FM-S	109.50	114.51	60.07
VR-runner	After PCHMR	FM-S	93.23	58.24	43.16
VR-game-1	Before PCHMR	FM-S	176.24	172.57	78.21
VR-game-1	After PCHMR	FM-S	65.57	62.30	43.82
VR-game-2	Before PCHMR	FM-S	150.64	161.82	84.36
VR-game-2	After PCHMR	FM-S	82.53	69.49	53.57
Oculus	Before PCHMR	FM-S	103.92	102.22	61.62
Oculus	After PCHMR	FM-S	56.76	57.92	39.94
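The size of these changes is easier to compare as relative reductions. The short script below computes them for the MPJPE column, using only values copied from the table; the helper name is an assumption.

```python
# MPJPE before and after local fine-tuning, copied from the table (mm).
before_after_mpjpe = {
    "VR-runner": (114.51, 58.24),
    "VR-game-1": (172.57, 62.30),
    "VR-game-2": (161.82, 69.49),
    "Oculus": (102.22, 57.92),
}

def relative_reduction(before, after):
    """Percentage by which the error drops relative to its starting value."""
    return 100.0 * (before - after) / before

reductions = {
    name: relative_reduction(before, after)
    for name, (before, after) in before_after_mpjpe.items()
}
for name, value in reductions.items():
    print(f"{name}: MPJPE reduced by {value:.1f}%")
```

On these numbers the reduction ranges from roughly 43% (Oculus) to roughly 64% (VR-game-1), always relative to the DePoser pseudo-labels rather than true annotations.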

Local annotator comparison on a sub-sampled 3DPW set. Errors are in millimeters; lower is better. By adding a depth-aware annotation objective, DePoser lowers MPJPE from 181.07 to 151.19 and PA-MPJPE from 87.17 to 71.14 relative to SMPLify-X.

Local annotator	MPVPE	MPJPE	PA-MPJPE
SMPLify-X	198.66	181.07	87.17
DePoser	162.60	151.19	71.14

Citation

@article{cao2025pchmr,
  title={PCHMR: Empowering and Benchmarking Human Mesh Recovery in Privacy-Constrained Real-World Settings},
  author={Cao, Zeyu and Wu, Qingxuan and Dou, Zhiyang and Xu, Rui and Liu, Yuan and Fernandez-Marques, Javier and Lane, Nicholas D. and Komura, Taku and Wang, Wenping},
  year={2025}
}