¹Beihang University ²The University of Hong Kong ³SenseTime Research and Tetras.AI ⁴Shanghai AI Lab ⁵The University of Sydney
Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions. Unlike most previous works that directly predict the 3D poses of two interacting hands simultaneously, we propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately. In this way, it is straightforward to take advantage of the latest progress in single-hand pose estimation. However, hand pose estimation in interacting scenarios is very challenging due to (1) severe hand-hand occlusion and (2) ambiguity caused by the homogeneous appearance of hands. To tackle these two challenges, we propose a novel Hand De-occlusion and Removal (HDR) framework to perform hand de-occlusion and distractor removal. We also propose the first large-scale synthetic amodal hand dataset, termed the Amodal InterHand Dataset (AIH), to facilitate model training and promote related research. Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches.
Figure 2. Illustration of our Hand De-occlusion and Removal (HDR) framework for the task of 3D interacting hand pose estimation. We first employ the HASM (Hand Amodal Segmentation Module) to segment the amodal and modal masks of the left and the right hand in the image. Given the predicted masks, we locate and crop the image patch centered at each hand. Then, for every cropped patch, the HDRM (Hand De-occlusion and Removal Module) recovers the appearance content of the occluded part of one hand and removes the other, distracting hand simultaneously. In this way, the interacting two-hand image is transformed into a single-hand image that can be easily handled by an SHPE (Single-Hand Pose Estimator) to obtain the final 3D hand poses.
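For readers who prefer code, below is a minimal sketch of the pipeline described in the figure caption. The module names (HASM, HDRM, SHPE) follow the caption, but everything else — the callable interfaces, the mask dictionary keys, the crop helper, the patch size, and the 21-joint output shape — is an illustrative assumption, not the released implementation.

```python
import numpy as np

def crop_around_mask(arr, mask, size=256):
    """Crop a square patch centered on the mask (hypothetical helper;
    a real implementation would also pad near image borders)."""
    ys, xs = np.nonzero(mask)
    cy, cx = int(ys.mean()), int(xs.mean())
    h = size // 2
    return arr[max(cy - h, 0):cy + h, max(cx - h, 0):cx + h]

def hdr_pipeline(image, hasm, hdrm, shpe):
    """Turn one interacting two-hand image into two single-hand problems.

    Assumed interfaces (for this sketch only):
      hasm(image)  -> dict with '<hand>_amodal' and '<hand>_modal' masks
      hdrm(patch, amodal, modal, distractor) -> de-occluded single-hand patch
      shpe(patch)  -> (21, 3) array of 3D hand joints
    """
    masks = hasm(image)
    poses = {}
    for hand, other in (("left", "right"), ("right", "left")):
        center_mask = masks[f"{hand}_modal"]
        # Crop the image and all masks around the current hand.
        patch = crop_around_mask(image, center_mask)
        amodal = crop_around_mask(masks[f"{hand}_amodal"], center_mask)
        modal = crop_around_mask(center_mask, center_mask)
        distractor = crop_around_mask(masks[f"{other}_modal"], center_mask)
        # Inpaint the occluded part of this hand and erase the other hand,
        # yielding an ordinary single-hand image for the pose estimator.
        clean = hdrm(patch, amodal, modal, distractor)
        poses[hand] = shpe(clean)
    return poses
```

One consequence of this decomposition, as the abstract notes, is that any off-the-shelf single-hand pose estimator can be plugged in as `shpe` without retraining the rest of the pipeline.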
@inproceedings{meng2022hdr,
  title={3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal},
  author={Meng, Hao and Jin, Sheng and Liu, Wentao and Qian, Chen and Lin, Mengxiang and Ouyang, Wanli and Luo, Ping},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022},
  month={October},
}
We would like to thank Wentao Jiang, Wang Zeng, Neng Qian, Yumeng Hu, Lixin Yang, Yu Rong, Qiang Zhou and Jiayi Wang for their helpful discussions and feedback. Mengxiang Lin is supported by the State Key Laboratory of Software Development Environment under Grant No. SKLSDE 2022ZX-06. Ping Luo is supported by the General Research Fund of HK No. 27208720, No. 17212120, and No. 17200622. Wanli Ouyang is supported by the Australian Research Council Grant DP200103223, Australian Medical Research Future Fund MRFAI000085, CRC-P Smart Material Recovery Facility (SMRF) – Curby Soft Plastics, and CRC-P ARIA - Bionic Visual-Spatial Prosthesis for the Blind.