¹University of North Carolina at Chapel Hill   ²Korea Advanced Institute of Science and Technology   ³Adobe Research
We propose Neural Human Performer, a novel approach that learns generalizable radiance fields on top of a parametric body model for robust performance capture. Beyond exploiting the parametric body model as a geometric prior, the core of our method is a combination of temporal and multi-view transformers that effectively aggregate spatio-temporal observations to robustly compute the density and color of a query point. First, the temporal transformer aggregates trackable visual features along the input skeletal body motion over the video frames. The subsequent multi-view transformer then performs cross-attention between the temporally-augmented skeletal features and the pixel-aligned features from each time step. Together, the proposed modules adaptively aggregate multi-time and multi-view information, yielding significant improvements in synthesis quality across different generalization settings.
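To make the two-stage aggregation concrete, below is a minimal PyTorch sketch of the idea, not the authors' implementation: the module, the feature dimensions, and the mean-pooled query are our own assumptions, standing in for the paper's actual architecture.

```python
import torch
import torch.nn as nn

class SpatioTemporalAggregator(nn.Module):
    """Hypothetical sketch of the two-stage aggregation described above.

    Stage 1 (temporal): self-attention over skeletal features tracked
    across T video frames, producing temporally-augmented features.
    Stage 2 (multi-view): cross-attention in which the temporally-augmented
    skeletal feature of a query point attends to pixel-aligned features
    extracted from each of V camera views.
    """

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.multiview_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Heads mapping the fused feature to density (sigma) and RGB color.
        self.sigma_head = nn.Linear(dim, 1)
        self.rgb_head = nn.Linear(dim, 3)

    def forward(self, skel_feats, pix_feats):
        # skel_feats: (B, T, dim)  skeletal features over T frames
        # pix_feats:  (B, V, dim)  pixel-aligned features from V views
        # Temporal transformer: aggregate trackable features over time.
        t_out, _ = self.temporal_attn(skel_feats, skel_feats, skel_feats)
        query = t_out.mean(dim=1, keepdim=True)           # (B, 1, dim)
        # Multi-view transformer: the query attends to per-view features.
        fused, _ = self.multiview_attn(query, pix_feats, pix_feats)
        fused = fused.squeeze(1)                          # (B, dim)
        sigma = self.sigma_head(fused)                    # density
        rgb = torch.sigmoid(self.rgb_head(fused))         # color in [0, 1]
        return sigma, rgb
```

The returned density and color can then be composited along each camera ray by standard NeRF-style volume rendering.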
We compare our method with Neural Body [1] on unseen poses of seen subjects. Note that Neural Body was optimized per subject (one network per person), while our model was trained across all training subjects (one network for multiple people).
We compare our method with pixelNeRF [2] on unseen poses of unseen subjects.
We thank Sida Peng of Zhejiang University, Hangzhou, China, for many helpful discussions on implementation details of Neural Body. We thank Ruilong Li and Alex Yu of UC Berkeley for many discussions on the AIST++ dataset and pixelNeRF details. We also thank Alex Yu for the template of this website.
Please send any questions or comments to YoungJoong Kwon.