We propose a method that makes standard turntable-based vision acquisition practical for recovering models of human geometry. A human subject typically exhibits some unintended joint motion while rotating on a turntable. Ignoring such motion causes shape-from-silhouette to carve the model excessively, resulting in loss of geometry (especially on the limbs). We use silhouette cues together with an initial, automatically recovered skinned model to recover this joint motion, or wobbling. The recovered joint motion gives the calibration of each rigid body of the subject, allowing temporal fusion of the image cues (e.g., silhouettes and texture) used to refine the geometry. On real data sets, our method improves silhouette overlap in novel views. The recovered geometry is useful in vision tasks such as multi-view image-based tracking of humans, where the recent trend of using a priori laser-scanned geometry could be replaced with more cost-effective vision-based geometry.

The objective of this project was to acquire static human models with a convenient setup (e.g., quick capture and minimal hardware). We accomplish this by extending simple turntable-based capture to human subjects.

The main problem in using turntables for human shape acquisition is that the subject often wobbles while rotating. In the past we attempted to overcome this by adding a stabilizing rod to support the subject, but we found that the stabilizer occluded texture samples and did not remove all of the wobble. Instead, we take a vision-based approach.
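To illustrate why uncompensated wobble causes over-carving, here is a minimal 2D toy sketch; it is not the implementation used in this project. It assumes a disk-shaped "body", 1D orthographic cameras at twelve turntable angles, and a made-up alternating sway of 0.3 units; all function and variable names are hypothetical. The sketch carves a grid of points three ways: with a still body, with the sway ignored, and with the sway known and compensated.

```python
import numpy as np

def view_axis(angle):
    """Image axis of a 1D orthographic camera at turntable angle `angle`."""
    return np.array([-np.sin(angle), np.cos(angle)])

def silhouette(center, radius, angle):
    """Interval covered by a disk's silhouette in that 1D image."""
    c = center @ view_axis(angle)
    return c - radius, c + radius

def carve(points, silhouettes, angles, offsets=None):
    """Keep the points whose projection lies inside every silhouette.

    `offsets[i]` is the (assumed known) displacement of the body in view i;
    None means the body is treated as perfectly rigid on the turntable.
    """
    keep = np.ones(len(points), dtype=bool)
    for i, ((lo, hi), angle) in enumerate(zip(silhouettes, angles)):
        moved = points if offsets is None else points + offsets[i]
        u = moved @ view_axis(angle)
        keep &= (u >= lo) & (u <= hi)
    return keep

# Candidate points on a regular grid (the 2D analogue of voxels).
xs = np.linspace(-2.0, 2.0, 81)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
angles = np.arange(12) * np.pi / 12
radius = 1.0

# Hypothetical wobble: the body sways left and right between views.
sway = 0.3 * (-1.0) ** np.arange(12)
offsets = np.stack([sway, np.zeros(12)], axis=1)

still = [silhouette(np.zeros(2), radius, a) for a in angles]
wobbly = [silhouette(o, radius, a) for o, a in zip(offsets, angles)]

n_still = int(carve(grid, still, angles).sum())
n_ignored = int(carve(grid, wobbly, angles).sum())
n_compensated = int(carve(grid, wobbly, angles, offsets).sum())
print(n_still, n_ignored, n_compensated)
```

Ignoring the sway keeps noticeably fewer points than the still case, while carving with the known per-view offsets brings the count back to roughly the still-body value. In the real problem the offsets are not known and must be recovered per rigid body part, which is what the tracking step is for.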

Silhouette-based human tracking has demonstrated success in multi-camera setups with approximate models. We adapt these methods to the case of a rotating subject with fewer cameras (as few as two), ensuring that geometric constraints (such as foot placement) are obeyed. The tracked joint coordinates are used to reconstruct the geometry of each body part, and these parts are combined to obtain a new geometry. The tracking and refinement are then iterated several times. See the movies below or the paper for more details.
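As a rough illustration of the silhouette cue used in tracking, the sketch below recovers a small 2D displacement by minimizing the mismatch between an observed silhouette and a rendered one. It is only a toy under stated assumptions: the "model" is a disk, the pose is a 2D integer offset, the cost is a pixel-wise XOR count, and exhaustive search stands in for whatever optimizer the actual system uses. All names are hypothetical.

```python
import numpy as np

def render_disk(center, radius, size=64):
    """Binary silhouette of a disk centred at (x, y) in a size x size image."""
    ys, xs = np.mgrid[0:size, 0:size]
    return (xs - center[0]) ** 2 + (ys - center[1]) ** 2 <= radius ** 2

def silhouette_mismatch(a, b):
    """Number of pixels where the two silhouettes disagree (XOR count)."""
    return int(np.logical_xor(a, b).sum())

def fit_offset(observed, radius, nominal, search=5):
    """Exhaustively search integer offsets around the nominal position."""
    best_cost, best_offset = None, None
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            candidate = render_disk(
                (nominal[0] + dx, nominal[1] + dy), radius, observed.shape[0]
            )
            cost = silhouette_mismatch(observed, candidate)
            if best_cost is None or cost < best_cost:
                best_cost, best_offset = cost, (dx, dy)
    return best_offset, best_cost

# The subject is nominally at (32, 32) but has drifted to (35, 30).
observed = render_disk((35, 30), 10)
offset, cost = fit_offset(observed, 10, (32, 32))
print(offset, cost)  # (3, -2) 0
```

In a full system the same kind of cost would be summed over all cameras and frames, with the pose parameterized by joint angles of the skinned model rather than a single offset.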

tracking-turntable.mp4

neil2_jump.mp4

Resulting models rendered using a single texture map computed with multi-band blending:

Several texturing methods were evaluated: a basic weighted blend, a super-resolution approach, and a multi-band blend. Multi-band blending gave the best visual results.
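To make the difference between a basic weighted blend and a multi-band blend concrete, here is a simplified 1D sketch. It is not the texturing code used for the results above: it assumes only two frequency bands, a box filter as the low-pass, and made-up per-view weights, none of which are specified on this page. Low frequencies are mixed with the smooth weights, while high frequencies are taken from the single best-weighted view so that slightly misaligned detail is not averaged away.

```python
import numpy as np

def box_blur(signal, width=9):
    """Simple low-pass filter (box kernel, zero-padded at the borders)."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")

def weighted_blend(textures, weights):
    """Per-texel weighted average of all views."""
    w = weights / weights.sum(axis=0, keepdims=True)
    return (w * textures).sum(axis=0)

def two_band_blend(textures, weights, width=9):
    """Blend low frequencies smoothly; copy high frequencies from the best view."""
    w = weights / weights.sum(axis=0, keepdims=True)
    low = np.stack([box_blur(t, width) for t in textures])
    high = textures - low
    low_out = (w * low).sum(axis=0)
    best = np.argmax(weights, axis=0)  # index of the best view per texel
    high_out = np.take_along_axis(high, best[None, :], axis=0)[0]
    return low_out + high_out

n = 64
t = np.arange(n)
detail = 0.5 + 0.5 * np.sin(t * np.pi / 2)        # fine stripe pattern
textures = np.stack([detail, np.roll(detail, 1)])  # second view is off by one texel
w0 = np.linspace(0.9, 0.1, n)                      # hypothetical visibility weights
weights = np.stack([w0, 1.0 - w0])

basic = weighted_blend(textures, weights)
banded = two_band_blend(textures, weights)
```

Where the two views have similar weights, the basic blend averages the misaligned stripes and loses contrast, whereas the two-band blend keeps the stripes from one view. A full multi-band blend would typically use more bands (for example, a Laplacian pyramid) on 2D texture maps.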