Realtime Simultaneous Tracking and Pose Estimation
Learning Large Margin Likelihood for Realtime Head Pose Tracking
E. Ricci and J.-M. Odobez,
in IEEE Int. Conference on Image Processing (ICIP), Cairo, Nov. 2009.
In this work we consider the problem of head tracking and pose estimation in realtime from low to mid-resolution images.
Tracking and pose recognition are treated as two coupled problems in a probabilistic framework: a template-based algorithm with multiple pose-specific reference models is used to determine jointly the position and the scale of the target and its head pose. Target representation is based on Histograms of Oriented Gradients (HOGs): descriptors which are at the same time robust under varying illumination, fast to compute and discriminative with respect to pose.
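The joint search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy descriptor `hog_descriptor`, the exhaustive search in `joint_track`, the Gaussian-style score, and the parameter `lam` are all assumptions made for illustration, and the probabilistic framework mentioned above is replaced here by a plain search over a hypothetical list of candidate windows.

```python
import numpy as np

def hog_descriptor(patch, n_bins=8, grid=(2, 2)):
    """Toy HOG: per-cell histograms of unsigned gradient orientation,
    weighted by gradient magnitude and L2-normalised (illustrative only)."""
    patch = patch.astype(np.float64)
    gy, gx = np.gradient(patch)              # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    h, w = patch.shape
    rows, cols = grid
    feats = []
    for r in range(rows):
        for c in range(cols):
            ys = slice(r * h // rows, (r + 1) * h // rows)
            xs = slice(c * w // cols, (c + 1) * w // cols)
            hist = np.bincount(bins[ys, xs].ravel(),
                               weights=mag[ys, xs].ravel(),
                               minlength=n_bins)
            feats.append(hist)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-12)

def joint_track(frame, candidates, references, lam=10.0):
    """Score every (window, pose) pair and return the best one.

    candidates: list of (x, y, size) square windows (hypothetical state space).
    references: dict mapping a pose label to its reference descriptor.
    Returns (score, (x, y, size), pose).
    """
    best = None
    for (x, y, s) in candidates:
        d = hog_descriptor(frame[y:y + s, x:x + s])
        for pose, ref in references.items():
            # Assumed likelihood: decreasing in squared descriptor distance.
            score = np.exp(-lam * np.sum((d - ref) ** 2))
            if best is None or score > best[0]:
                best = (score, (x, y, s), pose)
    return best
```

Because each candidate window is compared against every pose-specific reference, the winning pair gives position, scale, and pose in a single step, which is the coupling the paragraph above describes.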
A main novelty of this approach concerns the likelihood, i.e. the function which measures the compatibility between the current observation and the reference models of a specific pose. We define it as a parametric function and learn its parameters offline, so that the similarity between two images is constrained to be high if their poses are close and low otherwise. To this aim we introduce a new discriminative algorithm based on the large margin paradigm, which significantly improves pose estimation accuracy.
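To make the large margin idea concrete, here is a generic sketch and not the algorithm from the paper. It assumes a hypothetical weighted squared distance with one non-negative weight per descriptor dimension, trained with a hinge loss on triplets (anchor, image with a close pose, image with a distant pose) by projected subgradient descent. The names `learn_weights` and `likelihood` and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def learn_weights(triplets, dim, lr=0.1, reg=1e-3, epochs=50):
    """Learn non-negative per-dimension weights w such that
    d_w(anchor, close) + 1 <= d_w(anchor, far), where
    d_w(a, b) = sum_k w_k * (a_k - b_k)^2.

    triplets: list of (anchor, close, far) descriptor arrays.
    """
    w = np.ones(dim)
    for _ in range(epochs):
        for a, p, n in triplets:
            dp = (a - p) ** 2          # per-dimension distance to close pose
            dn = (a - n) ** 2          # per-dimension distance to far pose
            grad = reg * w             # L2 regularisation term
            if 1.0 + w @ dp - w @ dn > 0:   # margin violated: hinge is active
                grad = grad + dp - dn
            w = np.maximum(w - lr * grad, 0.0)  # project onto w >= 0
    return w

def likelihood(a, b, w):
    """Assumed likelihood: high when the learned distance is small."""
    return np.exp(-np.sum(w * (a - b) ** 2))
```

In this sketch, dimensions that separate distant poses receive larger weights, while dimensions that vary within a pose are down-weighted, so the resulting likelihood is higher for close poses than for distant ones.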
Example Videos
The following videos show the robust tracking results achieved by our algorithm. The clocks on the right side of the video show the estimated pose (green) and the ground truth (blue) if available.
- Idiap head pose database
- Webcam videos
- AMI corpus
In these videos, people may be writing (which can result in a high head tilt) or discussing. The behaviour is natural, with no restriction on head motion. On several occasions, people partially occlude their faces. The background is very cluttered, with a lot of texture and red (skin-like) objects.
• M1R.avi (example with hand occlusion)
• M2L.avi (person writing at the end of the sequence)
• M2R.avi (person writing, then passing a sheet of paper to a neighbouring person)
In this sample, the person arrives and sits down. There are several full occlusions by a person on the other side of the table (nearer to the camera).