Idiap develops a real-time perception system for Human-Robot Interaction
The system detects, tracks, and re-identifies people, detects whether they are speaking, and extracts relevant information including non-verbal cues such as head gestures (e.g., nodding) and attention directed towards the robot or other people in the scene. While the project deploys it on the Pepper robot, it can be used with other video sensors (and with audio sensors, given some weakly supervised training data).
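The per-frame flow described above (detect, track/re-identify, then extract per-person cues) can be sketched as a minimal pipeline. This is an illustrative toy, not the project's actual code: the class and field names are hypothetical, the detector is a stub, and the tracker assigns a fresh ID to every detection instead of matching against existing tracks as a real re-identification module would.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

BBox = Tuple[int, int, int, int]  # (x, y, w, h) in image coordinates

@dataclass
class Person:
    track_id: int
    bbox: BBox
    speaking: bool = False       # output of a hypothetical speaking classifier
    nodding: bool = False        # output of a hypothetical head-gesture detector
    attention: str = "unknown"   # e.g., "robot", "person", "elsewhere"

class PerceptionPipeline:
    """Toy stand-in for the per-frame perception loop."""

    def __init__(self) -> None:
        self.next_id = 0
        self.tracks: Dict[int, Person] = {}

    def detect(self, frame: List[BBox]) -> List[BBox]:
        # Stub detector: here the "frame" is already a list of boxes.
        return frame

    def process(self, frame: List[BBox]) -> List[Person]:
        people: List[Person] = []
        for bbox in self.detect(frame):
            # Naive tracking: a new ID per detection. A real system would
            # associate detections with existing tracks and re-identify
            # people who left and re-entered the scene.
            person = Person(track_id=self.next_id, bbox=bbox)
            self.next_id += 1
            # Hypothetical cue-extraction stages would fill these fields
            # from audio-visual features; fixed values stand in here.
            person.speaking = False
            person.attention = "robot"
            self.tracks[person.track_id] = person
            people.append(person)
        return people

pipe = PerceptionPipeline()
out = pipe.process([(10, 20, 50, 80)])
print(out[0].track_id, out[0].attention)
```

In a real deployment each stage would run on the robot's camera (and microphone) streams, with the cue extractors trained on the weakly supervised data mentioned above.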
YouTube link: Visual, audio, and non-verbal perception
Reference: EU Project MuMMER