Multimodal Interaction and Multimedia Data Mining

The goal of the MULTI project is to carry out fundamental research in several related fields of multimodal interaction and multimedia data mining. This covers recognition and interpretation of spoken, written and gestural language; indexing and retrieval of complex multimedia information; advanced biometric user authentication techniques (including speaker and face verification); and advanced machine learning and data fusion algorithms. By “multimedia information” we mean all static and dynamic images (photos, animations and video), sounds (speech recordings, music and general audio information), and moving or static text, as well as the sharing of this information (e.g., through social networks). More recently, MULTI has also started focusing on the extraction, modeling and understanding of “social signals” (nonverbal signals).

More specifically, the research areas addressed by MULTI cover six general research themes: (1) machine learning, (2) speech and audio processing, (3) computer vision, (4) information retrieval, (5) biometric authentication, and (6) multimodal interaction. Specific research activities covered by the current funding period include: Bottom-up recognition of speech sounds; Cross-modal weakly supervised learning; Social Network Analysis for multimedia indexing problems; Multi-View Face Detection; Modeling Social Media; Detection and description of unexpected words in ASR; Activity modeling from multiple sensors; Joint Bi-Modal Person Authentication; Social Signal Processing; Robust 3D Head Tracking and Head Gesture Recognition.

Whenever possible and appropriate, all MULTI research projects work within the framework of common IDIAP tools (common task, common databases and common software), which are related to “human-to-human communication modeling and understanding”.
Indeed, quite recently, and as a particularly challenging application, researchers in multimodal processing have started focusing on the modeling and understanding of human-to-human communication and on creating novel technologies to augment the ways in which people communicate with each other, not only through computers but also with computer assistance in the background. We believe that meeting processing provides an ideal, unified framework for investigating most of the multimodal interaction and multimedia data mining technologies mentioned above.

Themes

Perceptive and Cognitive Systems

Leader Name

Funding

Start

Oct 01, 2008

Stop

Sep 30, 2010

Groups

Speech & Audio Processing

Contacts