IM2 is concerned with the development of natural multimodal interfaces for human-computer interaction. By “multimodal” we mean technologies that coordinate natural input modes (such as speech, pen, touch, hand gestures, head and body movements, and eventually physiological sensors) with multimedia system output (such as speech, sounds, and images). Ultimately, these multimodal interfaces should flexibly accommodate a wide range of users, tasks, and environments for which no single mode suffices. Above all, the ideal interface should be able to deal with more comprehensive and realistic forms of data, including mixed data types, i.e., data from different input modalities such as image and audio; a minimal sketch of what this can mean in practice is given below.
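To make the notion of mixed data types concrete, the following sketch shows one simple way such data might be organised: time-stamped events from different input modalities placed on a common timeline so that they can be interpreted jointly. This is purely illustrative and not an IM2 component; all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch (hypothetical names): events from different input
# modalities are time-stamped and merged onto one timeline, so that, for
# example, a pointing gesture can be aligned with the spoken phrase it
# accompanies.

@dataclass
class ModalityEvent:
    modality: str   # e.g. "speech", "pen", "gesture", "image"
    start: float    # start time in seconds
    end: float      # end time in seconds
    payload: object # modality-specific data (transcript, stroke, frame, ...)

@dataclass
class MultimodalStream:
    events: List[ModalityEvent] = field(default_factory=list)

    def add(self, event: ModalityEvent) -> None:
        self.events.append(event)

    def overlapping(self, start: float, end: float) -> List[ModalityEvent]:
        """Return events from any modality that overlap the given time window."""
        return [e for e in self.events if e.start < end and e.end > start]

# Example: a spoken phrase and a pointing gesture that occur together.
stream = MultimodalStream()
stream.add(ModalityEvent("speech", 12.0, 13.5, "put that there"))
stream.add(ModalityEvent("gesture", 12.8, 13.2, {"type": "point", "target": "screen"}))
print(stream.overlapping(12.5, 13.0))  # both events fall within this window
```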
As part of IM2, we are also focusing on computer-enhanced human-to-human interaction. Indeed, understanding human-human interaction is fundamental to the long-term pursuit of powerful and natural multimodal interfaces for human-computer interaction. In addition to making rich, socially enhanced analyses of group processes ripe for exploitation, our advances in speech, video, and language processing, together with our tools for working with multimodal data, will benefit research and development in many related areas.
The field of multimodal interaction covers a wide range of critical activities and applications, including the recognition and interpretation of spoken, written, and gestural language, particularly when used to interface with multimedia information systems, as well as biometric user authentication (protecting information access). As addressed by IM2, the management of multimedia information systems is a wide-ranging and important research area that includes not only the multimodal interaction described above, but also multimedia document analysis, indexing, and information retrieval. The development of this technology is necessarily multidisciplinary, requiring the collaborative contributions of experts in engineering, computer science, and linguistics.
To foster collaboration, and as a particularly interesting application, IM2 focuses mainly on new multimodal technologies that support human interaction in the context of smart meeting rooms and remote meeting assistants. IM2 thus aims to enhance the value of multimodal meeting recordings and to make human interaction more effective in real time. These goals will be achieved by developing new tools for computer-supported cooperative work and by designing new ways to search and browse meetings as part of integrated multimodal group communication captured from a wide range of devices. Several technology prototypes that record meetings and automatically generate searchable multimedia meeting archives are now available; some of the resulting technologies are being exploited by IM2 spin-offs or have been adopted by companies working in various fields of Information and Communication Technology (ICT), such as video conferencing and meeting facilitation.
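As a rough illustration of what a searchable meeting archive involves, the sketch below indexes time-aligned transcript segments by keyword, so that a query returns the meetings and time offsets at which a word was spoken. It is a minimal sketch under simplified assumptions (keyword lookup only), not a description of the IM2 prototypes; all names and data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple
from collections import defaultdict

# Hypothetical sketch of a searchable meeting archive: transcript segments are
# time-aligned with the recording and indexed by word; a keyword query returns
# where in which meeting the word was spoken.

@dataclass
class Segment:
    meeting_id: str
    start: float   # seconds from the start of the recording
    end: float
    speaker: str
    text: str

class MeetingIndex:
    def __init__(self) -> None:
        self._index: Dict[str, List[Segment]] = defaultdict(list)

    def add(self, segment: Segment) -> None:
        # Index every word of the segment (no stemming or stop-word handling
        # in this sketch).
        for word in segment.text.lower().split():
            self._index[word].append(segment)

    def search(self, keyword: str) -> List[Tuple[str, float, str]]:
        """Return (meeting_id, start_time, text) for segments containing the keyword."""
        return [(s.meeting_id, s.start, s.text)
                for s in self._index.get(keyword.lower(), [])]

# Example usage: index two segments and find where "budget" was discussed.
index = MeetingIndex()
index.add(Segment("weekly-meeting-01", 64.2, 69.8, "A", "we need to revisit the budget"))
index.add(Segment("weekly-meeting-01", 312.5, 318.0, "B", "the demo is scheduled for June"))
print(index.search("budget"))  # -> [('weekly-meeting-01', 64.2, 'we need to revisit the budget')]
```

In a full archive, the returned time offsets would let a browser jump directly to the corresponding point in the audio or video recording, which is the essence of making meeting archives searchable and browsable.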