Fake videos detected thanks to audio-visual inconsistencies
Have you seen Mark Zuckerberg explaining with confidence how Facebook manipulates its customers? This clip went viral and is a perfect example of how fake videos are becoming more convincing. Such content is increasingly indistinguishable from authentic content to human eyes and ears. According to the AI Foundation, “70% or fewer untrained human subjects were able to tell fake and real videos apart. […] Meanwhile, as we demonstrate in our FaceForensics work (and observed by other researchers in the field), machine learning models far surpass untrained humans in telling real and synthetic/forged content apart.” In this context, the work of Idiap researchers is crucial, as it focuses on detecting tampering in videos of a person speaking to a camera. Pavel Korshunov, a member of the Biometrics Security & Privacy group led by Sébastien Marcel, presented the paper at the world-renowned International Conference on Machine Learning (ICML) in California.
Using machine learning models, the researchers develop computational methods to detect inconsistencies between the speech and the video. With many training examples, they teach the computer to identify when there is a discrepancy between the speech sounds on the audio track and the lip movements of the apparent speaker. As underlined by the AI Foundation, “Besides being a well-experimented result, the importance of this line of work cannot be stressed in today’s information climate, and also because this form of manipulation is easy to perform since one can replace a part of the audio, dramatically changing the meaning of the video.”
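To give an intuition of the idea, the sketch below shows a toy version of audio-visual consistency checking: if the audio energy and the mouth opening measured from the video frames do not move together, the audio may have been replaced. This is only an illustration of the general principle; the names (`audio_energy`, `mouth_opening`), the correlation measure, and the threshold are assumptions for this example, not the phonetically aware features used in the actual paper.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (std_a * std_b)

def is_tampered(audio_energy, mouth_opening, threshold=0.5):
    # Low audio-visual correlation suggests the audio track
    # does not match the visible lip movements.
    # The threshold is an illustrative choice, not a tuned value.
    return pearson(audio_energy, mouth_opening) < threshold

random.seed(0)
# Genuine clip: mouth opening tracks the audio energy, plus small noise.
audio = [abs(math.sin(t / 3.0)) for t in range(100)]
mouth_genuine = [a + random.uniform(-0.05, 0.05) for a in audio]
# Tampered clip: lips move independently of the replaced audio.
mouth_tampered = [random.random() for _ in range(100)]

print(is_tampered(audio, mouth_genuine))
print(is_tampered(audio, mouth_tampered))
```

In the real system, a learned model replaces the fixed threshold and the features come from the speech signal and the tracked lips, but the underlying question is the same: do the two modalities agree?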
Title of the paper: “Tampered Speaker Inconsistency Detection with Phonetically Aware Audio-visual Features.”