Creating databases

So far, the algorithms in this package can be run on two sets of data, generated from the AMI and VidTIMIT databases. Both databases need to be downloaded first. We used the original databases to create corresponding sets of genuine and tampered videos, as well as evaluation protocols.

VidTIMIT database

From the images in VidTIMIT, we generate video files for the genuine subset and we down-sample the audio files to 16 kHz mono. We also generate our own tampered videos (so far, for each video we replace the speech with that of 5 other random speakers).
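As a rough illustration of what this preprocessing amounts to, the sketch below resamples audio to 16 kHz mono and assembles numbered frames into a video with ffmpeg. It assumes ffmpeg is on the PATH and uses a hypothetical frame-naming pattern; the bundled convert_audio_mono_16k.py and non_tampered_images2videos.py scripts remain the reference implementation.

import subprocess
from pathlib import Path

def to_mono_16k(src_wav, dst_wav):
    # Resample an audio file to 16 kHz mono (roughly what
    # convert_audio_mono_16k.py does; illustrative only).
    subprocess.run(["ffmpeg", "-y", "-i", str(src_wav),
                    "-ac", "1", "-ar", "16000", str(dst_wav)], check=True)

def images_to_video(image_dir, dst_avi, fps=25):
    # Assemble sequentially numbered frames into a video file (roughly what
    # non_tampered_images2videos.py does; "%03d.jpg" and 25 fps are assumptions).
    pattern = str(Path(image_dir) / "%03d.jpg")
    subprocess.run(["ffmpeg", "-y", "-framerate", str(fps),
                    "-i", pattern, str(dst_avi)], check=True)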

Assemble original and generate tampered videos

Here are the steps for assembling original data into video files and generating tampered data:

  • Provided you have the VidTIMIT database downloaded to /path/to/vidtimit, generate the genuine audio-video data:

$ bin/convert_audio_mono_16k.py -d /path/to/vidtimit/audio -o /output/dir/vidtimit_nontampered
$ bin/non_tampered_images2videos.py -d /path/to/vidtimit/video -o /output/dir/vidtimit_nontampered
  • Generate tampered videos (produce 5 tampered audio files for each genuine file by replacing the audio track with that of another speaker):

$ bin/generate_tampered.py -d /output/dir/vidtimit_nontampered -o /output/dir/vidtimit_tampered -t 5

For each genuine video, the script randomly takes audio from 5 other people and creates an audio file with the same name, thus producing 5 audio-video pairs where the lip movements do not match the speech.
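In outline, this tampering step looks roughly like the sketch below; the speaker directory layout, the output naming and the randomness handling are assumptions, and generate_tampered.py is the actual implementation:

import random
import shutil
from pathlib import Path

def make_tampered_audio(nontampered_dir, tampered_dir, n_tampered=5, seed=None):
    # For every genuine video, copy audio from n_tampered other speakers under
    # the genuine video's base name, producing mismatched audio-video pairs.
    # Illustrative sketch only; not the actual generate_tampered.py.
    rng = random.Random(seed)
    nontampered_dir, tampered_dir = Path(nontampered_dir), Path(tampered_dir)
    speakers = [d for d in nontampered_dir.iterdir() if d.is_dir()]  # assumed layout
    for speaker in speakers:
        others = [s for s in speakers if s != speaker]
        for video in speaker.glob("*.avi"):
            for i, donor in enumerate(rng.sample(others, n_tampered)):
                donor_wavs = sorted(donor.glob("*.wav"))
                if not donor_wavs:
                    continue
                out_dir = tampered_dir / speaker.name / str(i)  # hypothetical naming
                out_dir.mkdir(parents=True, exist_ok=True)
                # same base name as the genuine video, so the pair is mismatched
                shutil.copy(rng.choice(donor_wavs), out_dir / (video.stem + ".wav"))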

Please note: the folder with genuine videos should be named after the database with the suffix _nontampered (e.g., for VidTIMIT the folder name is vidtimit_nontampered), and the folder with tampered videos should be named after the database with the suffix _tampered. This naming convention is important and is used throughout all of the experiments.
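For example, with VidTIMIT the output root is expected to look like this:

/output/dir/
    vidtimit_nontampered/   (genuine audio-video data)
    vidtimit_tampered/      (tampered data)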

Create file lists for training and testing

Depending on the random generator on your system and the seed settings, you may generate tampered videos that differ from those we used in our experiments. This means that the file lists used in our experiments will need to be updated, especially for the tampered videos. To do that, you can run the following bash script, which will update the file lists according to the videos you have generated in the steps above:

$ ./bob/paper/lipsync2019/scripts_db/create_filelists.sh /output/dir db_name

Here, /output/dir is the root path where the generated videos are stored and db_name is the name of the database. For instance, if you generated videos for the VidTIMIT database, /output/dir should contain two subfolders, vidtimit_tampered and vidtimit_nontampered, and db_name should be vidtimit.
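For the VidTIMIT example above, the call would therefore be:

$ ./bob/paper/lipsync2019/scripts_db/create_filelists.sh /output/dir vidtimit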

Create facial landmarks with Openpose (optional)

If you have the Openpose landmark detector compiled and installed on your system, you can run it to detect facial landmarks. You can install it using the install_openpose.sh bash script inside the bob/paper/lipsync2019/job directory, which installs Openpose on Linux. Alternatively, if you do not have Openpose annotations, the code will run the MTCNN landmark detector by default during the feature extraction stage. MTCNN landmark detection is not as accurate as Openpose but is good enough if you do not want to go through the difficulties of installing and running Openpose.

To run Openpose on all videos in the database, follow these steps:

$ cd bob/paper/lipsync2019/job
$ bash submit_cpm_detection.sh $(find /output/dir/vidtimit_nontampered -name '*.avi')
$ bash watch_jobs.sh /output/dir/vidtimit_nontampered
$ bin/reallocate_annotation_files.py -a /output/dir/vidtimit_nontampered -o /output/dir/vidtimit_tampered

AMI database

Since AMI contains many types of videos that are not suitable for lip-sync detection, we need to extract a suitable subset (a single person speaking in the video frame). Using the annotation files provided in the bob/paper/lipsync2019/data/ami_annotations/ folder, we cut 15-40 second videos from the single-speaker shots and use the audio recorded with the lapel microphone.
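Conceptually, each genuine AMI clip is obtained by cutting one single-speaker interval from the close-up video and pairing it with the lapel-mic audio for the same interval. A minimal ffmpeg-based sketch is shown below; timing handling, stream selection and codecs are assumptions, and generate_non-tampered.py, which reads the .mdtm annotations, is the reference:

import subprocess

def cut_single_speaker_clip(video_in, lapel_audio_in, start, duration, out_path):
    # Cut one single-speaker interval (15-40 s) from an AMI close-up video and
    # mux it with the lapel-mic audio for the same interval. Illustrative only.
    subprocess.run(
        ["ffmpeg", "-y",
         "-ss", str(start), "-t", str(duration), "-i", video_in,
         "-ss", str(start), "-t", str(duration), "-i", lapel_audio_in,
         "-map", "0:v:0", "-map", "1:a:0",
         out_path],
        check=True)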

To generate training and development data from AMI, follow these steps:

  • Provided you have the AMI database downloaded to /path/to/ami, you can generate the genuine videos by running the following script:

$ bin/python bob/paper/lipsync2019/scripts_amicorpus/generate_non-tampered.py -d /path/to/ami \
    -a bob/paper/lipsync2019/data/ami_annotations/p1.trn.mdtm -o /output/dir/ami_nontampered
  • Generate the tampered video set (5 tampered videos for each genuine one) by running the following:

$ bin/python bob/paper/lipsync2019/scripts_amicorpus/generate_tampered.py -d /output/dir/ami_nontampered \
    -o /output/dir/ami_tampered -t 5

For each genuine video, this script randomly takes audio from 5 other people and merges it with the video, thus creating 5 tampered videos where the lip movements do not match the speech.
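The merge itself amounts to keeping the video stream and swapping in the donor speaker's audio; a minimal sketch, assuming ffmpeg is available:

import subprocess

def replace_audio_track(video_in, donor_audio, video_out):
    # Keep the video stream, replace the audio stream with a donor speaker's
    # audio; a sketch of the merge, not the actual generate_tampered.py.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_in, "-i", donor_audio,
         "-map", "0:v:0", "-map", "1:a:0",
         "-c:v", "copy", "-shortest", video_out],
        check=True)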

  • Split video and audio into separate files (run this for both the genuine and tampered directories; see the sketch after this list):

$ bin/python bob/paper/lipsync2019/scripts_amicorpus/bin/extract_audio_from_video.py -d /output/dir/ami_nontampered \
    -o /output/dir/ami_nontampered -p /output/dir/ami_nontampered
  • The rest of the processing is the same as for VidTIMIT.
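The split performed by extract_audio_from_video.py is essentially an extraction of the audio stream into a separate file; a minimal sketch, assuming ffmpeg and 16-bit PCM WAV output:

import subprocess
from pathlib import Path

def extract_audio(video_path, out_dir):
    # Write the audio track of a video to a 16-bit PCM WAV file with the same
    # base name (illustrative; the actual script's options and naming may differ).
    out_wav = Path(out_dir) / (Path(video_path).stem + ".wav")
    subprocess.run(["ffmpeg", "-y", "-i", str(video_path), "-vn",
                    "-acodec", "pcm_s16le", str(out_wav)], check=True)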