CSI TV series could help advance AI-based investigations
Watching the Crime Scene Investigation (CSI) series is usually thrilling for viewers, as they look for clues to solve the case in each episode. Phone calls, e-mails and interviews are among the elements that can provide the key to a case. As humans, we are used to combining these different sources to create links and extract meaningful information. But what about computers? Using AI techniques such as network analysis, machine learning or deep learning, researchers can ‘feed’ a computer program with these same elements. The goal is to teach the program to create similar connections. Funded by the European Union and coordinated by Idiap, the Roxanne project brings together researchers, industry partners and police services from 16 countries to develop a realistic and useful AI-based tool to boost investigations.
In real-life cases, the main obstacles come from the variety of sources (voice recordings, text messages, pictures, videos, fingerprints, etc.) and the variety of formats used by the different law enforcement agencies. To design and develop a flexible solution, the project partners want a service that is as autonomous as possible and avoids dependencies between components. Standardized interfaces are another crucial element of a working service.
The architecture of the integrated technology is based on a set of tools developed by a group of partners. They designed the multi-source, multimedia analysis solution around the concept of a generic architecture, which makes it possible to group multimedia processing software products and present them in different configurations on one common base. This approach offers a lot of flexibility by allowing all requested configurations to be tested and validated on the same platform during the project.
Training with the CSI series and old cases prior to a first release
There are legal and ethical issues associated with acquiring real investigation data for developing and testing speech, text, video, and network analysis technologies. Nevertheless, several datasets are already available and can be used, depending on their nature and partial suitability for the project. Among these datasets are numerous phone calls involving many speakers, over 500,000 anonymized e-mails and phone transcripts, and episodes from CSI, the popular American criminal investigation television series. Each episode consists of a video of about 40 minutes, an audio file, and a transcript. The audio and video are extracted from the DVD of the show. The transcripts, published by the University of Edinburgh, also contain the role of each speaker (suspect, killer, or other).
A few days ago, the first field test, which gathered about 80 participants virtually, ended successfully. The test involved numerous technologies, such as Automatic Speech Recognition, Speaker Identification, Gender Identification, Keyword and Topic Detection, Named Entity Recognition and Network Analysis. These technologies are essential to extract meaningful information from the data. For example, gender detection can narrow down the search for a suspect. Early results provided by partners were incorporated into an interactive network analysis tool, which displays, for each node in the network, the identity predicted by the speaker identification system and the predicted gender. Such a tool can support police services in identifying speakers involved in criminal investigations.
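To make the idea of such a network more concrete, here is a minimal sketch in Python. It is not the Roxanne tool itself: the function names, the "identity" and "gender" keys, and the sample data are all hypothetical and chosen only for illustration. Each speaker is a node carrying the attributes that a speaker identification system and a gender identification system might predict, and each link counts the calls between two speakers.

```python
from collections import Counter


def build_call_network(calls, predictions):
    """Build a toy call network (illustrative only, not the project's code).

    calls: iterable of (caller_id, callee_id) pairs, one per phone call.
    predictions: dict mapping a speaker id to a dict with the hypothetical
        keys "identity" and "gender".
    Returns (nodes, edges): nodes maps each speaker id to its predicted
    attributes; edges counts calls between each unordered pair of speakers.
    """
    edges = Counter()
    nodes = {}
    for caller, callee in calls:
        # Treat a call as an undirected link between two speakers.
        edges[tuple(sorted((caller, callee)))] += 1
        for speaker in (caller, callee):
            # Speakers without a prediction are kept, marked as unknown.
            nodes[speaker] = predictions.get(
                speaker, {"identity": "unknown", "gender": "unknown"}
            )
    return nodes, edges


def filter_by_gender(nodes, gender):
    """Return the sorted speaker ids whose predicted gender matches."""
    return sorted(s for s, attrs in nodes.items() if attrs["gender"] == gender)


# Made-up example data, for illustration only.
calls = [("spk1", "spk2"), ("spk2", "spk1"), ("spk2", "spk3")]
predictions = {
    "spk1": {"identity": "Person A", "gender": "female"},
    "spk2": {"identity": "Person B", "gender": "male"},
}
nodes, edges = build_call_network(calls, predictions)
```

In this sketch, filtering the nodes by predicted gender mirrors how gender detection can narrow down a search, while speakers with no prediction stay in the network as "unknown" rather than being dropped.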
The European Commission also required substantial work to comply with ethical, social and privacy criteria. The project is also reviewed and controlled on a regular basis. Following these reviews, partners of the project will receive two further releases, one in 2021 and another in 2022.