Seminars

Seminars are scientific presentations given by visitors from outside Idiap.

Upcoming event

To be announced.

 

Past events

 

Parallel Split Learning for Wireless Networks

Yue Gao, Fudan University, China

October 11, 2024

Abstract

For wireless networks, edge intelligence promises to revolutionise how smartphones and base stations process and analyse data by bringing AI capabilities closer to the source of data generation. This talk will introduce Split Learning (SL), a new distributed deep learning paradigm that enables resource-constrained devices to offload substantial training workloads to edge servers via layer-wise model partitioning. By resorting to parallel training across multiple devices, SL addresses the latency and bandwidth challenges of traditional centralised and federated learning, ensuring efficient and privacy-preserving data processing at the edge of wireless networks. I will present our recent work on Efficient Parallel Split Learning (EPSL), designed to overcome the limitations of existing parallel split learning schemes. EPSL enhances model training efficiency by parallelising client-side computations and aggregating last-layer gradients, reducing server-side training and communication overhead.

Bio

Yue Gao is a Chair Professor at the School of Computer Science and Dean of the Institute of Space Internet at Fudan University, China. He is a Fellow of the IEEE, the IET and the CIC. He received his MSc and PhD from Queen Mary University of London (QMUL), U.K., in 2003 and 2007, respectively. He has worked as a lecturer, senior lecturer, reader, and professor at QMUL and the University of Surrey. His research interests include satellite internet and AI-powered networks. He was a co-recipient of the EU Horizon Prize Award on Collaborative Spectrum Sharing in 2016 and was elected an Engineering and Physical Sciences Research Council Fellow in 2017. He is a member of the Board of Governors and Distinguished Lecturer of the IEEE Vehicular Technology Society (VTS), Chair of the IEEE ComSoc Wireless Communication Technical Committee, and past Chair of the IEEE ComSoc Technical Committee on Cognitive Networks.

 

 

NLP Meets Creative Media: Research Applications, Challenges and Opportunities

Elena Epure, Deezer

October 04, 2024

Abstract

In this talk, I will explore the intersection of Natural Language Processing (NLP) and creative media, highlighting the essential role of language: from lyrics and music descriptors to dialogues in podcasts, language not only relates to content but also serves as a key interaction channel between users and digital media platforms. NLP techniques are therefore needed for a range of applications, such as describing creative media and modeling user interactions with it. Drawing on practical needs, challenges, and emerging opportunities in this field, I will outline several research-driven applications, focusing on music, podcasts, and audiobooks. In the second part of the talk, I will dive deeper into specific works, including modeling cross-cultural perceptions of music genres and handling named entities for describing music or extracting topics from podcast metadata. These examples demonstrate how NLP research can enhance both user experiences and content understanding in creative media, though many challenges remain.

Bio

Elena Epure is a Senior NLP Research Scientist at Deezer, a global streaming service for music, podcasts, and audiobooks based in Paris. She has a multidisciplinary educational background in computer science, digital humanities, and business, complemented by diverse professional experience in research, software engineering, consultancy, and teaching. Elena’s current research focuses on creative media, building on her previous work with social media and news content. She explores topics related to behavior modeling from text, personalization and contextualization, and computational humanities.

 

 

Could deep learning video generation models understand the physical world? A philosophical perspective

Pierre Beckmann, University of Bern

October 02, 2024

Abstract

SORA, a deep learning model that generates ultra-realistic videos, is said to “understand the physical world in motion”. In this presentation, I want to subject this intuition to philosophical scrutiny. I will proceed by building a set of conditions for understanding that I will apply to SORA. This will allow me both to reveal the sense in which deep learning video generation models might be said to “understand” and to lay bare the primary axes for evaluating the degree of such an understanding. At the end of the talk, I will discuss the potential contributions of such a philosophical investigation to deep learning research.

Bio

Pierre Beckmann is a PhD student with a dual education in AI and Philosophy. He has held research positions at Disney Research, Logitech, EPFL as well as the University of Bern. He has published papers in both Deep Learning and Philosophy. His research lies at the intersection of AI and Philosophy, focusing on deep learning and understanding. He explores questions such as how to better understand deep learning models or their applications in acquiring new scientific understanding. Recently, he has also investigated whether these models can be said to possess understanding themselves.

 

 

Speaker recognition in forensics and homeland security

Prof. Itshak Lapidot, Afeka Tel-Aviv Academic College of Engineering

September 27, 2024

Abstract

Speaker recognition technologies have evolved considerably in recent years. As such, it is no longer enough only to verify whether two speech segments belong to the same target speaker. In this talk, I will present three other branches of speaker recognition technology that might be very important in forensics and homeland security, but not only there:

Time-domain based anti-spoofing: most anti-spoofing systems are based on some transformation of the frequency domain, such as spectrograms, cepstrograms, etc. It will be shown that time-domain embeddings carry important information, which can be complementary to frequency-domain based countermeasure systems. Another important property is their explainability, which is very important in many domains, including forensics.
Homogeneity measure for speaker verification: usually two segments are compared and the ASV system provides a score. There is no information on whether the comparison is meaningful or not. If there is no common information in the segments, then the results are not reliable even if the decision is correct. It will be shown that ASV system performance improves (in terms of Cllr) as the homogeneity measure increases. Knowledge of the common information may be very important in forensic applications, and can also be beneficial in other fields.
Short segments clustering: short-segment clustering is important in the speaker diarization task, but is not limited to it. In many scenarios, such as air-traffic control, natural disasters or attacks, tens of thousands of short speech segments are recorded from tens of speakers. To analyze all the data, a segmentation at the speaker level is a very important first step. A stochastic mean-shift algorithm will be presented to deal with this task.

Bio

Research interests lie mainly in speaker diarization, speaker clustering, and other related topics.

 

 

The Mirror Transform (https://ieeexplore.ieee.org/document/9779467)

Prof. R. Leonardi, IEEE Fellow

August 23, 2024

Abstract

Symmetries play an essential role in understanding and modelling the world. Natural objects, their dynamics, or more generally natural waveforms often exhibit local "partial" symmetries, which are key to understanding and modelling the laws of physics or to describing and recognising real objects. In this talk we propose a simple signal processing operation which opens new pathways to alternative representations of information, with possible use in classification, information modelling, representation, or compression. Looking for symmetries involves non-linear processing. As we will show, the use of such non-linearity turns out to be quite effective for information representation and modelling.

Bio

Riccardo Leonardi obtained his Diploma (1984) and Ph.D. (1987) degrees in Electrical Engineering from the Swiss Federal Institute of Technology in Lausanne. After conducting research on visual communications at UCSB and Bell Laboratories for ~5 years, he was appointed in 1992 at the University of Brescia, Italy, to establish activities in Signal Processing and Communications. He holds the Signal Processing Chair there. His main research interests cover the field of Multimedia Signal Processing applications and Visual Communication (mainly Image/Video Compression & Content-Based Media Analysis). He has actively participated in ISO/MPEG standardisation activities. Riccardo has more than 300 papers and patents in the field. He is a fellow of the IEEE. He is currently acting as GTTI Chairman (a position elected by all Italian Faculty members) for Italian Academic Coordination in the fields of Signal Processing, Communication, Networking, and Remote Sensing.

 

 

Diffusion Morphs (DiM): The power of iterative generative models for attacking FR systems

Zander Blasingame, Clarkson University in Potsdam, NY, USA

July 23, 2024

Abstract

Morphing attacks, which aim to create a single image that contains the biometric information of multiple identities, are an emerging threat to state-of-the-art Face Recognition (FR) systems. Diffusion Morphs (DiM) are a recently proposed morphing attack that has achieved state-of-the-art performance for representation-based morphing attacks. However, DiMs suffer from slow inference speed, requiring a high number of Network Function Evaluations (NFE), and are still outperformed by landmark-based morphing attacks. In this talk, I will cover three recent advancements in DiMs that address these issues:
1. Fast-DiM: The inference speed of DiMs is improved by employing higher-order numerical ODE solvers to reduce the number of NFE.
2. Greedy-DiM: The vulnerability of FR systems is dramatically increased by employing a greedy optimization strategy during each step of the generative process. Greedy-DiM beats landmark-based morphs on the studied dataset.
3. AdjointDEIS: A novel strategy for backpropagating the gradients of diffusion models w.r.t. the initial noise, conditional information, and model parameters is presented for both the probability flow ODE and diffusion SDE formulations of diffusion models, using the method of adjoint sensitivity.

Bio

Zander W. Blasingame is working on his Ph.D. at Clarkson University in Potsdam, NY, USA. He previously received his B.Sc. from Clarkson University in 2018. His research interests are in generative models, with a focus on diffusion models, latent representations, generative flow models, and stochastic processes. Within biometrics, his research has focused on the development of face morphing attacks using generative models. His research is supported by the NSF and DHS.

 

 

Opportunities for Artificial Intelligence in Blood Trace Forensics

Dr. Daniel Attinger, Attinger-consulting.com

June 03, 2024

Abstract

This talk describes the general purposes and methods used for interpreting blood traces at crime scenes, and opportunities for contributions from artificial intelligence. On the 3D geometries of crime scenes, blood traces are typically documented with photography, and their features described in a qualitative manner. Then, binary classification trees are used to evaluate the mechanisms generating the blood traces. For example, the spatial distribution and size of a pattern of elliptical stains may be associated with a fast impact on a bloody surface, as in a violent beating. Forensic practitioners learn classification techniques while generating blood traces with known mechanisms as part of their training, and error rates have recently been estimated as high as 10%. An open question is whether artificial intelligence can improve the evaluation of blood traces. Examples of datasets and attempts to use machine learning and deep learning are presented, to open discussions on current limitations and future opportunities.

Bio

Dr. Daniel Attinger is an engineer and scientist with expertise in a fundamental science called fluid dynamics. After a Ph.D. on the impact of drops, he acquired basic and advanced training in the forensic discipline called bloodstain pattern analysis (BPA). Between 2009 and 2020, he served as lead scientific investigator for three research projects on evaluating and advancing BPA, managing more than USD 2 million awarded by the US National Institute of Justice and Army Research Office. Attinger has performed BPA at actual crime scenes or on the basis of investigative photographs. He has been qualified as an expert witness in US State criminal courts. He has published widely cited scholarly work in leading peer-reviewed forensic and fluid dynamics journals. Dr. Attinger has served as a reviewer for Federal Agencies on research funding and forensic standards. He has developed academic and professional curricula. Other domains of his expertise are the physics of boiling and the related heat and mass transport. He is a fellow of the American Society of Mechanical Engineers and a member of the International Association of Bloodstain Pattern Analysts. Dr. Attinger has taught in French, English, German, and Spanish, and his work has been featured in US and international media.

 

 

Fast multiphoton imaging of embryonic development

Willy Supatto, Laboratory for Optics and Biosciences, Ecole polytechnique, CNRS, INSERM (Palaiseau, France)

May 22, 2024

Abstract

Multiphoton microscopy has demonstrated unique advantages for imaging embryonic development in 3D and in vivo, including a large imaging depth or the ability to combine nonlinear fluorescence excitation with other contrast mechanisms, such as second or third harmonic generation. However, the acquisition speed is often a critical limitation for multiscale imaging or for investigating fast biological phenomena. Over the past decade, we have developed several strategies to circumvent or overcome this limitation. Indeed, through optical or image processing approaches, by using light-sheet illumination or exploiting biological periodicities and imaging artifacts, we demonstrated how to capture and investigate processes of extreme dynamics deep inside a live embryo. For example, I will show how we have been able to capture multimodal and multicolor multiphoton signals in a beating embryonic heart, resolve blood flow at the micrometer scale, record neuronal activity in an entire developing brain, study beating cilia or quantify the microscopic flows they generate deep inside the embryo. Capturing and quantifying such dynamic processes by using advanced optical and image processing tools provides new insights into embryonic development.

Bio

Willy Supatto is a Directeur de Recherche CNRS based at the Laboratory for Optics and Biosciences, Ecole polytechnique, CNRS, INSERM in Palaiseau, France. He is currently on a two-month academic visit at Idiap, supported by a CNRS International Emerging Actions fellowship. https://portail.polytechnique.edu/lob/en/willy-supatto

 

 

Solving hard problems in robotics – with a little help from semidefinite relaxations, nullspaces, and sparsity

Dr Frederike Dümbgen, Robotics Institute of University of Toronto

March 20, 2024

Abstract

Many state estimation and planning tasks in robotics are formulated as non-convex optimization problems, and commonly deployed efficient solvers may converge to poor local minima. Recent years have seen promising developments in so-called certifiably optimal estimation, showing that many problems can in fact be solved to global optimality or certified through the use of tight semidefinite relaxations.

In this talk, I present our efforts to make such methods – for the field of state estimation in particular – more practical for roboticists. Among those efforts, I will present novel efficient optimality certificates as a low-cost add-on to off-the-shelf local solvers, which apply to a variety of problems including range-only, stereo-camera and, more generally, matrix-weighted localization. Then, I present our approach to automatically certify almost any state estimation problem, using a sampling-based method to automatically find tight relaxations through nullspace characterizations. I end with an overview of our most recent work, which makes it possible to create both fast and certifiably optimal solvers by exploiting the sparse problem structure.

Bio

Frederike Dümbgen is currently a postdoctoral researcher at the Robotics Institute of the University of Toronto, working with Prof. Tim Barfoot. She received her Ph.D. in 2021 from the Laboratory of AudioVisual Communications (LCAV) with Prof. Martin Vetterli and Dr. Adam Scholefield in Computer and Communication Sciences at École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. Before that, she obtained her B.Sc. and M.Sc. in Mechanical Engineering from EPFL in 2013 and 2016, respectively, with a minor in Computational Science and Engineering, and a Master's thesis at the Autonomous Systems Lab of ETH Zürich. Her research has ranged from novel localization methods, using in particular acoustic, radio-frequency and ultra-wideband signals, to, most recently, global optimization for robotics.

 

 

Isaac Asimov, robots and planet Earth

Pierre-Brice Wieber

February 1, 2024

Abstract

Isaac Asimov was a scientist and a science-fiction writer who invented the words "robotics" and "roboticist". In doing so, he proposed three famous Laws of Robotics: 1) A robot may not injure a human being or, through inaction, allow a human being to come to harm. 2) A robot must obey the orders given it by human beings except where such orders would conflict with the First Law. 3) A robot must protect its own existence as long as such protection does not conflict with the First or Second Law. He later added a Zeroth Law, superseding the others: A robot may not injure humanity or, through inaction, allow humanity to come to harm. I propose to discuss how these different statements can be put to use when doing robotics today.

Bio

Pierre-Brice Wieber is a full-time researcher at INRIA Grenoble. He graduated from Ecole Polytechnique in 1996 and received his PhD degree in Robotics from Ecole des Mines de Paris in 2000. He was a visiting researcher at AIST/CNRS Joint Research Lab in Tsukuba in 2008–2010. Pierre-Brice has been serving as Associate Editor for IEEE Transactions on Robotics, Robotics and Automation Letters and conferences such as ICRA and Humanoids. His research interests include the modeling and control of humanoid and manipulator robots.

 

Deep Surface Meshes

Prof. Pascal Fua, EPFL

December 12, 2023

Abstract

Geometric Deep Learning has made striking progress with the advent of Deep Implicit Fields. They allow for detailed modeling of surfaces of arbitrary topology while not relying on a 3D Euclidean grid, resulting in a learnable 3D surface parameterization that is not limited in resolution. Unfortunately, they have not yet reached their full potential for applications that require an explicit surface representation in terms of vertices and facets, because converting the implicit representation to such an explicit representation requires a marching-cube algorithm, whose output cannot be easily differentiated with respect to the implicit surface parameters. In this talk, I will present our approach to overcoming this limitation and implementing convolutional neural nets that output complex 3D surface meshes while remaining fully differentiable and end-to-end trainable. I will also present applications to single-view reconstruction, physically-driven shape optimization, and bio-medical image segmentation.

Bio

Pascal Fua received an engineering degree from Ecole Polytechnique, Paris, in 1984 and a Ph.D. in Computer Science from the University of Orsay in 1989. He joined EPFL (Swiss Federal Institute of Technology) in 1996 where he is a Professor in the School of Computer and Communication Science and head of the Computer Vision Lab. Before that, he worked at SRI International and at INRIA Sophia-Antipolis as a Computer Scientist. His research interests include shape modeling and motion recovery from images, analysis of microscopy images, and machine learning. He has (co)authored over 300 publications in refereed journals and conferences. He has received several ERC grants. He is an IEEE Fellow and has been an Associate Editor of the journal IEEE Transactions on Pattern Analysis and Machine Intelligence. He often serves as program committee member, area chair, and program chair of major vision conferences and has cofounded three spinoff companies.

 

Disentangling Linguistic intelligence: automatic generalisation of structure and meaning across languages

Prof. Paola Merlo, UNIGE

October 20, 2023

Abstract

The current reported success of large language models is based on computationally (and environmentally) expensive algorithms and prohibitively large amounts of data that are available for only a few, non-representative languages. This limitation reduces the access to natural language processing technology to a few dominant languages and modalities and leads to the development of systems that are not human-like, with great potential for unfairness and bias. To reach better, possibly human-like, abilities in neural networks' abstraction and generalisation, we need to develop tasks and data that train the networks towards more complex and compositional linguistic abilities. We identify these abilities as the intelligent ability to infer patterns of regularities in unstructured data and to generalise from few examples, using abstractions that are valid across possibly very different languages. We have developed a new task and a set of problems inspired by IQ intelligence tests. These problems are developed specifically for language and aim to learn disentangled linguistic representations of underlying linguistic rules of grammar. These investigations can lead to three beneficial improvements of methods and practices: (i) deep, compositional representations would be learnt, thus reducing needs in data size; (ii) current machine learning methods would be extended to low-resource languages or low-resource modalities and scenarios; (iii) higher-level abstractions would be learnt, avoiding the use of superficial, associative cues (possibly reducing bias and potential harm in the representations learned by current artificial linguistic systems).

 

Early-exits models for automatic speech recognition on resource-constrained devices

Alessio Brutti, Fondazione Bruno Kessler, Trento, Italy

October 20, 2023

Abstract

The possibility of dynamically modifying the computational load of neural models at inference time is crucial for on-device processing, where computational power is limited and time-varying. Established approaches for neural model compression exist, but they provide static models. Relying on intermediate exit branches, early-exit architectures allow for the development of dynamic models that adjust their computational cost to resources and performance. This talk will present an experimental analysis of the use of early-exit architectures in large-vocabulary speech recognition scenarios, showing that properly training the models not only preserves performance levels when using fewer layers, but also improves the accuracy as compared to using single-exit models or using pre-trained models. In addition, the talk will discuss the application of early-exit architectures in federated learning frameworks with heterogeneous devices.

 

Investigating the overheating risk in a free-running building in Thailand using CIBSE TM52 and Annual Sun Exposure (ASE)

Apiparn Borisuit, EPFL

September 18, 2023

Abstract

Annual Sunlight Exposure (ASE) is widely used to assess direct sunlight exposure in buildings as a proxy for detecting potential visual discomfort. Even though ASE was not designed to address thermal comfort, the relationship between direct sunlight and thermal sensation is known. This study explores the associations between ASE and thermal comfort criteria through an improvement of thermal comfort in a Child Development Centre (CDC) in Thailand. The existing condition of a CDC building and a simplified version were simulated using the IESVE simulation tool. Overhangs, external shutters, and double glazing were integrated into the computer models to improve thermal comfort. CIBSE TM52 overheating criteria were used to indicate thermal comfort. We found significant correlations between ASE and the criteria of CIBSE TM52 (r=0.28 -0.56; p.

 

The Regularization of the Presentation Attack Detection (PAD) Systems By Explainability

Gökhan Özbulak, Dokuz Eylül University

June 5, 2023

Abstract

A Presentation Attack Detection (PAD) system is a crucial sub-component of biometric systems when it comes to recognizing or verifying someone for further processing. Without such a PAD system, one can gain unauthorized access to protected areas and break the biometric validation. Therefore, a PAD system must exist and be robust against all kinds of attacks, including paper, tablet screen, and 2D or 3D mask attacks. In this talk, I will present my past study on the generalization of PAD systems. I will propose an explainability-based regularization method for PAD systems and share the generalizability performance of the proposed method in public and cross-dataset experiments. I will also give a brief introduction to my other studies on hard and soft biometrics.

 

Automatic analysis of Parkinson's disease: unimodal and multimodal perspectives

Prof. Juan Rafael Orozco-Arroyave

March 23, 2023

Abstract

Parkinson's disease (PD) is a (mainly) movement disorder and appears due to the progressive death of dopaminergic neurons in the substantia nigra of the midbrain (part of the basal ganglia). Diagnosis and monitoring of PD patients are still highly subjective, time-consuming, and expensive. Existing medical scales used to evaluate the neurological state of PD patients cover many different aspects, including activities of daily living, motor skills, speech, and depression. This makes the task of automatically reproducing experts' evaluations very difficult because several bio-signals and methods are required to produce clinically acceptable/practical results.

This talk aims to show how different bio-signals (e.g., speech, gait, handwriting, and facial expressions) can be used on the way to finding suitable models for PD diagnosis and monitoring. Results with classical feature extraction and classification methods will be presented along with CNN- and GRU-based architectures.

 

Understanding Neural Speech Embeddings for Speech Assessment

Prof. Elmar Nöth

January 20, 2023

Abstract

In this talk, we present preliminary results on experiments which were performed in order to understand what information is represented in which layer of deep neural networks. We will motivate our experiments with an image processing problem (identification of orca individuals based on the dorsal fin), where we show that the result of unsupervised clustering of previously unseen individuals strongly depends on the underlying embedding and on what that embedding was trained for in a supervised manner. We then present preliminary results on t-SNE projections of different pathological and control corpora based on the different layers of a pre-trained wav2vec2 module and end with an outlook on current and future research.

 

The e-David project: Painting strategies and their influence on robotic painting

Prof. Dr. Oliver Deussen, University of Konstanz

August 2, 2022

Abstract

Our drawing robot e-David is able to create paintings using visual feedback. So far, our paintings have been created using a stroke-based metaphor. In my talk I will speak about the development of a number of stroke-based styles. However, being in close contact with artists, we realized at some point that painting can be modeled much better by interacting and contrasting areas instead of strokes, which are more the basis of drawings. This paradigm shift allows us to construct paintings from a different perspective; the interaction between areas enables us to model different forms of abstraction and reshape areas according to style settings. We will also be able to integrate machine-learning based tools for analyzing and deconstructing input images. This enhances our creative space and will allow us to find our own forms of machine abstraction and representation.

 

Artificial Intelligence meets Digital Forensics: a panorama

Prof. Anderson Rocha  

July 14, 2022

Abstract

In this talk, we will discuss a panoramic view of digital forensics in the last 10 years and how it needed to evolve from basic computer vision and simple natural language processing techniques to powerful AI-driven methods to deal with the signs of the new age. We will discuss tampering detection, fact-checking, deepfakes, and authorship analysis as well as recent advances in self-supervised learning to deal with large-scale search in some forensics problems.