MSc thesis project proposal

Deep Learning based Sound Identification

Project outside the university

Munisense (Leiden)

Systems are ubiquitous in society. They can be physical structures or machines, or processes created by organizations. To design an effective system or to refine an existing one, information about its usage and context is needed. This naturally leads to monitoring usage and context, which in turn requires processing and interpreting the resulting observations. Given its growing effectiveness, machine learning is now the natural approach to such processing and interpretation.

In this project we consider the application of machine learning to the monitoring of sound in a municipality. Microphones are placed at various locations in a municipality and are used to monitor events such as passing cars, trains and planes, bird life, calamitous weather, disturbances of the peace such as fireworks, or breaking glass associated with burglaries. The goal of the analysis is that the data can be used to set policies, such as speed limits, road configuration, the location of fences and sound barriers, and flight patterns, or to trigger actions such as calling out emergency services or law enforcement. At least three aspects make the analysis of these microphone data challenging. First, the measurements are generally noisy. Second, the data available for training often differ in unknown ways from the data that are analyzed, because of local and temporal variations (microphone type and condition, local acoustics, snow cover, rain). Third, it may be of interest to find events that can only be identified with multiple microphones, such as anomalies in traffic flow, or to determine the relative locations of the microphones.

Assignment

The specific aim of the project is to develop one or more machine-learning algorithms that identify events of interest to municipalities from microphone signals. The work will be based on real-world data made available by the company Munisense. A first step will be to select a suitable architecture for extracting relevant information from the temporal signals. Both discriminative and generative approaches will be considered. Generative approaches will be favored because their ability to generate sounds aids performance analysis and is attractive for demonstrations to interested parties. Once an architecture has been selected, it is natural to start with simple tasks (e.g., counting the passing cars in a given environment) and progress towards more difficult and realistic tasks (e.g., counting vehicles and identifying their type under varying weather conditions), culminating in an algorithm that is useful in a practical setting.

An example architecture that could be used is a multi-task autoencoder combined with a deep classification network. Operating on blocks of single- or multi-dimensional sound data, the multi-task autoencoder would be trained to extract a meaningful, domain-independent latent representation with a prescribed distribution. Sampling from this latent distribution would allow signals to be reconstructed in various domains (for example, with varying building density and weather conditions) by selecting a suitable decoder. A deep encoder network can be used to perform classification on the latent representation, exploiting the fact that this representation is domain-independent.
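The structure of such a multi-task autoencoder can be illustrated with a minimal, untrained NumPy forward pass. This is a sketch under assumptions that are not part of the proposal: the "prescribed distribution" is taken to be a standard normal enforced through a VAE-style KL penalty, each domain gets its own small decoder, and a single softmax layer stands in for the deep classification network. The block length, hop size, layer widths, the synthetic test signal and the names `frame_signal` and `MultiTaskAutoencoder` are hypothetical choices for illustration. Nothing in the sketch enforces domain independence of the latent code; that would have to come from training across domains, possibly with an additional objective for that purpose.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only.
BLOCK_LEN, HOP, LATENT_DIM, N_DOMAINS, N_CLASSES = 512, 256, 16, 3, 4
rng = np.random.default_rng(0)


def frame_signal(x, block_len, hop):
    """Split a 1-D signal into overlapping blocks, one block per row."""
    n_blocks = 1 + (len(x) - block_len) // hop
    idx = np.arange(block_len)[None, :] + hop * np.arange(n_blocks)[:, None]
    return x[idx]


def dense(n_in, n_out):
    """Randomly initialised affine layer (weights are never trained here)."""
    return {"W": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)),
            "b": np.zeros(n_out)}


def apply(layer, h):
    return h @ layer["W"] + layer["b"]


class MultiTaskAutoencoder:
    """Shared encoder, one decoder per domain, classifier on the latent code."""

    def __init__(self, hidden=64):
        self.enc = dense(BLOCK_LEN, hidden)
        self.mu = dense(hidden, LATENT_DIM)
        self.logvar = dense(hidden, LATENT_DIM)
        self.decoders = [(dense(LATENT_DIM, hidden), dense(hidden, BLOCK_LEN))
                         for _ in range(N_DOMAINS)]
        self.classifier = dense(LATENT_DIM, N_CLASSES)

    def encode(self, x):
        h = np.tanh(apply(self.enc, x))
        return apply(self.mu, h), apply(self.logvar, h)

    def decode(self, z, domain):
        # Choosing the decoder chooses the target domain (e.g. weather, building density).
        d1, d2 = self.decoders[domain]
        return apply(d2, np.tanh(apply(d1, z)))

    def classify(self, z):
        logits = apply(self.classifier, z)
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def losses(self, x, domain, labels):
        mu, logvar = self.encode(x)
        # Reparameterised sample from q(z | x).
        z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
        recon = np.mean((self.decode(z, domain) - x) ** 2)
        # KL divergence to the assumed prior N(0, I), averaged over blocks.
        kl = np.mean(-0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))
        probs = self.classify(z)
        ce = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
        return recon, kl, ce


# Synthetic stand-in for one second of 16 kHz microphone data (not real data).
t = np.arange(16000) / 16000
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)
blocks = frame_signal(signal, BLOCK_LEN, HOP)
labels = rng.integers(0, N_CLASSES, size=len(blocks))  # placeholder labels

model = MultiTaskAutoencoder()
recon, kl, ce = model.losses(blocks, domain=0, labels=labels)

# Generation: sample the prior and decode into a chosen domain.
z_prior = rng.standard_normal((4, LATENT_DIM))
generated = model.decode(z_prior, domain=1)
```

Training, i.e. minimizing a weighted sum of the reconstruction, KL and classification losses, is omitted; a framework with automatic differentiation, such as JAX or PyTorch, would be the natural choice for that step.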

The work will be done in collaboration with Munisense, a Dutch company located in Leiden. It specializes in sensor networks, especially for environmental sound measurements. Data are collected by intelligent microphone systems that perform preprocessing and send the results to central storage. Both preprocessed and raw audio data will be made available for the purpose of the project. Labeled data may also be available.

Requirements

Signal processing background (acoustic signals); basics of machine learning

Contact

prof.dr. Bastiaan Kleijn

Signal Processing Systems Group

Department of Microelectronics

Last modified: 2021-03-28