MSc thesis project proposal
Deep Learning based Sound Identification
Project outside the university
Munisense (Leiden)
In this project we consider the application of machine learning to the monitoring of sound in a municipality. Microphones are placed at various locations in a municipality and are used to monitor events such as passing cars, trains and planes, bird life, calamitous weather, disturbances of the peace such as fireworks, or breaking glass associated with burglaries. The goal of the analysis is to produce data that can be used to set policies, such as speed limits, road configuration, the location of fences and sound barriers, and flight patterns, or to trigger actions such as the call-out of emergency services or law enforcement. At least three aspects make the analysis of these microphone data challenging. Firstly, the measurements are generally noisy. Secondly, the data available for training often differ in unknown ways from the data that are analyzed, because of local and temporal variations (microphone type and condition, local acoustics, snow cover, rain). Thirdly, it may be of interest to find events that can only be identified with multiple microphones, such as anomalies in traffic flow, or relative locations of the microphones.
Assignment
The specific aim of the project is to develop one or more machine-learning algorithms to identify events of interest to municipalities from microphone signals. The work will be based on real-world data made available by the company Munisense. A first step will be to select a suitable architecture for the extraction of relevant information from the temporal signals. Both discriminative and generative approaches will be considered. Generative approaches will be favored, as their ability to generate sounds aids performance analysis and is attractive for demonstrations to interested parties. Once an architecture has been selected, it is natural to start with simple tasks (e.g., counting the number of passing cars for a given environment) and progress towards more difficult and realistic tasks (e.g., determining the number of vehicles and their type under varying weather conditions), culminating in an algorithm that is useful in a practical setting.
An example architecture that could be used is that of a multi-task autoencoder, combined with a deep classification network. Operating on blocks of single or multi-dimensional sound data, the multi-task autoencoder would be trained to extract a meaningful domain-independent latent representation with a prescribed distribution. Sampling from this latent distribution would allow the reconstruction of signals in various domains (with, for example, varying building density and weather conditions) by selecting a suitable decoder. A deep encoder network can be used to perform classification on the latent representation, exploiting the fact that this representation is domain-independent.
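The data flow of the example architecture can be illustrated with a minimal, dependency-free Python sketch. It is not the project's method: every name and size here (BLOCK_LEN, LATENT_DIM, the domain and class labels) is hypothetical, each network is reduced to a single untrained linear layer, and the prescribed latent distribution is assumed to be a standard normal. Training, and the loss term that would enforce the latent distribution, are omitted. A practical implementation would use deep networks in a deep-learning framework.

```python
import math
import random

# All sizes and labels below are hypothetical, chosen only for illustration.
BLOCK_LEN = 16                               # samples per sound block
LATENT_DIM = 4                               # size of the shared latent code
DOMAINS = ["urban_dry", "rural_rain"]        # one decoder per domain
CLASSES = ["car", "train", "background"]     # event classes


def init_matrix(rows, cols, rng):
    """Small random weights that stand in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, vector):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


class MultiTaskAutoencoder:
    """Shared encoder, one decoder per domain, classifier on the latent code."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.encoder = init_matrix(LATENT_DIM, BLOCK_LEN, rng)
        self.decoders = {
            domain: init_matrix(BLOCK_LEN, LATENT_DIM, rng) for domain in DOMAINS
        }
        self.classifier = init_matrix(len(CLASSES), LATENT_DIM, rng)

    def encode(self, block):
        # Domain-independent latent representation (single linear layer here).
        return matvec(self.encoder, block)

    def decode(self, latent, domain):
        # Selecting the decoder selects the domain of the reconstruction.
        return matvec(self.decoders[domain], latent)

    def classify(self, latent):
        # Class probabilities computed from the latent code only.
        return softmax(matvec(self.classifier, latent))


def losses(model, block, domain, label):
    """Per-block reconstruction (mean squared error) and classification
    (cross-entropy) losses; a training loop would minimize a weighted sum."""
    latent = model.encode(block)
    reconstruction = model.decode(latent, domain)
    probabilities = model.classify(latent)
    mse = sum((r - x) ** 2 for r, x in zip(reconstruction, block)) / len(block)
    cross_entropy = -math.log(probabilities[CLASSES.index(label)])
    return mse, cross_entropy


def generate(model, domain, rng):
    """Sample a latent code from the assumed standard-normal prior and decode
    it with the decoder of the chosen domain."""
    latent = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    return model.decode(latent, domain)
```

For example, `losses(MultiTaskAutoencoder(), block, "urban_dry", "car")` returns the two loss terms for one block of BLOCK_LEN samples, and `generate(model, "rural_rain", random.Random(1))` produces a synthetic block for the second domain. Because the classifier sees only the latent code, the same classifier is shared across all domains, which is the property the architecture is meant to exploit.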
The work will be done in collaboration with Munisense, a Dutch company situated in Leiden. It is specialised in sensor networks, especially for environmental sound measurements. Data is collected with intelligent microphone systems that perform preprocessing and send the result to a central storage. Preprocessed as well as raw audio data will be made available for the purpose of the project. Labeled data may also be available.
Requirements
Signal processing background (acoustic signals); basics of machine learning
Contact
prof.dr. Bastiaan Kleijn
Signal Processing Systems Group
Department of Microelectronics
Last modified: 2021-03-28