Agenda

MSc SS Thesis Presentation

Room geometry estimation from stereo recordings using neural networks

Giovanni Bologni

Acoustic room geometry estimation is often performed in ad hoc settings, i.e., using multiple microphones and sources distributed around the room, or assuming control over the excitation signals. To facilitate practical applications, we propose a fully convolutional network (FCN) that localizes reflective surfaces under milder assumptions: (1) only a compact array of two microphones is available, (2) emitter and receivers are not synchronized, and (3) both the excitation signals and the impulse responses of the enclosures are unknown.

Our FCN is designed to extract spectral and temporal patterns from stereo recordings, aggregate the temporal information over time-frames, and predict the likelihood of virtual sources corresponding to reflective surfaces being at specific locations.
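For readers unfamiliar with the term, the link between a virtual source and a reflective surface can be illustrated with the standard image-source model: a first-order reflection behaves as if it were emitted by the mirror image of the real source behind the wall, so the wall lies on the perpendicular bisector plane between the two. The sketch below is only an illustration of that geometry, not the method of the thesis; the function names are hypothetical, and it assumes the real source position is known, which the abstract does not state.

```python
import numpy as np


def mirror_source(source, wall_point, wall_normal):
    """Mirror `source` across the plane through `wall_point` with normal `wall_normal`."""
    n = np.asarray(wall_normal, dtype=float)
    n = n / np.linalg.norm(n)  # unit normal
    s = np.asarray(source, dtype=float)
    # Signed distance from the source to the wall plane.
    dist = np.dot(s - np.asarray(wall_point, dtype=float), n)
    return s - 2.0 * dist * n


def wall_from_virtual_source(source, virtual_source):
    """Recover the wall plane (a point on it and its unit normal) from a source/image pair.

    Assumption for illustration only: the real source position is known.
    """
    s = np.asarray(source, dtype=float)
    v = np.asarray(virtual_source, dtype=float)
    normal = (v - s) / np.linalg.norm(v - s)  # points from the source towards the wall
    point = 0.5 * (s + v)                     # midpoint lies on the wall
    return point, normal


# Hypothetical example: a wall in the plane x = 0 and a source at x = 1 m.
source = np.array([1.0, 2.0, 1.5])
image = mirror_source(source, wall_point=[0.0, 0.0, 0.0], wall_normal=[1.0, 0.0, 0.0])
point, normal = wall_from_virtual_source(source, image)
```

In this example the image lands at x = -1 m, and the recovered plane passes through x = 0, so estimating where the virtual source is amounts to estimating where the wall is.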

Numerical experiments confirm that the network generalizes to mismatched microphone array sizes, sensor directivity patterns, and audio signal types, while highlighting front-back ambiguity as a prominent source of uncertainty.

When a single reflective surface is present, up to 80% of the sources are detected, while this figure approaches 50% in rectangular rooms.

Further tests on real-world recordings yield accuracy similar to that obtained with artificially reverberated speech signals, validating the generalization capabilities of the framework.
