Mihir Kapadia

The thesis topic is: 1) How can we design a listening system or device that detects the (dimensional) emotional state of the last utterance a user speaks to a voice assistant, using only speech audio and physiological signals? 2) Given the interaction context (e.g., the home), how does audio directional information, which is prone to distortion, contribute to recognition performance?