Gasser Elbanna

Satrajit Ghosh, Ph.D.

Department of Otolaryngology, Head and Neck Surgery, Massachusetts Eye and Ear

Project Title: Evaluating voice identity perception in humans and deep learning models

Project Summary: Efforts to identify the acoustic parameters most salient for voice identity perception have so far been largely unsuccessful. By contrast, specific acoustic correlates are comparatively well characterized for the perception of other social signals (e.g., gender, demographics, and emotions). For instance, fundamental and formant frequencies are essential parameters for the perception of voice gender. Voice identity processing involves two main tasks: discriminating between different voices in similar instances, and identifying the same speaker across various instances. We perform these tasks daily, yet between- and within-speaker variability plays a vital role in our performance. Meanwhile, there has been a recent surge in self-supervised models that generate robust speech representations without access to training labels. These representations reflect a model's perception, which is comparable to human perception. Accordingly, we aim to explore both perceptual spaces to understand the mechanisms of identity processing. The project comprises three parts (Computational, Behavioral, and Neuroimaging). The first elucidates the invariances and limitations of handcrafted and self-supervised models in speaker recognition tasks. The second uses behavioral experiments to compare the performance of humans and self-supervised models in voice discrimination tasks and to explore their perceptual and encoding spaces. The third maps the models' speaker embeddings to brain activations measured with fMRI to examine hierarchical brain-model correspondence.

Fellowships
Exchange Fellows
Fellowship Year
2021-2022
