Spatially Selective Audio Source Separation

This project combines digital signal processing and machine learning to separate human speakers recorded with a custom 16-channel uniform circular microphone array. First, a beamforming operation is applied to the recorded audio. The beamformed data is then passed to a convolutional neural network (CNN), which predicts an ideal ratio mask (IRM) that suppresses the unwanted source. The IRM is applied to the spectrogram in which the target speaker's signal power is largest, identified using a direction-of-arrival (DOA) estimate. Finally, an inverse Short-Time Fourier Transform (ISTFT) is applied, and the resulting audio is downmixed and played through headphones.
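The sketch below illustrates two stages of this pipeline in Python: frequency-domain delay-and-sum beamforming for a uniform circular array, followed by masking in the STFT domain and resynthesis. It is a minimal illustration, not the project's actual implementation; the sample rate, array radius, STFT length, and the mask_fn placeholder (standing in for the trained CNN) are assumptions, and the DOA estimation that selects the steering angle is not shown.

import numpy as np
from scipy.signal import stft, istft

FS = 16_000     # sample rate in Hz (assumed)
N_MICS = 16     # uniform circular array size (from the project)
RADIUS = 0.05   # array radius in meters (assumed)
C = 343.0       # speed of sound in m/s

def uca_delays(steer_deg):
    """Per-microphone delays for a plane wave arriving from
    steer_deg (degrees, in the array plane) at a uniform circular array."""
    mic_angles = 2 * np.pi * np.arange(N_MICS) / N_MICS
    steer = np.deg2rad(steer_deg)
    # Projection of each mic position onto the arrival direction,
    # converted to a time offset relative to the array center.
    return (RADIUS / C) * np.cos(mic_angles - steer)

def delay_and_sum(x, steer_deg):
    """Frequency-domain delay-and-sum beamformer.
    x: (n_mics, n_samples) multichannel recording."""
    n = x.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    X = np.fft.rfft(x, axis=1)
    tau = uca_delays(steer_deg)[:, None]              # shape (n_mics, 1)
    X_aligned = X * np.exp(2j * np.pi * freqs * tau)  # undo per-mic delay
    return np.fft.irfft(X_aligned.mean(axis=0), n=n)

def apply_irm(beamformed, mask_fn):
    """STFT -> elementwise ratio mask -> inverse STFT."""
    _, _, Z = stft(beamformed, fs=FS, nperseg=512)
    mask = mask_fn(np.abs(Z))   # CNN stand-in: returns values in [0, 1]
    _, y = istft(Z * mask, fs=FS, nperseg=512)
    return y

In the full system, mask_fn would be the CNN's forward pass and steer_deg would come from the DOA calculation; both are left abstract here.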

Team Members:

Ayan Basu
Zach Hestand
Tanay Mannikar
Blake Schwartz
Alex Zhang

Semester