CS-EE Seminar

Speaker: Hakan Erdogan (Sabanci University)
Title: Long short-term memory networks for single and multi-channel speech enhancement

Time: 12:40 -- 13:30
Place: FENS L027

Following their acclaimed success in speech recognition, deep learning approaches have recently been applied to speech enhancement and source separation. In particular, long short-term memory (LSTM) networks have outperformed nonnegative matrix factorization and deep neural networks at separating speech from background noise. We present methods that further improve the performance of such systems. We discuss how a phase-sensitive loss function can be used in a mask prediction network to optimize the signal-to-distortion ratio (SDR). We also describe how the enhancement and recognition goals can help each other synergistically: the developed enhancement methods yield significant recognition gains over multi-condition training on a noise-robust speech recognition task. Similar ideas extend to multi-channel scenarios, where single-channel mask prediction networks can be used to improve minimum variance distortionless response (MVDR) beamforming. Another promising direction is to process multiple channels jointly with a neural network that performs the beamforming directly. We will briefly discuss initial research in these two directions carried out during the 2015 Jelinek summer workshop.
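The two techniques named in the abstract, a phase-sensitive loss for mask prediction and mask-based MVDR beamforming, can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the speaker's implementation: STFTs are complex arrays, masks are real-valued (for example the output of an LSTM, which is not shown), and the function names `phase_sensitive_loss`, `spatial_covariance` and `mask_based_mvdr` are hypothetical. The loss compares the masked noisy magnitude M|Y| with |S|cos(θ_S - θ_Y), the part of the clean spectrum that is in phase with the mixture. The beamformer uses speech and noise masks to estimate per-frequency spatial covariance matrices, takes the principal eigenvector of the speech covariance as the steering vector d (one common choice, assumed here), and forms w = Φ_N⁻¹d / (dᴴΦ_N⁻¹d).

```python
import numpy as np


def phase_sensitive_loss(mask, noisy_stft, clean_stft):
    """Phase-sensitive loss for a real mask applied to the noisy magnitude.

    mask, noisy_stft, clean_stft: arrays of shape (freqs, frames);
    the STFTs are complex and the mask is real.
    """
    # Phase difference between clean speech and the noisy mixture.
    phase_diff = np.angle(clean_stft) - np.angle(noisy_stft)
    # Target: the component of |S| that is in phase with Y.
    target = np.abs(clean_stft) * np.cos(phase_diff)
    estimate = mask * np.abs(noisy_stft)
    return float(np.sum((estimate - target) ** 2))


def spatial_covariance(stft, mask):
    """Mask-weighted spatial covariance matrix for every frequency.

    stft: complex (channels, freqs, frames); mask: real (freqs, frames).
    Returns an array of shape (freqs, channels, channels).
    """
    # sum_t m[f, t] * y[:, f, t] y[:, f, t]^H
    weighted = np.einsum("ft,cft,dft->fcd", mask, stft, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-8)  # avoid division by zero
    return weighted / norm[:, None, None]


def mask_based_mvdr(stft, speech_mask, noise_mask):
    """MVDR weights w = Phi_n^{-1} d / (d^H Phi_n^{-1} d) per frequency."""
    phi_s = spatial_covariance(stft, speech_mask)
    phi_n = spatial_covariance(stft, noise_mask)
    # Steering vector: principal eigenvector of the speech covariance
    # (eigh sorts eigenvalues in ascending order, so take the last column).
    _, eigvecs = np.linalg.eigh(phi_s)
    steering = eigvecs[:, :, -1]  # shape (freqs, channels)
    # Phi_n^{-1} d via a linear solve rather than an explicit inverse.
    num = np.linalg.solve(phi_n, steering[..., None])[..., 0]
    denom = np.einsum("fc,fc->f", steering.conj(), num)  # d^H Phi_n^{-1} d
    weights = num / denom[:, None]
    # Beamformer output w^H y for every time-frequency bin.
    enhanced = np.einsum("fc,cft->ft", weights.conj(), stft)
    return weights, steering, enhanced
```

One way to see what the phase-sensitive loss buys: |M·Y - S|² = (M|Y| - |S|cos(θ_S - θ_Y))² + |S|²sin²(θ_S - θ_Y), and the second term does not depend on the mask, so minimizing this loss over real masks is equivalent to minimizing the complex spectral error rather than a magnitude-only error. In the beamformer, the normalization by dᴴΦ_N⁻¹d enforces the distortionless constraint wᴴd = 1.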
Hakan Erdogan is a faculty member at Sabanci University in Istanbul, Turkey. He obtained his Ph.D. from the University of Michigan, where he worked on algorithms to speed up statistical image reconstruction methods for positron emission tomography (PET). From 1999 to 2002 he was with the Human Language Technologies group at the IBM T.J. Watson Research Center, NY, where he focused on speech recognition and spoken language understanding. During 2014-15 he visited Mitsubishi Electric Research Laboratories (MERL) in Cambridge, MA, where he worked on speech separation and enhancement. His current research interests are source separation, speech enhancement, speech and speaker recognition, sparse signal recovery and biometrics.