PhD. Dissertation Defense: Emad Mounir Grais

INCORPORATING PRIOR INFORMATION IN NONNEGATIVE MATRIX FACTORIZATION FOR AUDIO SOURCE SEPARATION  

Emad Mounir Grais
Electronics Engineering, PhD Dissertation, 2013

Thesis Jury

Asst. Prof. Hakan Erdoğan (Thesis Supervisor), Prof. Dr. Mustafa Ünel, Assoc. Prof. Müjdat Çetin, Assoc. Prof. İlker Hamzaoğlu, Assoc. Prof. Dr. Ali Taylan Cemgil

Date & Time: June 7th, 2013 - 14:00

Place: FENS G029 

Abstract

In this work, we propose solutions to the problem of audio source separation from a single recording. The audio source signals can be speech, music, or any other audio signals. We assume that training data are available for the individual source signals present in the mixed signal. The training data are used to build a representative model for each source. In most cases, these models are sets of basis vectors in the magnitude or power spectral domain. The proposed algorithms rely on decomposing the spectrogram of the mixed signal using the trained basis models of all sources observed in the mixed signal. Nonnegative matrix factorization (NMF) is used to train the basis models for the source signals. NMF is then used to decompose the mixed signal spectrogram as a weighted linear combination of the trained basis vectors for each observed source in the mixed signal. After decomposing the mixed signal, spectral masks are built and used to reconstruct the source signals.
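The pipeline described above (train bases per source, decompose the mixture with the bases held fixed, build spectral masks) can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes magnitude spectrograms stored as NumPy arrays of shape (frequency bins, frames), uses the standard multiplicative updates for the KL-divergence NMF cost, and the function names `train_bases`, `fit_weights`, and `separate` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def train_bases(V, n_basis, n_iter=200, seed=0):
    """Learn a basis matrix B (n_freq x n_basis) such that V ~ B @ G."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    B = rng.random((n_freq, n_basis)) + EPS
    G = rng.random((n_basis, n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Multiplicative updates for the KL-divergence cost.
        G *= (B.T @ (V / (B @ G + EPS))) / (B.T @ ones + EPS)
        B *= ((V / (B @ G + EPS)) @ G.T) / (ones @ G.T + EPS)
        # Normalize each basis vector to unit sum (an assumed convention).
        B /= B.sum(axis=0, keepdims=True) + EPS
    return B


def fit_weights(V, B, n_iter=200, seed=0):
    """Find nonnegative weights G with V ~ B @ G while B stays fixed."""
    rng = np.random.default_rng(seed)
    G = rng.random((B.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        G *= (B.T @ (V / (B @ G + EPS))) / (B.T @ ones + EPS)
    return G


def separate(V_mix, bases, n_iter=200):
    """Estimate one spectrogram per source from the mixture spectrogram."""
    B = np.hstack(bases)                      # all trained bases side by side
    G = fit_weights(V_mix, B, n_iter)
    edges = np.cumsum([b.shape[1] for b in bases])[:-1]
    parts = [b @ g for b, g in zip(bases, np.split(G, edges, axis=0))]
    total = sum(parts) + EPS
    # Ratio masks: each source receives its share of every mixture bin.
    return [(p / total) * V_mix for p in parts]
```

Because the masks sum to one in every time-frequency bin, the source estimates add up to the mixture spectrogram. The time-domain signals would then be recovered with the mixture phase and an inverse short-time Fourier transform, which is omitted here.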

In this thesis, we improve the performance of NMF for source separation by incorporating additional constraints and prior information about the source signals into the NMF decomposition. The NMF decomposition weights are encouraged to satisfy prior information related to the nature of the source signals. The priors are modeled using Gaussian mixture models and hidden Markov models, which represent the valid weight combination sequences that the basis vectors can receive for a certain type of source signal. The prior models are incorporated into the NMF cost function using either log-likelihood or minimum mean squared error (MMSE) estimation.
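One generic way to write a log-likelihood prior on the weights is as a penalized NMF objective. The notation below is an assumption made for illustration and is not taken from the thesis: $V$ is the mixture spectrogram, $B$ the fixed trained bases, $G$ the weight matrix with columns $g_n$, $D$ the chosen NMF divergence, $\lambda \ge 0$ a regularization parameter, and $p(G)$ the trained prior, shown here as a Gaussian mixture model applied to each column.

```latex
\hat{G} = \arg\min_{G \ge 0} \; D\left(V \,\|\, BG\right) - \lambda \log p(G),
\qquad
p(G) = \prod_{n} \sum_{k} \pi_k \, \mathcal{N}\left(g_n \mid \mu_k, \Sigma_k\right)
```

With $\lambda = 0$ this reduces to ordinary NMF with fixed bases; larger values pull the weights toward combinations the prior considers likely for that source type.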

In this thesis, we also improve the NMF training of the basis models. When not enough training data is available, we introduce different adaptation methods that make the trained bases better fit the sources in the mixed signal. We also improve the training procedures by learning more discriminative models for the source signals. In addition, to capture a larger context in the models, we concatenate neighboring spectra and train basis sets from the concatenated frames instead of from single frames, which makes it possible to directly model the relation between consecutive spectral frames. Finally, we introduce post-enhancement using MMSE estimation and post-smoothing to obtain better separation of the source signals.
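The idea of concatenating neighboring spectra can be sketched as a simple frame-stacking step applied before NMF training. This is an illustrative sketch only: the function name `stack_frames`, the symmetric `context` parameter, and the choice to pad the edges by repeating the first and last frames are all assumptions, not details from the thesis.

```python
import numpy as np


def stack_frames(V, context):
    """Concatenate each frame with `context` neighbors on each side.

    V has shape (n_freq, n_frames); the result has shape
    (n_freq * (2 * context + 1), n_frames). Edge frames are repeated
    so that every frame has a full set of neighbors.
    """
    n_freq, n_frames = V.shape
    padded = np.pad(V, ((0, 0), (context, context)), mode="edge")
    # Block k holds the frame at offset (k - context) relative to each column.
    blocks = [padded[:, k:k + n_frames] for k in range(2 * context + 1)]
    return np.vstack(blocks)
```

Each column of the stacked matrix is a nonnegative super-frame, so basis vectors trained on it span several consecutive spectral frames at once.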