NMF-based Cepstral Features for Speech Emotion Recognition

Publication year: 1397 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper has not been provided and is not available.

National scientific document ID:

SPIS04_059

Indexing date: 16 Ordibehesht 1398 (Solar Hijri)

Abstract:

Speech Emotion Recognition (SER) has received growing attention in recent years, and various methods have been proposed for it. Feature extraction is the major part of SER methods and has conventionally been done using parametric representations developed specifically for speech signals, such as Mel Frequency Cepstral Coefficients (MFCC). The discrimination ability of these features for the SER task could be improved by exploiting information about the vocal production mechanisms of speakers in different emotional states. In this paper, a new feature extraction scheme for SER is proposed that integrates this information by decomposing emotional speech spectra to provide an improved spectral representation of various emotions. Based on this scheme, two novel methods are presented. In the first method, a filter bank that is automatically learned by Non-negative Matrix Factorization (NMF) on emotional speech spectra is used to extract cepstral-like features. In the second method, the features are derived directly from the activation coefficients of the spectrum decomposition obtained by NMF. Finally, to increase the discrimination ability of the features among emotion classes, each feature vector is normalized to its mean value. According to experiments on the Emo-DB database, fusing the proposed features with MFCCs improves the performance of an SER system compared with conventional MFCCs as the baseline or with simple unsupervised NMF-based features derived from the speech spectra.
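The two feature types described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the full text is unavailable, so every detail here is an assumption, including the multiplicative-update NMF with a Euclidean cost, the rank of 12, the 8 retained coefficients, the unnormalized DCT-II, and the reading of "normalized to its mean value" as division by the mean. All function names are hypothetical, and random matrices stand in for real magnitude spectra.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor non-negative V (freq x frames) as W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def activations(V, W, n_iter=200, eps=1e-9, seed=0):
    """Estimate activations H for new spectra V while keeping the learned bases W fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def dct_ii(x):
    """Unnormalized DCT-II along axis 0 (the feature axis)."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

def cepstral_like(V, W, n_ceps, eps=1e-9):
    """Use the NMF bases as a filter bank, then take log and DCT as in MFCC extraction."""
    energies = W.T @ V  # (rank, frames)
    return dct_ii(np.log(energies + eps))[:n_ceps]

def mean_normalize(F, eps=1e-9):
    """Divide each per-frame feature vector (column) by its own mean (assumed reading)."""
    return F / (F.mean(axis=0, keepdims=True) + eps)

# Synthetic stand-ins for magnitude spectra: 64 frequency bins per frame.
rng = np.random.default_rng(0)
V_train = rng.random((64, 200))
V_test = rng.random((64, 30))

W, _ = nmf(V_train, rank=12)                        # learned spectral bases
ceps = cepstral_like(V_test, W, n_ceps=8)           # first method: cepstral-like features
act = mean_normalize(activations(V_test, W))        # second method: activation features
print(ceps.shape, act.shape)
```

The normalization is applied only to the activation features, which are non-negative, so dividing by the mean is well defined. The abstract does not say how the cepstral-like features are normalized or how the features are fused with MFCCs, so both are left out of the sketch.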

Authors