Voice Activity Detection using Clustering-based Method in Spectro-Temporal Features Space

N. Esfandian; F. Jahani bahnamiri; S. Mavaddati

Published in: Journal of Artificial Intelligence and Data Mining, Volume 10, Issue 3

Year of publication: 1401 (Solar Hijri)

Document type: Journal article

Language: English

The full text of this paper is available as a 10-page PDF.

Permanent link to this paper:

https://civilica.com/doc/1525742

National scientific document ID:

JR_JADM-10-3_009

Indexing date: 9 Mehr 1401

Abstract:

This paper proposes a novel method for voice activity detection based on clustering in the spectro-temporal domain. In the proposed algorithm, an auditory model is used to extract the spectro-temporal features. Gaussian mixture model (GMM) and WK-means clustering methods are used to reduce the dimensionality of the spectro-temporal space. The energy and positions of the resulting clusters are then used for voice activity detection: each frame is classified as silence or speech using the attributes of the clusters and a threshold value that is updated in every frame. Because it has higher energy, the first cluster is treated as the main speech section in the computation. The efficiency of the proposed method was evaluated for silence/speech discrimination under different noisy conditions. The displacement of clusters in the spectro-temporal domain was used as the criterion for determining the robustness of the features. According to the results, the proposed method improved the speech/non-speech segmentation rate compared with temporal and spectral features at low signal-to-noise ratios (SNRs).
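The abstract does not state how the per-frame threshold is updated, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes the energy of the dominant (highest-energy) cluster has already been computed for each frame, that the first few frames contain only background noise, and that the threshold is a fixed multiple of a recursively smoothed noise estimate. The function name `detect_speech` and the parameters `init_frames`, `margin`, and `alpha` are hypothetical.

```python
from typing import List


def detect_speech(cluster_energy: List[float],
                  init_frames: int = 5,
                  margin: float = 3.0,
                  alpha: float = 0.9) -> List[bool]:
    """Label each frame as speech (True) or silence (False).

    cluster_energy[t] is assumed to be the energy of the dominant
    cluster in frame t; extracting it with an auditory model and a
    clustering step is outside the scope of this sketch.
    """
    if len(cluster_energy) < init_frames:
        raise ValueError("need at least init_frames frames")

    # Assumption: the first init_frames frames are noise only.
    noise = sum(cluster_energy[:init_frames]) / init_frames

    decisions = []
    for energy in cluster_energy:
        # The threshold is recomputed in every frame from the noise estimate.
        threshold = margin * noise
        is_speech = energy > threshold
        if not is_speech:
            # Update the noise estimate only on frames judged to be silence,
            # so that speech energy does not inflate the threshold.
            noise = alpha * noise + (1.0 - alpha) * energy
        decisions.append(is_speech)
    return decisions


# Illustrative (made-up) energies: quiet frames, a burst, then quiet again.
energies = [1.0, 1.1, 0.9, 1.0, 1.0, 8.0, 9.5, 7.0, 1.2, 1.0]
labels = detect_speech(energies)
print(labels)
```

Updating the noise estimate only on silence frames is one common design choice for adaptive thresholds; the paper may use a different rule that also takes cluster positions into account.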

Keywords:

Spectro-temporal Features, Auditory Model, Gaussian Mixture Model, WK-means Clustering, Voice Activity Detection

Authors

N. Esfandian

Department of Electrical Engineering, Qaemshahr Branch, Islamic Azad University, Qaemshahr, Iran.

F. Jahani bahnamiri

Department of Computer Engineering, Aryan Institute of Science and Technology, Babol, Iran.

S. Mavaddati

Department of Electrical Engineering, Faculty of Engineering and Technology, University of Mazandaran, Babolsar, Iran.
