A x-vector based Speaker Recognition in Persian

Fatemeh Shahbakhti; Maryam Moradi-Shabestari; Zeinab Ghasemi-Naraghi

Published in: Journal of Engineering and Applied Research, Volume 1, Issue 2

Publication year: 1403 (Solar Hijri)

Document type: Journal article

Language: English

The full text of this article is available as a 13-page PDF.

Permanent link to this article:

https://civilica.com/doc/2130874

National scientific document ID:

JR_EAR-1-2_007

Indexing date: 20 Azar 1403

Abstract:

In this paper, a text-independent speaker recognition system for Persian is implemented with deep neural networks. The x-vector technique, based on a Time Delay Neural Network (TDNN), is used to extract embeddings from speech signals. This method has attracted researchers' attention because of its noise robustness and high performance. Data augmentation and noise addition are used to improve system performance, and a PLDA classifier is used to recognize the speaker. Previous research on speaker recognition in Persian is limited. In this work, the network is trained on the Persian part of the CommonVoice dataset. According to the error analysis, non-speech parts of an utterance decrease the accuracy of speaker recognition, so they are removed by a Convolutional Recurrent Deep Neural Network (CRDNN). The accuracy of speaker recognition and speaker verification on CommonVoice is 95.24% and 95.56%, respectively. The Equal Error Rate (EER) of the speaker verification system is 4.72%. An attendance monitoring system was developed as one application of the speaker recognition system. On collected data from 16 women and 12 men, system accuracy for 12-second and 15-second recordings is 98.92% and 100%, respectively.
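
The abstract reports an Equal Error Rate for the verification system. As a minimal sketch of how an EER is commonly obtained from verification trial scores (not the authors' evaluation code), the Python example below sweeps a decision threshold and picks the point where the false acceptance rate and the false rejection rate are closest. The function name `equal_error_rate` and the toy scores are assumptions made for illustration; the paper's PLDA scoring is not reproduced here.

```python
from typing import List, Tuple


def equal_error_rate(genuine: List[float], impostor: List[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for scores where higher means 'same speaker'."""
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    # Candidate thresholds: every observed score.
    for thr in sorted(set(genuine) | set(impostor)):
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < thr for s in genuine) / len(genuine)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= thr for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Toy scores, invented for illustration only (not results from the paper).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With only a handful of trials the two error curves may not cross exactly, so the sketch reports the average of the two rates at the closest point; larger evaluations often interpolate between neighbouring thresholds instead.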

Keywords:

deep neural networks, speaker recognition, x-vector, Persian language

Authors

Fatemeh Shahbakhti

Department of Electrical and Computer Engineering, Faculty of Shariaty, Skill National University (NUS), Tehran, Iran

Maryam Moradi-Shabestari

Electrical and Computer Engineering Department, Tehran University, Tehran, Iran

Zeinab Ghasemi-Naraghi

Computer Engineering Department, AmirKabir University of Technology, Tehran, Iran
