Robust Persian Digit Recognition in Noisy Environments Using Hybrid CNN-BiGRU Model

Publication year: 1404 (Solar Hijri)
Document type: Journal article
Language: English

Full text: 10 pages, PDF

National scientific document ID: JR_JADM-13-3_006

Indexing date: 12 Shahrivar 1404

Abstract:

Artificial intelligence (AI) has significantly advanced speech recognition applications. However, many existing neural-network-based methods are sensitive to noise, which reduces their accuracy in real-world environments. This study addresses isolated spoken Persian digit recognition (zero to nine) under noisy conditions, with particular attention to phonetically similar digits. A hybrid model combining residual convolutional neural networks (CNNs) and bidirectional gated recurrent units (BiGRU) is proposed; it uses word units rather than phoneme units for speaker-independent recognition. The FARSDIGIT1 dataset, augmented using several techniques, is processed with Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction. Experimental results demonstrate the model's effectiveness, with accuracies of 98.53%, 96.10%, and 95.92% on the training, validation, and test sets, respectively. In noisy conditions, the proposed approach improves recognition by 26.88% over phoneme-unit-based LSTM models and outperforms an MLP classifier using Mel-scale Two Dimension Root Cepstrum Coefficients features (MTDRCC+MLP) by 7.61%.
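The two preprocessing steps named in the abstract, noise augmentation and MFCC feature extraction, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The abstract does not give the sampling rate, frame and hop lengths, number of mel filters or cepstral coefficients, or the exact augmentation methods. The defaults below (25 ms frames, 10 ms hop, 512-point FFT, 26 mel filters, 13 coefficients, pre-emphasis 0.97) and additive noise at a chosen signal-to-noise ratio are therefore assumed. The function names `add_noise_at_snr` and `mfcc` are hypothetical.

```python
import numpy as np


def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Assumed augmentation: mix `noise` into `clean` at the requested SNR (dB)."""
    # Tile, then trim, the noise so it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale so that 10 * log10(p_clean / p_scaled_noise) equals snr_db.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sr: int) -> np.ndarray:
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of triangle i
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of triangle i
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def mfcc(signal, sr, n_mfcc=13, n_filters=26, frame_ms=25.0, hop_ms=10.0,
         n_fft=512, preemph=0.97):
    """Return an (n_frames, n_mfcc) matrix of cepstral coefficients."""
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    frame_len = int(round(sr * frame_ms / 1000))
    hop = int(round(sr * hop_ms / 1000))
    if len(emphasized) < frame_len:  # guarantee at least one full frame
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    # Index matrix: row j selects samples of frame j.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum per frame, then mel-filterbank energies.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # Orthonormal DCT-II decorrelates the log energies; keep the first n_mfcc.
    n = np.arange(n_filters)
    k = np.arange(n_mfcc)[:, None]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters)) * np.sqrt(2.0 / n_filters)
    dct[0] /= np.sqrt(2.0)
    return log_energies @ dct.T
```

For example, with a 1-second clip sampled at an assumed 16 kHz, `mfcc(add_noise_at_snr(clean, noise, 10.0), 16000)` returns 98 frames of 13 coefficients. A frame sequence like this is the kind of input a CNN-BiGRU classifier over the ten digit classes would consume.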

Authors

Ali Nasr-Esfahani

Department of Electrical and Computer Engineering, Qom University of Technology, Iran.

Mehdi Bekrani

Department of Electrical and Computer Engineering, Qom University of Technology, Iran.

Roozbeh Rajabi

Department of Electrical and Computer Engineering, Qom University of Technology, Iran.
