Speech Recognition System Based on Machine Learning in Persian Language

Speech recognition has become an integral part of daily life, and demand for systems equipped with this technology has grown sharply in recent years. This research aims to locate two selected Persian words in any given audio file. For this purpose, two standard, native datasets were prepared: one for training and the other for testing. Both datasets were converted into images of audio waveforms. Using an object detection technique, the model extracts several bounding boxes for each test audio file; each box image is then passed through a CNN classifier, which returns the corresponding label. Finally, a threshold is set so that only boxes with high accuracy are displayed as output. The results showed 93% accuracy for the CNN classifier and 50% accuracy when testing the model with object detection.
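The final thresholding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Detection` structure, the function names, the threshold value of 0.8, and the assumption that the horizontal axis of the waveform image maps linearly onto the duration of the clip are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """One classified box on a waveform image (hypothetical structure)."""
    label: str    # word returned by the classifier
    score: float  # classifier score in [0, 1]
    x_min: int    # left edge of the box, in pixel columns
    x_max: int    # right edge of the box, in pixel columns


def filter_detections(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep only the boxes whose score reaches the threshold."""
    return [d for d in detections if d.score >= threshold]


def box_to_seconds(det: Detection, image_width: int, duration_s: float) -> Tuple[float, float]:
    """Map a box's horizontal extent to a time span in seconds.

    Assumption: the image's x-axis covers the whole clip linearly.
    """
    scale = duration_s / image_width
    return det.x_min * scale, det.x_max * scale


if __name__ == "__main__":
    # Made-up candidate boxes for a 10-second clip rendered 1000 pixels wide.
    candidates = [
        Detection("word_a", 0.95, 100, 300),
        Detection("word_b", 0.40, 350, 500),
        Detection("word_b", 0.88, 600, 800),
    ]
    for det in filter_detections(candidates, threshold=0.8):
        start, end = box_to_seconds(det, image_width=1000, duration_s=10.0)
        print(f"{det.label}: {start:.1f}s to {end:.1f}s (score {det.score:.2f})")
```

In this sketch the low-scoring box is discarded and the remaining boxes are reported as approximate time spans. In practice the threshold would be tuned on held-out data.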

Keywords:

Speech recognition, Signal processing, Object detection, Neural network, Deep learning

Authors

Shahed Mohammadi

Department of Computer Science and Systems Engineering, Ayandegan Institute of Higher Education, Tonekabon, Iran.

Niloufar Hemati

Department of Computer Science, Islamic Azad University Central Tehran Branch, Tehran, Iran.

Neda Mohammadi

Department of Industrial Engineering, Sadra University, Tehran, Iran.



Permanent link to this article:

https://civilica.com/doc/1590252

National scientific document ID:

JR_CAND-1-2_003

Indexing date: 28 Dey 1401

How to cite this article:

To cite this article in your own research, you can use the following entry in your reference list:

Mohammadi, Shahed and Hemati, Niloufar and Mohammadi, Neda, 1401, Speech Recognition System Based on Machine Learning in Persian Language, https://civilica.com/doc/1590252

Within the text, wherever a statement or finding from this article is mentioned, the following details are given in parentheses after it.
First citation: (Mohammadi, Shahed; Niloufar Hemati and Neda Mohammadi, 1401)
Second and later citations: (Mohammadi; Hemati and Mohammadi, 1401)
For full guidance on reference formatting, see the Civilica help section on citation.

Scientometrics and article ranking

The details of the institution that produced this article are as follows:

Scientific rank of Ayandegan Institute of Higher Education

Institution type: non-profit institution

Number of articles: 509
