Temporal and Spatial Features for Visual Speech Recognition

سال انتشار: 1396
نوع سند: مقاله کنفرانسی
زبان: انگلیسی
مشاهده: 412

فایل این مقاله در 8 صفحه با فرمت PDF قابل دریافت می باشد

استخراج به نرم افزارهای پژوهشی:

لینک ثابت به این مقاله:

شناسه ملی سند علمی:

COMCONF05_473

تاریخ نمایه سازی: 21 اردیبهشت 1397

چکیده مقاله:

Speech recognition from visual data is in important step towards communication when audio is not available. This paper considers several hand crafted features including HOG, MBH, DCT, LBP, MTC, and their combinations for recognizing speech from a sequence of images. Several classifiers including SVM, decision trees, K -nearest neighbor algorithm and the sub-space K-nearest algorithm were tested feature evaluation. Further, the application of PCA for dimensionality reduction was considered in this study. Two sets of tests were carried out in this study: lip pose recognition and recognition of isolated words. For evaluation, the MIRACL-VC1 data set was considered. Self -dependent tests reached an accuracy of over 95% while in the self-independent tests, the maximum accuracy of recognition was about 52%.

نویسندگان

Ali Jafari Sheshpoli

Cyber space research inst., Shahid Beheshti University, Tehran, Iran

Ali Nadian-Ghomsheh

Cyber space research inst., Shahid Beheshti University, Tehran, Iran