A Fuzzy based Very Low Bit Rate Speech Coding with High Accuracy

Milad Johnny; Javad Mirzaee

Published in: 20th Iranian Conference on Electrical Engineering

Publication year: 1391 (Solar Hijri; 2012)

Document type: Conference paper

Language: English

The full text of this paper is available as a 5-page PDF.

Permanent link to this paper:

https://civilica.com/doc/154534

National scientific document ID:

ICEE20_322

Indexing date: 14 Mordad 1391 (4 August 2012)

Abstract:

According to the U.S. Federal Standard coder for 2400 bps, a data frame containing 54 bits of encoded signal is transmitted every 22.5 ms. In each frame, 25 bits encode the spectral features (10 Line Spectrum Frequencies (LSF)). This paper describes a method to reduce the transmission rate while preserving most of the quality and intelligibility. The proposed coder operates at about 780 bits/sec (= 6 bits/frame × 130 frames/sec). At the transmitter, an algorithm converts speech into phonetic segments, which are then divided into voiced and unvoiced segments. Because unvoiced phonemes are short in duration, a listener cannot distinguish whether they are pronounced by a male or a female speaker; the literature indicates that this observation holds in most cases. Therefore, for high-accuracy speech transmission, voiced phonemes are more important than unvoiced ones. Hence, a voiced/unvoiced decomposition system is proposed. Furthermore, fuzzy clustering is applied to cluster the voice segments, with the proper number of voice segments determined by means of a statistical method called the "elbow" method. Depending on the transmission rate, two different strategies can be used. In the first strategy, unvoiced segments of speech are transmitted using Linear Predictive Coding (LPC) for high quality (MOS = 4.5). In the second strategy, unvoiced segments of speech are recognized and then transmitted for lower quality (MOS = 3) at a rate under 100 bits/sec.
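The abstract does not give implementation details for the fuzzy clustering step, so the following is only a minimal sketch of the general idea: fuzzy c-means on one scalar feature per segment, with the number of clusters picked by a simple elbow rule. Everything specific here is an assumption for illustration and not the authors' method: the function names, the 1-D toy features, the fuzzifier m = 2, the evenly spaced initial centers, and the 5% gain threshold used to locate the elbow.

```python
def memberships(points, centers, m):
    """Fuzzy membership of each point in each cluster (each row sums to 1)."""
    power = -2.0 / (m - 1.0)
    rows = []
    for x in points:
        # Clamp distances so a point lying exactly on a center does not divide by zero.
        weights = [max(abs(x - v), 1e-12) ** power for v in centers]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def fuzzy_c_means(points, c, m=2.0, iters=200, tol=1e-9):
    """Run fuzzy c-means on 1-D data; return (centers, memberships, objective)."""
    lo, hi = min(points), max(points)
    # Deterministic start (an assumption): centers spread evenly over the data range.
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    for _ in range(iters):
        u = memberships(points, centers, m)
        updated = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            updated.append(sum(wi * x for wi, x in zip(w, points)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:
            break
    u = memberships(points, centers, m)
    # Standard fuzzy c-means cost: membership-weighted squared distances.
    objective = sum(
        u[i][k] ** m * (x - centers[k]) ** 2
        for i, x in enumerate(points)
        for k in range(c)
    )
    return centers, u, objective


def pick_elbow(points, c_max=6, threshold=0.05):
    """Smallest c where adding one more cluster gains less than `threshold` of the c=1 cost."""
    costs = [fuzzy_c_means(points, c)[2] for c in range(1, c_max + 1)]
    if costs[0] == 0:
        return 1  # all points identical
    for c in range(1, c_max):
        gain = (costs[c - 1] - costs[c]) / costs[0]
        if gain < threshold:
            return c
    return c_max


# Hypothetical scalar features, one per segment (toy values, not from the paper).
features = [480, 490, 500, 510, 520,
            1480, 1490, 1500, 1510, 1520,
            2480, 2490, 2500, 2510, 2520]

best_c = pick_elbow(features)
centers, _, _ = fuzzy_c_means(features, best_c)
print(best_c, [round(v) for v in centers])
```

On this toy data the rule settles on three clusters, matching the three visible groups. The elbow criterion is a heuristic, and other formalizations (for example, inspecting the cost curve by eye) are equally common; real speech features would normally be multi-dimensional.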

Keywords:

Voiced/unvoiced decomposition, Speech coding, Formant frequency, Fuzzy segmentation

Authors

Milad Johnny

Iran University of Science and Technology

Javad Mirzaee

University of Ontario Institute of Technology (UOIT)