Efficient Incorporation of PLSA and LDA Semantic Knowledge in Statistical Language Model Adaptation for Persian ASR Systems
Publication year: 1393 (2014)
Document type: journal article
Language: English
The article is available as a 6-page PDF.
National document ID: JR_IJOCIT-2-4_003
Indexing date: 16 Farvardin 1395 (April 2016)
Abstract:
Language models (LMs) are important tools, especially for ASR systems, for improving recognition performance. Developing a robust spoken language model ideally relies on the availability of large amounts of data, preferably in the target domain and language. More often than not, however, speech developers must cope with very little or no data, typically obtained from a different domain, and language models are brittle when moved from one domain to another. Language model adaptation is achieved by combining a generic LM with a topic-specific model that is more relevant to the target domain. We review two major topic-based generative language model techniques designed to capture semantic knowledge from text. We show that applying a tf-idf-related per-word confidence metric, and using unigram rescaling rather than linear interpolation with N-grams, produces a more robust language model with significantly higher accuracy on the FARSDAT test set than a baseline N-gram model.
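The unigram rescaling mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names and toy probabilities below are illustrative assumptions. The idea is to scale each n-gram probability P(w|h) by the ratio of the topic model's unigram probability (e.g. from PLSA or LDA) to the background unigram probability, then renormalize over the vocabulary:

```python
def unigram_rescale(ngram_probs, topic_unigram, bg_unigram, floor=1e-12):
    """Adapt one n-gram distribution P(w|h) via unigram rescaling.

    ngram_probs:   {word: P(w|h)} for a single history h
    topic_unigram: {word: P_topic(w)} from the topic model (PLSA/LDA)
    bg_unigram:    {word: P_bg(w)} from the background corpus
    Returns the renormalized adapted distribution P_a(w|h).
    """
    # Scale each word by how much more likely it is under the topic
    # model than under the background model.
    scaled = {
        w: p * (topic_unigram.get(w, floor) / bg_unigram.get(w, floor))
        for w, p in ngram_probs.items()
    }
    z = sum(scaled.values())  # renormalize so the distribution sums to 1
    return {w: s / z for w, s in scaled.items()}

# Toy example (made-up numbers): in a finance-leaning topic, "bank" and
# "loan" are boosted relative to the background, "river" is suppressed.
ngram = {"bank": 0.20, "river": 0.10, "loan": 0.05, "the": 0.65}
topic = {"bank": 0.10, "river": 0.01, "loan": 0.08, "the": 0.05}
bg    = {"bank": 0.02, "river": 0.02, "loan": 0.01, "the": 0.05}
adapted = unigram_rescale(ngram, topic, bg)
```

In contrast to linear interpolation, which mixes the two models with a fixed weight for every word, this rescaling sharpens the distribution toward topic-relevant words, which is the property the abstract credits for the robustness gain.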
Authors:
Seyed Mahdi Hoseini
Computer Department, Shafagh University, Tonekabon
Behrouz Minaei
Computer Department, Iran University of Science & Technology, Tehran