PTM: Part-of-Speech Tagger for Persian Clinical Notes Based on Hidden Markov Model
Published in: First International Congress on Artificial Intelligence in Medical Sciences
Year of publication: 1402 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper has not been provided and is not available.
National scientific document ID:
AIMS01_027
Indexing date: 1 Mordad 1402
Abstract:
Background and aims: Part-of-speech (POS) tagging is an essential part of clinical text processing applications. A major complication in POS tagging of clinical texts is accessing and annotating appropriate training corpora. As a result, POS taggers may be trained on corpora that differ from the tagger's target clinical notes, which leads to low tagging accuracy. This paper presents a Persian POS tagger for Persian clinical notes based on Hidden Markov Models (HMMs). The proposed tagger (PTM) supports some properties of text-to-speech systems, such as break phrase detection, homograph disambiguation, and lexical stress search, and the main aspects of Persian morphology are introduced and developed. To evaluate the accuracy of the proposed approach, it is applied to both formal and informal clinical corpora. The experimental results show an overall accuracy of 90.3%, which is the best result reported for POS tagging of Persian medical texts.
Method: PTM estimates a tag's likelihood for a given token by combining token collocation probabilities with the token's tag probabilities, calculated using a Naive Bayes classifier. We compared PTM to three POS taggers used in the medical domain (mxpost, Brill and TnT). We trained each tagger on a non-clinical corpus and evaluated it on clinical corpora.
Results: To evaluate the proposed method, two experiments were performed. First, we applied PTM to formal clinical text, for which the emergency part of the corpus was selected. This part of the corpus has 13924 words, of which 12562 are known words. We also applied PTM to informal clinical text. This part of the corpus has 10329 words, of which 10062 are known words.
PTM was more accurate in clinical text tagging than mxpost, Brill and TnT (respective averages 83.9, 81.0, 79.5 and 78.8).
Conclusion: Analysis of tagger performance illustrates that lexical differences between corpora have a more profound effect on tagging accuracy than originally considered by related studies. Clinical POS tagging methods may be improved to advance their accuracy without requiring extra training or large training data sets. In this paper, we proposed a Persian clinical POS tagger based on HMM and suggested an optimization process. Moreover, some of the main challenges of POS tagging systems were introduced. The high accuracy of PTM compared with other related methods demonstrates that HMMs are suitable for POS tagging of Persian clinical texts. Our future work will focus on improving the accuracy of Persian clinical natural language processing (NLP) by providing a multi-approach system.
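For readers unfamiliar with HMM-based tagging, the following is a minimal sketch of generic first-order HMM decoding with the Viterbi algorithm. It is not the PTM implementation, which the abstract does not describe in detail. The tag set, the transliterated Persian tokens, all probability values, the function name `viterbi` and the `unk_p` fallback for unknown words are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy model: every tag, token and probability below is an assumption.
TAGS = ["NOUN", "ADJ", "VERB"]
START_P = {"NOUN": 0.7, "ADJ": 0.2, "VERB": 0.1}
TRANS_P = {  # TRANS_P[prev][cur] = P(cur tag | prev tag)
    "NOUN": {"NOUN": 0.4, "ADJ": 0.3, "VERB": 0.3},
    "ADJ": {"NOUN": 0.3, "ADJ": 0.1, "VERB": 0.6},
    "VERB": {"NOUN": 0.6, "ADJ": 0.2, "VERB": 0.2},
}
EMIT_P = {  # EMIT_P[tag][token] = P(token | tag)
    "NOUN": {"bimar": 0.5, "dard": 0.5},
    "ADJ": {"shadid": 0.9, "bimar": 0.1},
    "VERB": {"darad": 1.0},
}


def viterbi(tokens, tags, start_p, trans_p, emit_p, unk_p=1e-6):
    """Return the most likely tag sequence for tokens under a first-order HMM."""
    if not tokens:
        return []
    # best[i][t]: log-probability of the best tag path ending in tag t at position i
    # back[i][t]: previous tag on that best path
    best = [{}]
    back = [{}]
    for t in tags:
        emit = emit_p[t].get(tokens[0], unk_p)  # small fallback for unknown words
        best[0][t] = math.log(start_p[t]) + math.log(emit)
        back[0][t] = None
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for t in tags:
            emit = emit_p[t].get(tokens[i], unk_p)
            # Pick the previous tag that maximizes path score plus transition score.
            prev = max(tags, key=lambda p: best[i - 1][p] + math.log(trans_p[p][t]))
            best[i][t] = best[i - 1][prev] + math.log(trans_p[prev][t]) + math.log(emit)
            back[i][t] = prev
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


tokens = ["bimar", "dard", "shadid", "darad"]  # roughly "the patient has severe pain"
print(viterbi(tokens, TAGS, START_P, TRANS_P, EMIT_P))
# prints ['NOUN', 'NOUN', 'ADJ', 'VERB']
```

In this toy model the token "bimar" is ambiguous between NOUN and ADJ, and the decoder resolves it from the start and transition probabilities. Log-probabilities are used so that long sentences do not underflow.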
Keywords:
Authors
Morteza Okhovvat
Neuroscience Research Center, Faculty of ParaMedicine, Golestan University of Medical Sciences, Gorgan, Iran. (*m-okhovat@goums.ac.ir)
Saeed Gol Furouzi
Department of Emergency Medicine, Golestan University of Medical Sciences, Gorgan, Iran
Sayed Babak Mojaver Aghili
Department of Anesthesiology & Intensive Care, Faculty of Medicine, Golestan University of Medical Sciences, Gorgan, Iran.