Automatic Synset Extraction from Text Documents Using a Graph-Based Clustering Approach via Maximal Cliques Finding
Year of publication: 1397
Document type: Journal article
Language: English
The full text of this paper is available as a 9-page PDF file.
National scientific document ID:
JR_ITRC-11-1_004
Indexing date: 23 Bahman 1399
Abstract:
Semantic relations between words, such as synsets, are used in automatic ontology construction, which is a powerful tool in many NLP tasks. Synset extraction usually depends on other languages and resources, using techniques such as mapping or translation. In our proposed method, synsets are extracted solely from text corpora, which removes the need for special resources such as WordNets or dictionaries. Words in the corpus are represented using the vector space model, and the words most similar to each word are extracted based on common features count (CFC) using a modified cosine similarity measure. A graph-based soft clustering approach is then applied to create clusters of synonymous words.
To evaluate the performance of the proposed method, the extracted synsets were compared with other Persian semantic resources. The results show an accuracy of 80.25%, an improvement over the 69.5% accuracy of the pure clustering-by-committee method.
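The pipeline outlined in the abstract (word vectors, pairwise similarity, a similarity graph, and maximal cliques as overlapping clusters) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain cosine similarity in place of the paper's modified, CFC-based measure, a hypothetical similarity threshold of 0.5, and invented toy feature vectors. The function names `cosine`, `similarity_graph`, and `maximal_cliques` are illustrative only.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Plain cosine similarity between two sparse vectors stored as dicts.

    Assumption: the paper uses a modified cosine measure based on common
    features count (CFC); its exact form is not given in the abstract.
    """
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(vectors, threshold):
    """Connect two words when their similarity reaches the threshold."""
    graph = {word: set() for word in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


# Toy, invented co-occurrence vectors (hypothetical data, not from the paper).
vectors = {
    "bank": {"money": 1, "river": 1},
    "lender": {"money": 1},
    "shore": {"river": 1},
    "tree": {"leaf": 1},
}

graph = similarity_graph(vectors, threshold=0.5)  # threshold is an assumption
# Keep cliques with at least two words; a word may appear in several cliques,
# which is what makes the clustering "soft".
synsets = [clique for clique in maximal_cliques(graph) if len(clique) >= 2]
print(sorted(sorted(clique) for clique in synsets))
```

In this toy graph, "bank" is linked to both "lender" and "shore", which are not linked to each other, so it falls into two candidate synsets. That overlap is the property a soft clustering needs in order to handle polysemous words.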
Authors
Mahsa Khorasani
School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
Behrouz Minaei-Bidgoli
School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
Chakaveh Saedi
Faculty of Science, Engineering Dept., Macquarie University, Sydney, Australia