Proposed Feature Selection Method to Reduce Dimensions of Semantic Vectors

Publication year: 2019 (1398 SH)
Document type: Conference paper
Language: English

The full text of this paper has not been published; only the abstract (or an extended abstract) is available in the database.

National document ID: ICIKT10_043

Indexing date: 25 January 2020 (5 Bahman 1398)

Abstract:

Distributional semantic models (DSMs) represent semantics efficiently in natural language processing. These models represent meaning using distributional information, under the assumption that words appearing in similar contexts are semantically similar. A DSM represents the meaning of a word as a vector whose values are learned from a corpus by observing which other words occur with it in context. The meaning distribution of a word is thus a vector in a vector space whose basis vectors correspond to context words, typically the most frequent words in the corpus. Because of computational constraints, it is necessary to reduce these high-dimensional basis vectors. This research proposes a framework for reducing the set of context words by feature selection via comparison of distance matrices. In this framework, each retained feature still corresponds to a single word, in contrast to feature fusion methods such as NMF. The proposed method selects 500 and 1000 context words as basis vectors. Results show only about a 1%-2% drop in accuracy on the MEN and SimLex-999 datasets, indicating that the reduced basis vectors are efficient for computing semantic vectors while preserving the interpretability of the context words.
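Since the abstract only sketches the approach, here is a minimal, hypothetical Python sketch of one way such a pipeline could look, not the paper's exact method: build a word-by-context co-occurrence matrix whose columns are the most frequent context words, then select the subset of columns whose pairwise distance matrix best approximates that of the full representation. The +/-2 word window, raw counts, Euclidean metric, and greedy forward-selection loop are all illustrative assumptions; the abstract does not specify the paper's exact counting scheme or selection criterion.

```python
# A minimal sketch, assuming a tokenized corpus, a +/-2 word window, raw
# co-occurrence counts, Euclidean distances, and greedy forward selection.
# These are illustrative choices, not the paper's exact method.
from collections import Counter

import numpy as np

def cooccurrence_matrix(sentences, target_words, context_words, window=2):
    """Count how often each target word co-occurs with each context word."""
    row = {w: i for i, w in enumerate(target_words)}
    col = {w: j for j, w in enumerate(context_words)}
    M = np.zeros((len(target_words), len(context_words)))
    for sent in sentences:
        for i, w in enumerate(sent):
            if w not in row:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i and sent[j] in col:
                    M[row[w], col[sent[j]]] += 1
    return M

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))

def select_context_words(M, k):
    """Greedily keep the k columns (context words) whose reduced distance
    matrix stays closest in Frobenius norm to the full one."""
    D_full = pairwise_distances(M)
    selected, remaining = [], list(range(M.shape[1]))
    for _ in range(k):
        errs = [(np.linalg.norm(D_full - pairwise_distances(M[:, selected + [c]])), c)
                for c in remaining]
        _, best = min(errs)
        selected.append(best)
        remaining.remove(best)
    return selected

# Tiny usage example on a toy corpus: take the 4 most frequent words as the
# initial basis, then keep the 2 that best preserve pairwise distances.
sentences = [["dogs", "chase", "cats"], ["cats", "chase", "mice"],
             ["dogs", "like", "bones"]]
freq = Counter(w for s in sentences for w in s)
context = [w for w, _ in freq.most_common(4)]
M = cooccurrence_matrix(sentences, ["dogs", "cats", "mice"], context)
kept = select_context_words(M, 2)
print([context[c] for c in kept])  # the retained, still-interpretable words
```

Unlike NMF-style feature fusion, where each reduced dimension is a weighted mixture of many words, every column kept by a selection step like this remains a single, nameable context word, which is the interpretability property the abstract emphasizes.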

Authors

Atefe Pakzad

School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Morteza Analoui

School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran