Efficient calculation of sentence semantic similarity: a proposed scheme based on machine learning approaches and NLP techniques

Publication year: 1393 (Solar Hijri calendar)
Published in: Scientific Review Journal (مجله علمی مروری), Volume 3, Issue 3
Dedicated COI code: JR_SJR-3-3_005
Article language: English


Authors

M. Roostaee

Department of Computer Science and Engineering and IT, School of Electrical Engineering and Computer, Shiraz, Iran.

S. M. Fakhrahmad

Department of Computer Science and Engineering and IT, School of Electrical Engineering and Computer, Shiraz, Iran.

M. H. Sadreddini

Department of Computer Science and Engineering and IT, School of Electrical Engineering and Computer, Shiraz, Iran.

A. Khalili

Department of Computer Science and Engineering and IT, School of Electrical Engineering and Computer, Shiraz, Iran.

Abstract

Aim of Study: Sentence semantic similarity plays a crucial role in applications such as machine translation, information retrieval, question answering, and multi-document summarization. Because natural language allows the same meaning to be expressed in many different ways, detecting sentence semantic similarity is not a trivial task. This paper combines Natural Language Processing (NLP) and machine learning techniques to propose a scheme for detecting sentence semantic similarity.

Materials and Methods: In the first part of the proposed scheme, the NLP section, several sets of linguistic features are extracted: string-based, semantic-based, named-entity-based, and syntax-based. In the second part, machine learning algorithms are used to build classification models on the extracted features.

Results: Experimental results for the first part indicate that the extracted features are valid for sentence semantic similarity. In the second part, a comparison of different classification algorithms suggests that KNN is the most successful.

Overall Conclusion: The experimental results indicate that the proposed approach can improve the performance of sentence semantic similarity detection, especially in terms of accuracy.
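The two-part scheme (feature extraction followed by classification) can be illustrated with a minimal sketch. The abstract does not list the paper's actual features, data, or KNN settings, so everything here is an assumption for illustration: the two string-based features (word-set Jaccard overlap and token-length ratio), the toy sentence pairs in TOY_PAIRS, the value k=3, and all function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokens(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in sentence.lower().split())
    return [w for w in words if w]


def string_features(s1, s2):
    """Two illustrative string-based features (assumed, not the paper's)."""
    t1, t2 = tokens(s1), tokens(s2)
    set1, set2 = set(t1), set(t2)
    union = set1 | set2
    # Jaccard overlap of the two word sets.
    jaccard = len(set1 & set2) / len(union) if union else 0.0
    # Ratio of the shorter token count to the longer one.
    longest = max(len(t1), len(t2))
    length_ratio = min(len(t1), len(t2)) / longest if longest else 0.0
    return (jaccard, length_ratio)


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (sentence 1, sentence 2, label), 1 = similar.
TOY_PAIRS = [
    ("A man is playing a guitar.", "A man plays the guitar.", 1),
    ("The cat sat on the mat.", "The cat is sitting on the mat.", 1),
    ("She bought a new car.", "She purchased a new car yesterday.", 1),
    ("A man is playing a guitar.", "The stock market fell sharply.", 0),
    ("The cat sat on the mat.", "He is cooking dinner tonight.", 0),
    ("She bought a new car.", "Rain is expected over the weekend.", 0),
]
TRAIN = [(string_features(a, b), label) for a, b, label in TOY_PAIRS]


def predict_similar(s1, s2, k=3):
    """Classify a sentence pair as similar (1) or not similar (0)."""
    return knn_predict(TRAIN, string_features(s1, s2), k=k)
```

Calling predict_similar("A dog runs in the park.", "A dog is running in the park.") classifies the pair against the toy training set. A fuller version of the scheme would add the semantic-based, named-entity-based, and syntax-based feature sets that the abstract mentions, which this sketch leaves out.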

Keywords

Sentence semantic similarity, Natural language processing, Machine learning classification
