Text Anomalies Detection Using Histograms of Words
Year of publication: 1395 (Solar Hijri calendar)
Document type: Journal article
Language: English
The full text of this article is available as a 6-page PDF file.
National scientific document ID: JR_ACSIJ-5-1_010
Indexing date: 19 Aban 1397
Abstract:
Authors of written texts can mainly be characterized by a collection of attributes obtained from their texts. Texts by the same author are very similar from the point of view of style. We can therefore assume that the attributes of a full text are very similar to the attributes of its parts and, by the same reasoning, different parts of the same text can be compared with each other. In this paper, we describe an algorithm based on histograms of a text mapped to the interval [0, 1]. The mapping keeps the word order of the text. The histograms are analyzed from a clustering point of view: if the cluster dispersion is small, the text was probably written by a single author; if the cluster dispersion is large, the text is split into two or more parts and the same analysis is applied to each part. Experiments were carried out on English and Arabic texts. For combined English texts, our algorithm detected that the texts were not written by one author, and we obtained similar results for combined Arabic texts. The algorithm can be used for basic text analysis to determine whether a text was written by one author.
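The abstract outlines the pipeline (map words to [0, 1] in text order, build histograms, measure cluster dispersion, split and repeat) but does not specify the mapping, the histogram granularity, the dispersion measure, or the split rule. The following is only a minimal sketch under loudly stated assumptions, not the authors' method: each word is mapped to its normalized rank in the text's frequency-sorted vocabulary, histograms are built over fixed-size consecutive chunks, dispersion is the mean Euclidean distance of chunk histograms from their centroid, and a text whose dispersion exceeds a threshold is split in half. The function names and the parameters `chunk`, `bins`, `threshold`, and `min_words` are hypothetical.

```python
from collections import Counter
import math


def map_words(words):
    """Map each word to [0, 1], keeping text order.

    Hypothetical mapping: normalized rank of the word in the text's
    vocabulary sorted by descending frequency (ties broken alphabetically).
    """
    counts = Counter(words)
    vocab = sorted(counts, key=lambda w: (-counts[w], w))
    rank = {w: i for i, w in enumerate(vocab)}
    denom = max(len(vocab) - 1, 1)
    return [rank[w] / denom for w in words]


def histogram(values, bins):
    """Normalized histogram of values in [0, 1] with equal-width bins."""
    h = [0.0] * bins
    for v in values:
        h[min(int(v * bins), bins - 1)] += 1  # value 1.0 goes to the last bin
    total = len(values) or 1
    return [c / total for c in h]


def dispersion(hists):
    """Assumed dispersion: mean Euclidean distance of histograms from their centroid."""
    n = len(hists)
    centroid = [sum(col) / n for col in zip(*hists)]
    return sum(math.dist(h, centroid) for h in hists) / n


def analyze(words, offset=0, chunk=50, bins=10, threshold=0.3, min_words=200):
    """Return (start, end) word spans judged to come from a single author.

    All parameters are hypothetical; the split rule (halving) is an assumption.
    """
    values = map_words(words)
    # One histogram per consecutive chunk; a trailing partial chunk is ignored.
    hists = [histogram(values[i:i + chunk], bins)
             for i in range(0, len(values) - chunk + 1, chunk)]
    if (len(hists) < 2 or len(words) < 2 * min_words
            or dispersion(hists) <= threshold):
        return [(offset, offset + len(words))]
    # Large dispersion: split the text and analyze each part separately.
    mid = len(words) // 2
    return (analyze(words[:mid], offset, chunk, bins, threshold, min_words)
            + analyze(words[mid:], offset + mid, chunk, bins, threshold, min_words))
```

For example, a synthetic 800-word text whose first half uses only the words `a b` and whose second half uses `c` through `j` yields `[(0, 400), (400, 800)]`, while a text with a uniform word distribution is returned as a single span. Halving is the simplest way to realize the abstract's "split in two or more parts"; real author attributes would need a richer mapping than word rank.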
Keywords:
Authors
Abdulwahed Almarimi
Institute of Computer Science, Faculty of Science, P. J. Šafárik University in Košice, 04001 Košice, Slovakia
Gabriela Andrejková
Institute of Computer Science, Faculty of Science, P. J. Šafárik University in Košice, 04001 Košice, Slovakia