CIVILICA We Respect the Science
(Specialized publisher of national conferences / Publication license no. 8971 from the Ministry of Culture and Islamic Guidance)

Social Media Toxic Content Filtering System using SOIR Model

Paper title: Social Media Toxic Content Filtering System using SOIR Model
National paper ID: JR_JITM-15-6_006
Published in: 1402 (Solar Hijri calendar)
Authors:

Bhandari - Department of Applied Mathematics, Indore Institute of Engineering and Technology, Indore, India.
Navalakhe - Department of Applied Mathematics and Computational Science, Shri G. S. Institute of Technology and Science, Indore, India.
Prajapati - Department of Applied Mathematics, Indore Institute of Engineering and Technology, Indore, India.

Abstract:
Social media is a popular data source in the research community. It offers many opportunities to design practical applications that benefit humanity and society. A large number of people consume social media content, and content promoters and influencers sometimes publish misleading and toxic content. Therefore, this paper proposes an unhealthy content filtering system that uses the information retrieval model SOIR to identify and remove toxic content from social media. The Semantic Query Optimization-based Information Retrieval (SOIR) model uses Fuzzy C-Means (FCM) clustering to produce a particular data structure, and it incorporates a query generation technique that generates multiple queries to increase the probability of correct outcomes. In this work, the SOIR model is modified so that it can be used for social media toxic content filtering. The model uses linguistic and semantic information to craft new feature sets; Part-of-Speech (POS) tagging is used to construct the linguistic features. Finally, a pattern-matching algorithm is designed to classify tweets as toxic or nontoxic: the class label of a tweet is identified through lexical and semantic analysis of semantically similar queries (tweets). Twitter text posts are used to create the training and test samples, and a total of 2002 tweets are used in the experiments. An experimental comparison with other information retrieval (IR) models (K-NN and Cosine) in terms of precision, recall, and F1-score demonstrates the superiority of the proposed classification model.
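SOIR is described as using Fuzzy C-Means (FCM) clustering to build its data structure, but the abstract does not say how tweets are turned into vectors or how the clusters are then used. The sketch below therefore shows only the textbook FCM iteration (fuzzifier m, alternating centroid and membership updates) on generic numeric feature vectors; the function name `fuzzy_c_means` and its parameters are hypothetical, not taken from the paper.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def fuzzy_c_means(
    points: List[Vector],
    c: int,
    m: float = 2.0,  # fuzzifier, must be > 1
    max_iter: int = 100,
    tol: float = 1e-5,
    seed: int = 0,
) -> Tuple[List[Vector], List[List[float]]]:
    """Textbook Fuzzy C-Means (hypothetical helper, not the SOIR code).

    Returns (centroids, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j; every membership row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    exponent = 2.0 / (m - 1.0)

    # Random initial membership matrix, each row normalised to sum to 1.
    u: List[List[float]] = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])

    centroids: List[Vector] = []
    for _ in range(max_iter):
        # Centroid step: mean of the points weighted by u_ij ** m.
        centroids = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            w_sum = sum(weights)
            centroids.append(
                [sum(weights[i] * points[i][d] for i in range(n)) / w_sum for d in range(dim)]
            )

        # Membership step: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            dists = [math.dist(points[i], centroids[j]) for j in range(c)]
            zero = [j for j in range(c) if dists[j] == 0.0]
            if zero:
                # Point lies exactly on one or more centroids: share membership among them.
                new_row = [1.0 / len(zero) if j in zero else 0.0 for j in range(c)]
            else:
                new_row = [
                    1.0 / sum((dists[j] / dists[k]) ** exponent for k in range(c))
                    for j in range(c)
                ]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership value moves by more than tol.
        if max_change < tol:
            break
    return centroids, u
```

Unlike hard k-means, each vector keeps a graded membership in every cluster, which is what lets a retrieval model consult several nearby clusters for one query.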
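The abstract names K-NN and Cosine as the comparison IR models but gives no configuration for them. As a rough illustration only, the sketch below is a generic cosine-similarity k-nearest-neighbour baseline, assuming whitespace tokenisation and raw term-frequency vectors (both hypothetical choices): a tweet takes the majority label of its k most similar training tweets. It is not the SOIR pattern-matching classifier, and the helper names are illustrative.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def bag_of_words(text: str) -> Counter:
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query: str, train: Sequence[Tuple[str, str]], k: int = 3) -> str:
    # Rank training tweets by similarity to the query, then take a majority vote
    # over the labels of the top k (an odd k avoids ties for two classes).
    q = bag_of_words(query)
    scored = sorted(
        ((cosine(q, bag_of_words(text)), label) for text, label in train),
        key=lambda s: s[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to a pure nearest-neighbour (single most similar tweet) rule; larger k trades sensitivity for robustness to noisy neighbours.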
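The models are compared on precision, recall, and F1-score. A minimal sketch of these metrics for the binary toxic/nontoxic task follows, assuming "toxic" is treated as the positive class; the abstract does not say whether the reported scores are per-class or averaged, and the function name `binary_scores` is hypothetical.

```python
from typing import Dict, Sequence


def binary_scores(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "toxic"
) -> Dict[str, float]:
    """Precision, recall and F1 for the positive class of a binary task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed toxic tweets
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For the toy labels y_true = ["toxic", "toxic", "nontoxic", "nontoxic", "toxic"] and y_pred = ["toxic", "nontoxic", "toxic", "nontoxic", "toxic"] (made-up data, not results from the paper), there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all equal 2/3.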

Keywords:
Text mining, Semantic knowledge, Information retrieval, Sentiment analysis, Lexical pattern analysis

Paper page and full-text download: https://civilica.com/doc/1635717/