A Two-Level Semi-supervised Clustering Technique for News Articles

Publication year: 1400 (Solar Hijri)
Document type: Journal article
Language: English

The full text of this article is available as a 10-page PDF.

National scientific document ID: JR_IJE-34-12_012

Indexing date: 10 Ordibehesht 1401

Abstract:

The web and social media are flooded with news items, both in volume and in diversity. Document clustering is widely used to organize such data into smaller, manageable groups. One factor that influences clustering quality is the way documents are represented. Some traditional representation methods rely on word frequencies and produce large, sparse document vectors; these methods cannot preserve proximity information between documents. Neural network-based methods do preserve proximity information but suffer from poor interpretability. Concept-based text representation methods overcome both shortcomings, yet semi-supervised text clustering does not currently use concept-based document representation. This paper presents a two-level semi-supervised text clustering method that uses labeled and unlabeled data simultaneously to achieve higher clustering quality. In the first level, documents are represented by concepts extracted from the raw corpus. In the second level, the semi-supervised clustering process uses the unlabeled data to capture the overall structure of the clusters and a small amount of labeled data to adjust the cluster centers. Experiments on the Reuters-21578 collection show that the proposed model outperforms other semi-supervised approaches in both text classification and text clustering.
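The two levels described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the algorithm, so the sketch substitutes a hand-written word-to-concept table (`word_to_concept`, hypothetical) for concepts extracted from a corpus, and a simple seeded k-means variant for the clustering step. The blending weight `alpha`, the function names, and the toy documents are all assumptions made for illustration, and every cluster is assumed to have at least one labeled document.

```python
import math
from collections import Counter


def concept_vector(doc, word_to_concept, n_concepts):
    """Level 1 (sketch): represent a document by normalised concept counts."""
    counts = Counter(
        word_to_concept[w] for w in doc.lower().split() if w in word_to_concept
    )
    vec = [float(counts.get(c, 0)) for c in range(n_concepts)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def semi_supervised_kmeans(X, labeled, n_clusters, alpha=0.5, n_iter=10):
    """Level 2 (sketch): k-means whose centers are pulled toward labeled means.

    labeled maps a document index to its known cluster id. Assumption: every
    cluster has at least one labeled document.
    """
    # Class means of the labeled documents act as fixed anchors.
    seeds = [
        mean([X[i] for i, k in labeled.items() if k == c])
        for c in range(n_clusters)
    ]
    centers = [list(s) for s in seeds]
    assign = [0] * len(X)
    for _ in range(n_iter):
        # Assignment step: labeled documents keep their known cluster,
        # unlabeled documents go to the nearest center.
        for i, x in enumerate(X):
            if i in labeled:
                assign[i] = labeled[i]
            else:
                assign[i] = min(
                    range(n_clusters), key=lambda c: sq_dist(x, centers[c])
                )
        # Update step: the mean of all members captures the overall cluster
        # structure; blending with the labeled mean adjusts the center.
        for c in range(n_clusters):
            members = [X[i] for i in range(len(X)) if assign[i] == c]
            m = mean(members)
            centers[c] = [
                (1 - alpha) * a + alpha * s for a, s in zip(m, seeds[c])
            ]
    return assign, centers


# Toy data (invented for illustration): concept 0 = finance, concept 1 = sport.
word_to_concept = {
    "stocks": 0, "market": 0, "shares": 0, "bank": 0, "profit": 0,
    "team": 1, "match": 1, "goal": 1, "player": 1, "win": 1,
}
docs = [
    "stocks market shares rally",
    "bank market profit",
    "team match goal",
    "match player team win",
    "shares profit bank",
    "goal player win",
]
X = [concept_vector(d, word_to_concept, 2) for d in docs]
# Only documents 0 and 2 carry labels; the rest are unlabeled.
assign, centers = semi_supervised_kmeans(X, {0: 0, 2: 1}, n_clusters=2)
print(assign)
```

Setting `alpha` to 0 reduces the update to plain k-means seeded by the labeled documents, while `alpha` close to 1 keeps the centers near the labeled class means; how the paper itself balances the two sources of information is not stated in the abstract.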

Authors

S. M. Sadjadi

Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran

H. Mashayekhi

Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran

H. Hassanpour

Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran
