Clustering ensemble selection: A systematic mapping study

Publication year: 1402 (Solar Hijri calendar)
Published in: Journal of Nonlinear Analysis and Applications, Volume 14, Issue 9
Dedicated COI code: JR_IJNAA-14-9_015
Article language: English

Authors

Hajar Khalili

Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran

Mohsen Rabbani

Department of Applied Mathematics, Sari Branch, Islamic Azad University, Sari, Iran

Ebrahim Akbari

Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran

Abstract

Clustering has emerged as an important tool for data analysis. It can be used to produce high-quality data partitions, and base clusterings can be combined into a stronger and more accurate consensus clustering. Unlike in classification, where the labels of data items are known in advance, the clusters in unsupervised clustering are unlabeled, which may cause uncertainty in large libraries. Therefore, not all of the clusters produced are useful for the final clustering solution. To address this challenge, clustering ensemble selection (CES) was proposed in 2006 by Hadjitodorov: instead of combining all of the variants, it selects a subset of them to combine into the final result. The goal is to select a subset of a large library that yields a smaller ensemble with higher-quality performance. CES has been found effective in improving the quality of clustering solutions. The current paper conducts a systematic mapping study (SMS) to analyze and synthesize previous studies on CES techniques. To this end, 42 prominent publications from the existing literature, published from 2006 to August 2022, were selected for examination. The analysis showed that most of the articles used the NMI measure to evaluate cluster quality, and that setting initial parameter values was the more commonly used method for generating diversity. Clustering ensemble selection has not yet been applied to text; in addition, the trade-off between diversity and quality (considering both at the same time) can be studied and evaluated in future work.
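
As a rough illustration of the ideas the abstract names, the sketch below computes NMI between two labelings and then keeps the library members that agree most with the rest of the library. It is a minimal sketch under stated assumptions, not a method taken from the paper: the geometric-mean normalization of NMI, the "mean NMI to the other members" quality score, and the names `nmi`, `select_by_quality`, and `library` are all illustrative choices.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalized mutual information between two labelings of the same items.

    Assumption: normalized by the geometric mean of the two entropies.
    """
    n = len(a)
    joint = Counter(zip(a, b))
    ca, cb = Counter(a), Counter(b)
    mi = sum((c / n) * log(c * n / (ca[x] * cb[y])) for (x, y), c in joint.items())
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        # Degenerate case: at least one labeling puts everything in one cluster.
        return 1.0 if ha == hb else 0.0
    return mi / sqrt(ha * hb)


def select_by_quality(library, k):
    """Keep the k members with the highest mean NMI to the other members.

    Assumption: this quality score is one simple illustrative criterion.
    """
    def quality(i):
        others = [nmi(library[i], m) for j, m in enumerate(library) if j != i]
        return sum(others) / len(others)

    ranked = sorted(range(len(library)), key=quality, reverse=True)
    return sorted(ranked[:k])


# Hypothetical library of four base clusterings over six items.
library = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],  # same partition as the first, with relabeled clusters
    [0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
selected = select_by_quality(library, k=2)
print(selected)
```

Because NMI ignores how clusters are named, the first two members count as identical partitions. A criterion based on quality alone therefore tends to keep near-duplicates, which is one way to see why the trade-off between diversity and quality matters.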

Keywords

Clustering Ensemble Selection, Diversity, Measure, Consensus Function

More information about COI

COI stands for CIVILICA Object Identifier, that is, the Civilica identifier for documents. A COI is a code that, according to the place of publication, is assigned to papers from domestic conferences and journals when they are indexed in the Civilica citation database.

The COI code serves as the national ID code of documents indexed in Civilica. It is unique and fixed, and for this reason it can always be cited and tracked.