Structure-Aware Initialization for K-Means Clustering: An IQR-Weighted Approach to Mitigate Outlier Impact

Publication year: 1404 (Solar Hijri calendar)
Document type: Conference paper
Language: English

The full text of this paper is available as a 5-page PDF file.



National scientific document ID: CEITCONF09_023

Indexing date: 24 Khordad 1405

Abstract:

The K-Means algorithm remains a fundamental algorithm in data mining, and its linear time complexity has made it extremely popular. However, its performance is sensitive to the choice of initial seed points, and it often converges to local optima. Although probabilistic seed selection with K-Means++ is theoretically more effective, its results remain stochastic, and it requires additional computation because it makes multiple passes over the dataset. Moreover, traditional deterministic seed selection methods ignore the differing discriminative power of features in high-dimensional datasets and therefore treat noise features and signal features as equivalent. This paper proposes a new deterministic seed selection algorithm, the IQR Weighted Initializer, which uses the interquartile range (IQR) of each feature to weight that feature's importance when choosing seed points. Because the IQR measures the spread of the middle 50% of a feature's values, it is largely insensitive to extreme values; by assigning higher weights to structural features with large dispersion, the algorithm suppresses the effect of outliers. Experiments on 10 UCI datasets show that the proposed algorithm outperforms the best current deterministic and hierarchical algorithms. On datasets where outliers are prominent, such as Glass Identification, it reduces the Sum of Squared Errors (SSE) by approximately 7% relative to Bisecting K-Means, avoiding the local optima in which hierarchical algorithms become stuck. Validation with the Silhouette Score and the Adjusted Rand Index (ARI) further shows that the algorithm groups data points into better-structured clusters that agree more closely with the ground-truth labels.
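The abstract does not spell out how the IQR weights are turned into seed points, so the following Python sketch is only one hypothetical way to realise a deterministic, IQR-weighted initializer; it is not the authors' IQR Weighted Initializer. It assumes that (1) features are min-max scaled to [0, 1] before the IQR is computed, so a feature whose range is stretched by outliers ends up with a small IQR and therefore a small weight, (2) each point is summarised by an IQR-weighted sum of its scaled features, and (3) the points are sorted by that score, split into k contiguous groups, and each group's per-feature median becomes a seed. The function names `min_max_scale`, `iqr_weights`, and `iqr_weighted_seeds` are illustrative and do not come from the paper.

```python
from statistics import median, quantiles
from typing import List, Sequence

Point = List[float]


def min_max_scale(X: Sequence[Sequence[float]]) -> List[Point]:
    """Rescale each feature to [0, 1]; constant features map to 0.0."""
    d = len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    return [
        [(row[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
         for j in range(d)]
        for row in X
    ]


def iqr_weights(Xs: Sequence[Sequence[float]]) -> List[float]:
    """Weight each scaled feature by its interquartile range (Q3 - Q1),
    normalised so the weights sum to 1 (an assumed normalisation)."""
    d = len(Xs[0])
    iqrs = []
    for j in range(d):
        q1, _, q3 = quantiles([row[j] for row in Xs], n=4, method="inclusive")
        iqrs.append(q3 - q1)
    total = sum(iqrs)
    if total == 0:  # every feature has zero IQR: fall back to equal weights
        return [1.0 / d] * d
    return [v / total for v in iqrs]


def iqr_weighted_seeds(X: Sequence[Sequence[float]], k: int) -> List[Point]:
    """Hypothetical deterministic seeding: rank points by an IQR-weighted
    score, cut the ranking into k contiguous groups, and take each group's
    per-feature median (in the original units) as a seed."""
    n = len(X)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    d = len(X[0])
    Xs = min_max_scale(X)
    w = iqr_weights(Xs)
    scores = [sum(wj * xj for wj, xj in zip(w, row)) for row in Xs]
    # sorted() is stable, so ties keep input order and the result is deterministic.
    order = sorted(range(n), key=lambda i: scores[i])
    seeds = []
    for c in range(k):
        group = order[c * n // k:(c + 1) * n // k]  # non-empty because k <= n
        # The median is not pulled by extreme values inside the group.
        seeds.append([median(X[i][j] for i in group) for j in range(d)])
    return seeds


if __name__ == "__main__":
    # Two tight groups plus one outlier in the second feature.
    X = [[0.0, 5.0], [0.2, 5.1], [0.1, 4.9],
         [10.0, 5.0], [10.2, 5.2], [9.9, 4.8],
         [5.0, 100.0]]
    print(iqr_weights(min_max_scale(X)))
    print(iqr_weighted_seeds(X, k=2))
```

On this toy set, the second feature (whose range the outlier stretches) receives a weight of roughly 0.002. Although the outlier at [5.0, 100.0] falls into the second group, the median keeps that seed near [9.95, 5.1], whereas a per-group mean would have pulled its second coordinate to 28.75; that is why this sketch uses medians rather than means. The resulting seeds can be passed as explicit initial centers to a standard K-Means implementation, for example scikit-learn's `KMeans(n_clusters=k, init=seeds, n_init=1)`.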

Authors

Mohammad Hamzeei

Department of Computer Engineering, Birjand University of Technology, Birjand, Iran

Mostafa Sabzekar

Department of Computer Engineering, Birjand University of Technology, Birjand, Iran