Feature Selection in High Dimensional Datasets based on Adjacency Matrix

Negin Bagherpour; Behrang Ebrahimi

Published in: Journal of Data Science and Modeling, Volume 2, Issue 1

Year of publication: 1402 (Solar Hijri)

Document type: Journal article

Language: English

The full text of this article is available as a 10-page PDF.

Permanent link to this article:

https://civilica.com/doc/2122122

National scientific document ID:

JR_JCSM-2-1_011

Indexing date: 6 Azar 1403 (Solar Hijri)

Abstract:

Feature selection is crucial for improving the quality of classification and clustering. It aims to enhance machine learning performance and reduce computational cost by eliminating irrelevant or redundant features. However, existing methods often overlook intricate relationships among features and therefore select redundant ones. In addition, dependencies are often hidden or inadequately identified, mainly because traditional algorithms do not adequately account for nonlinear relationships. To address these limitations, novel feature selection algorithms are needed that consider intricate feature relationships and capture high-order dependencies, improving the accuracy and efficiency of data analysis.

In this paper, we introduce an innovative feature selection algorithm based on the adjacency matrix, which is applicable to supervised data. The algorithm comprises three steps for identifying pertinent features. In the first step, the correlation between each feature and the class label is measured to eliminate irrelevant features. In the second step, the algorithm calculates pairwise relationships among the remaining features and constructs an adjacency matrix. In the third step, clustering is applied to the adjacency matrix to partition the features into k clusters, where k is the number of desired features. From each cluster, the algorithm selects the most representative feature for subsequent analysis.

This feature selection algorithm provides a systematic approach to identifying relevant features in supervised data, thereby significantly enhancing the efficiency and accuracy of data analysis. By taking into account both linear and nonlinear dependencies between features and detecting them across multiple feature sets, it overcomes the limitations of previous methods.
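The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the relevance measure, the pairwise measure, the clustering technique, or how a representative is chosen. The sketch assumes a histogram-based mutual information estimate for both relevance and pairwise relationships (mutual information appears in the keywords), a simple average-linkage agglomerative clustering, and "highest relevance to the class" as the representative criterion. The function names, `relevance_quantile`, and `bins` are hypothetical.

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram estimate of mutual information (in nats) between two 1-D arrays."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def cluster_features(adj, k):
    """Assumed clustering step: merge the two clusters with the highest
    average adjacency until only k clusters remain."""
    clusters = [[i] for i in range(adj.shape[0])]
    while len(clusters) > k:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = adj[np.ix_(clusters[a], clusters[b])].mean()
                if score > best:
                    best, pair = score, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def select_features(X, y, k, relevance_quantile=0.5, bins=8):
    """Return the column indices of k selected features."""
    n_features = X.shape[1]
    # Step 1: relevance of each feature to the class; drop the least relevant.
    relevance = np.array([mutual_info(X[:, j], y, bins) for j in range(n_features)])
    keep = np.flatnonzero(relevance >= np.quantile(relevance, relevance_quantile))
    # Step 2: pairwise relationships among the kept features -> adjacency matrix.
    m = len(keep)
    adj = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            adj[a, b] = adj[b, a] = mutual_info(X[:, keep[a]], X[:, keep[b]], bins)
    # Step 3: cluster the features, then pick one representative per cluster
    # (assumption: the member with the highest relevance to the class).
    clusters = cluster_features(adj, min(k, m))
    chosen = [keep[max(c, key=lambda i: relevance[keep[i]])] for c in clusters]
    return sorted(int(j) for j in chosen)

# Synthetic demo: columns 0-1 share one signal, columns 2-3 share another
# (column 3 is a nonlinear transform of column 2), columns 4-7 are pure noise.
rng = np.random.default_rng(0)
n = 1000
base1, base2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([
    base1,
    base1 + 0.1 * rng.normal(size=n),
    base2,
    base2 ** 3,
    rng.normal(size=(n, 4)),
])
y = (base1 + base2 > 0).astype(int)

selected = select_features(X, y, k=2)
print(selected)
```

In this demo the noise columns should be discarded in the first step, and the two redundant pairs should fall into separate clusters, so one feature from each pair is returned. The pairwise loops are quadratic in the number of kept features and the clustering is slower still, so a high-dimensional dataset would call for a more efficient estimator and clustering routine.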

Keywords:

Adjacency matrix, Feature selection, Mutual information, High dimension

Authors

Negin Bagherpour

Department of Engineering Sciences, Faculty of Engineering, University of Tehran, Tehran, Iran

Behrang Ebrahimi

Department of Engineering Sciences, University of Tehran