Feature Selection in High Dimensional Datasets based on Adjacency Matrix
محل انتشار: نشریه علم داده و مدل سازی، دوره: 2، شماره: 1
سال انتشار: 1402
نوع سند: مقاله ژورنالی
زبان: انگلیسی
مشاهده: 93
فایل این مقاله در 10 صفحه با فرمت PDF قابل دریافت می باشد
- صدور گواهی نمایه سازی
- من نویسنده این مقاله هستم
استخراج به نرم افزارهای پژوهشی:
شناسه ملی سند علمی:
JR_JCSM-2-1_011
تاریخ نمایه سازی: 6 آذر 1403
چکیده مقاله:
Feature selection is crucial to improve the quality of classification and clustering. It aims to enhance machine learning performance and reduce computational costs by eliminating irrelevant or redundant features. However, existing methods often overlook intricate feature relationships and select redundant features. Additionally, dependencies are often hidden or inadequately identified. That’s mainly because of nonlinear relationships being used in traditional algorithms. To address these limitations, novel feature selection algorithms are needed to consider intricate feature relationships and capture high-order dependencies, improving the accuracy and efficiency of data analysis.In this paper, we introduce an innovative feature selection algorithm based on Adjacency Matrix, which is applicable to supervised data. The algorithm comprises three steps for identifying pertinent features. In the first step, the correlation between each feature and its corresponding class is measured to eliminate irrelevant features. Moving to the second step, the algorithm focuses on the selected features, calculates pairwise relationships and constructs an adjacency matrix. Finally, the third step employs clustering techniques to classify the adjacency matrix into k clusters, where k represents the number of desired features. From each cluster, the algorithm selects the most representative feature for subsequent analysis.This feature selection algorithm provides a systematic approach to identify relevant features in supervised data, thereby significantly enhance the efficiency and accuracy of data analysis. By taking into account both the linear and nonlinear dependencies between features and effectively detecting them across multiple feature sets, it successfully overcomes the limitations of previous methods.
کلیدواژه ها:
نویسندگان
Negin Bagherpour
Department of Engineering Sciences, Faculty of Engineering, University of Tehran, Tehran, Iran
Behrang Ebrahimi
Department Of Engineering Sciences, University Of Tehran