Advancing Feature Selection: A Hybrid Approach for High-Dimensional and Incomplete Data
Publication year: 1402 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper has not been provided and is not available.
National scientific document ID:
IBIS12_078
Indexing date: 12 Aban 1403
Abstract:
A current challenge in data science, and in machine learning in particular, is the growing volume of data. Excluding superfluous data in order to focus on the essential variables, known as feature selection, is vital for optimizing model performance. This study undertakes a thorough investigation of pivotal feature selection strategies and clearly states the benefits and limitations of each method. An advanced methodology for handling incomplete datasets is also presented, together with an innovative hybrid model that combines the Partial Mutual Information Criterion (PMIC), state-of-the-art null value completion strategies, and neural networks to improve the feature selection process. Finally, the suggested strategy is implemented in Python, and numerical test results are reported on a few randomly generated data sets and three well-known datasets: breast_cancer, iris, and diabetes. We evaluate both the similarity of the selected features to known important features and the accuracy of the imputed data. The reported results confirm the efficiency of this hybrid algorithm in feature selection for large data sets suffering from incomplete data.
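The general idea of imputing missing values and then ranking features by an information-theoretic score can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the details of PMIC, the imputation strategy, or the neural network component, so the sketch substitutes plain (unconditional) mutual information for PMIC and column-mean imputation for the null value completion step, and it omits the neural network entirely. The function names (`mean_impute`, `discretize`, `mutual_information`, `rank_features`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumption: a simple stand-in for the paper's imputation strategy.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def discretize(values, n_bins=5):
    """Map a numeric column to integer bin indices using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def rank_features(rows, labels, n_bins=5):
    """Impute, discretize, and rank columns by mutual information with labels.

    Assumption: plain mutual information, not the partial criterion (PMIC).
    """
    filled = mean_impute(rows)
    scores = []
    for j in range(len(filled[0])):
        column = discretize([r[j] for r in filled], n_bins)
        scores.append(mutual_information(column, labels))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Synthetic data: feature 0 depends on the label, features 1 and 2 are noise.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(400)]
rows = []
for y in labels:
    row = [3.0 * y + rng.gauss(0, 1), rng.gauss(0, 1), rng.uniform(-1, 1)]
    # Mask roughly 10% of the entries to simulate incomplete data.
    rows.append([None if rng.random() < 0.1 else v for v in row])

order, scores = rank_features(rows, labels)
print("feature ranking (best first):", order)
print("scores (nats):", [round(s, 3) for s in scores])
```

On this synthetic data the label-dependent feature should be ranked first, which mirrors the "similarity of the selected features to known important features" evaluation mentioned in the abstract. A partial criterion such as PMIC would additionally condition each score on the features already selected, which this sketch does not attempt.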
Authors
B Ebrahimi
Department of Engineering Sciences, University of Tehran, Tehran, Iran
N Bagherpour
Department of Engineering Sciences, University of Tehran, Tehran, Iran