A Min-Max Based Data Normalization Method Robust to The Existence of Abnormal Data

  • Year of publication: 1403 (Solar Hijri)
  • Published in: 21st National Conference on Electrical, Computer and Mechanical Engineering
  • COI code: ECME21_087
  • Paper language: English

Authors

Arvin Ekhlasi

M.S. in Mechanical Engineering at Shiraz University, Shiraz, Iran

Hossein Mohammadi

Associate Professor, Department of Solid Mechanics, Shiraz University, Shiraz, Iran

Abstract

In recent years, the rapid growth of big data in machine learning applications has made data preprocessing increasingly important for algorithm accuracy. Among preprocessing steps, data normalization stands out because it brings features onto a common range so that all of them contribute comparably to the learning process. This article compares the effectiveness of two frequently used normalization methods, Min-Max and Z-score, across various datasets. Recent studies have shown that Min-Max normalization often outperforms Z-score in specific applications. However, a significant drawback of the Min-Max method is its susceptibility to abnormal data, which can distort the entire normalization: because the method relies solely on the minimum and maximum values of the dataset, a single outlier at either extreme sets the scale for every other data point. To address this issue, the article proposes an adaptation of Min-Max normalization designed to be robust against abnormal data while retaining the method's advantages on real-world datasets. Results show that the proposed method handles both lower and upper abnormal values while reliably transforming the data into the specified range. This improvement is expected to make Min-Max normalization more dependable and useful in real-world machine learning scenarios.
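
To make the outlier sensitivity described in the abstract concrete, here is a minimal Python sketch of the two standard methods being compared. The abstract does not describe the proposed adaptation, so the third function, `clipped_min_max`, is not the authors' method: it is a generic percentile-clipping variant, included only to illustrate what bounding the influence of abnormal values can look like. All function names, the quantile defaults, and the sample readings are assumptions made for this example.

```python
from statistics import mean, pstdev


def min_max(values, lo=0.0, hi=1.0):
    """Standard Min-Max scaling of `values` into the range [lo, hi]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant feature: map everything to the lower bound.
        return [lo for _ in values]
    return [lo + (v - v_min) / (v_max - v_min) * (hi - lo) for v in values]


def z_score(values):
    """Standard Z-score scaling using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def clipped_min_max(values, lower_q=0.05, upper_q=0.95, lo=0.0, hi=1.0):
    """Hypothetical robust variant (NOT the paper's method).

    Bounds are taken from order statistics near the given quantiles
    instead of the raw minimum and maximum, and values outside the
    bounds are clipped before scaling.
    """
    ordered = sorted(values)
    n = len(ordered)
    b_lo = ordered[int(lower_q * (n - 1))]
    b_hi = ordered[int(upper_q * (n - 1))]
    if b_hi == b_lo:
        return [lo for _ in values]
    clipped = [min(max(v, b_lo), b_hi) for v in values]
    return [lo + (v - b_lo) / (b_hi - b_lo) * (hi - lo) for v in clipped]


if __name__ == "__main__":
    # Ten ordinary readings plus one abnormal value (made-up data).
    readings = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1000]
    print("min-max        :", [round(x, 4) for x in min_max(readings)])
    print("z-score        :", [round(x, 4) for x in z_score(readings)])
    print("clipped min-max:", [round(x, 4) for x in clipped_min_max(readings)])
```

With these sample readings, plain Min-Max maps the ten ordinary values to below 0.01 because the single abnormal value defines the maximum, while the clipped variant spreads them across the full target range and pins the abnormal value to the upper bound. Clipping discards the magnitude of the abnormal value, which may or may not be acceptable depending on the application.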

Keywords

Abnormal data, Normalization, Min-Max, SCADA

More information about COI

COI stands for CIVILICA Object Identifier, the CIVILICA identifier for documents. A COI is a code assigned, according to the place of publication, to papers from domestic conferences and journals when they are indexed in the CIVILICA citation database.

The COI serves as a national code for documents indexed in CIVILICA. It is unique and fixed, so it can always be cited and traced.