A New Framework for Data Reduction in Large-scale Data Using MapReduce

Year of publication: 1404 (Solar Hijri)
Document type: Journal article
Language: English

The full text of this article is available as a 15-page PDF.

National scientific document ID:

JR_JADM-13-4_005

Indexing date: 5 Mehr 1404

Abstract:

Storing and processing large-volume datasets is one of the most critical problems in large-scale processing. It is therefore necessary to reduce their size before further processing. This paper proposes a framework for data reduction in large-scale datasets. The proposed framework is based on MapReduce and has three steps. First, some instances of the dataset are selected by reservoir sampling. Second, the features of the selected instances are weighted using the ReliefF algorithm; the weights are then averaged for each feature, and the features with the highest average weights are selected. Finally, the selected features are used in classification. Implementation results show that the proposed framework reduces processing time. It also maintains or increases classification accuracy, even though a large amount of data is removed by eliminating irrelevant features.
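
The first two steps of the abstract can be illustrated with a minimal, in-process sketch. It is not the authors' implementation: it simulates the map and reduce phases with plain Python functions instead of Hadoop, and it swaps in a simplified Relief (one nearest hit and one nearest miss, features assumed to be scaled to [0, 1]) in place of full ReliefF. All function names and parameters (`reservoir_sample`, `relief_weights`, `map_partition`, `reduce_weights`, `sample_size`, `top_k`) are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform random sample of up to k items from a stream."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def relief_weights(rows):
    """Simplified Relief (assumption: NOT full ReliefF).

    rows is a list of (features, label) pairs with features in [0, 1].
    A feature gains weight when it differs on the nearest miss and
    loses weight when it differs on the nearest hit.
    """
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    for idx, (x, y) in enumerate(rows):
        hit = miss = None
        hit_d = miss_d = float("inf")
        for jdx, (z, label) in enumerate(rows):
            if jdx == idx:
                continue
            d = sum(abs(a - b) for a, b in zip(x, z))  # Manhattan distance
            if label == y and d < hit_d:
                hit, hit_d = z, d
            elif label != y and d < miss_d:
                miss, miss_d = z, d
        if hit is None or miss is None:
            continue  # no usable neighbour for this instance
        for f in range(n_features):
            diff = abs(x[f] - miss[f]) - abs(x[f] - hit[f])
            weights[f] += diff / len(rows)
    return weights


def map_partition(partition, sample_size, seed):
    """Map phase: sample one partition, then weight its features."""
    sample = reservoir_sample(partition, sample_size, random.Random(seed))
    return relief_weights(sample)


def reduce_weights(per_partition_weights, top_k):
    """Reduce phase: average weights per feature, keep the top_k indices."""
    n = len(per_partition_weights)
    avg = [sum(col) / n for col in zip(*per_partition_weights)]
    ranked = sorted(range(len(avg)), key=lambda f: avg[f], reverse=True)
    return avg, ranked[:top_k]
```

In use, each partition of the dataset would be passed to `map_partition`, and the resulting weight vectors to `reduce_weights`; the returned feature indices would then be the columns kept for the classification step. Sampling before weighting matters because the neighbour search in Relief is quadratic in the number of instances it sees.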

Authors

Zeinab Abbasi

Faculty of Engineering, Mahallat Institute of Higher Education, Mahallat, Iran.
