preserving data clustering with expectation maximization algorithm

  • Year of publication: 1395
  • Published in: Journal of Information Systems and Telecommunication (quarterly), Volume 4, Issue 3
  • COI (CIVILICA Object Identifier) code: JR_JIST-4-3_006
  • Language of the paper: English

Authors

Leila Jafar Tafreshi

Department of Electrical and Computer Engineering, Semnan University, Semnan, Iran

Farzin Yaghmaee

Department of Electrical and Computer Engineering, Semnan University, Semnan, Iran

Abstract

Data mining and knowledge discovery are important technologies for business and research. Despite their benefits in areas such as marketing, business and medical analysis, data mining techniques can also create new threats to privacy and information security. Therefore, a new class of data mining methods called privacy preserving data mining (PPDM) has been developed. The aim of research in this field is to develop techniques that can be applied to databases without violating the privacy of individuals. In this work we introduce a new approach that uses fuzzy logic to preserve sensitive information in databases with both numerical and categorical attributes. We map a database into a new one that conceals private information while preserving its value for mining. In our proposed method, we apply fuzzy membership functions (MFs) such as Gaussian, P-shaped, Sigmoid, S-shaped and Z-shaped to private data. We then cluster the modified datasets with the Expectation Maximization (EM) algorithm. Our experimental results show that using fuzzy logic to preserve data privacy guarantees valid data clustering results while protecting sensitive information. The accuracy of the clustering algorithm on fuzzy data is approximately equal to its accuracy on the original data and is better than that of state-of-the-art methods in this field.
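
The pipeline summarized in the abstract (map a private attribute through a fuzzy membership function, then cluster with EM) can be illustrated with a minimal sketch. This is not the authors' implementation: the membership function formulas follow common textbook definitions, the salary values and the parameters 20 and 100 are hypothetical, and the two-component, one-dimensional EM routine is a simplified stand-in for a full EM clustering algorithm.

```python
import math

def gaussian_mf(x, mean, sigma):
    """Gaussian membership: exp(-(x - mean)^2 / (2 * sigma^2))."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

def sigmoid_mf(x, slope, center):
    """Sigmoid membership: 1 / (1 + exp(-slope * (x - center)))."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))

def s_shaped_mf(x, a, b):
    """S-shaped membership rising smoothly from 0 at a to 1 at b (a < b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= (a + b) / 2.0:
        return 2.0 * ((x - a) / (b - a)) ** 2
    return 1.0 - 2.0 * ((x - b) / (b - a)) ** 2

def z_shaped_mf(x, a, b):
    """Z-shaped membership: the mirror image of the S-shaped function."""
    return 1.0 - s_shaped_mf(x, a, b)

def em_two_gaussians(values, iterations=50):
    """Cluster 1-D values with EM for a two-component Gaussian mixture.

    Returns a list of labels (0 or 1), one per input value.
    """
    lo, hi = min(values), max(values)
    spread = (hi - lo) or 1.0
    means = [lo, hi]                      # start one component at each extreme
    variances = [(spread / 2.0) ** 2] * 2
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [
                w * math.exp(-((v - m) ** 2) / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)
                for w, m, s in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300   # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)  # variance floor for numerical stability
    return [0 if r[0] >= r[1] else 1 for r in resp]

# Hypothetical private attribute (e.g. salaries in thousands).
salaries = [30.0, 32.0, 35.0, 31.0, 90.0, 95.0, 92.0, 88.0]

# Publish membership degrees in [0, 1] instead of the raw values.
fuzzified = [s_shaped_mf(v, 20.0, 100.0) for v in salaries]

labels_original = em_two_gaussians(salaries)
labels_fuzzy = em_two_gaussians(fuzzified)
print(labels_original == labels_fuzzy)
```

The S-shaped function is used here because it is monotone, so the ordering of the values survives the mapping. A Gaussian membership function is not monotone, so values on opposite sides of its mean can receive the same degree; how the paper handles that case is not stated in the abstract.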

Keywords

Privacy Preserving; Clustering; Data Mining; Expectation Maximization Algorithm
