Preserving data anonymization in clustering using bootstrapping based method

Publication year: 1398 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper has not been published; only the abstract or extended abstract is available in the database.

National scientific document ID:

ICIKT10_067

Indexing date: 5 Bahman 1398 (Solar Hijri)

Abstract:

Along with the numerous advantages that big data offers, challenges such as privacy have become more serious. One well-known solution to this challenge is data anonymization. Anonymization techniques are typically based on data generalization. At first sight, the high volume of big data may seem to make generalization easier (the so-called large crowd effect), but the Map-Reduce process divides the data into chunks, so the advantage of the large crowd effect is not fully achieved. The problem arises from the fact that the distribution of a data chunk differs from that of the original data. In this work, a bootstrapping-based method is used to divide the data so that the chunk distributions are more consistent with the original data. The proposed method also does not require knowledge of the statistical features of the original data. The method is implemented on a real dataset, and statistical features such as the mean and standard deviation are calculated. The Earth Mover's Distance (EMD) is also used to measure the distance between the distribution of each data cluster and that of the whole data. The results show a high similarity in data distribution between the data clusters and the whole data.
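
The abstract does not give the details of the chunking procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each chunk is drawn by sampling with replacement from the whole data, and compares this against a baseline that splits the data in its stored order. The function names (`bootstrap_chunks`, `contiguous_chunks`, `emd_1d`), the chunk sizes, and the synthetic Gaussian data are all hypothetical choices made for illustration; the paper itself uses a real dataset.

```python
import random
import statistics
from bisect import bisect_right


def bootstrap_chunks(data, n_chunks, chunk_size, seed=0):
    """Assumed scheme: draw each chunk by sampling with replacement."""
    rng = random.Random(seed)
    return [rng.choices(data, k=chunk_size) for _ in range(n_chunks)]


def contiguous_chunks(data, n_chunks):
    """Baseline: split the data, in stored order, into equal blocks."""
    size = len(data) // n_chunks
    return [data[i * size:(i + 1) * size] for i in range(n_chunks)]


def emd_1d(xs, ys):
    """1-D Earth Mover's Distance: area between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


# Synthetic stand-in for a dataset whose stored order is not random.
gen = random.Random(42)
data = sorted(gen.gauss(50.0, 10.0) for _ in range(5000))

boot = bootstrap_chunks(data, n_chunks=5, chunk_size=1000)
cont = contiguous_chunks(data, n_chunks=5)

boot_emd = [emd_1d(chunk, data) for chunk in boot]
cont_emd = [emd_1d(chunk, data) for chunk in cont]

# Report mean, standard deviation, and EMD to the whole data per chunk.
for name, chunks, emds in (("bootstrap", boot, boot_emd),
                           ("contiguous", cont, cont_emd)):
    for chunk, emd in zip(chunks, emds):
        print(name,
              round(statistics.mean(chunk), 2),
              round(statistics.stdev(chunk), 2),
              round(emd, 3))
```

In this sketch the resampled chunks need no prior knowledge of the data's mean or variance, which mirrors the claim in the abstract. The sorted input is a deliberately unfavorable case for the contiguous baseline; on data that is already randomly ordered, the gap between the two schemes would be much smaller.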

Authors

Mehdi Hasaninasab

Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran

Mohammad Khansari

Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran