Clustering a Big Mobility Dataset Using an Automatic Swarm Intelligence-Based Clustering Method

  • Publication year: 1397 (Solar Hijri calendar)
  • Published in: Journal of Electrical and Computer Engineering Innovations, Volume 6, Issue 2
  • COI code: JR_JECEI-6-2_011
  • Article language: English
  • Views: 471

Authors

Iman Behravan

Department of Electrical Engineering, PhD student, University of Birjand, i.behravan@birjand.ac.ir

Seyed Hamid Zahiri

Department of Electrical Engineering, Faculty of Engineering, University of Birjand, hzahiri@birjand.ac.ir

Seyed Mohammad Razavi

Department of Electrical Engineering, Faculty of Engineering, University of Birjand

Roberto Trasarti

KDD lab, ISTI-CNR, Pisa, Italy, roberto.trasarti@isti.cnr.it

Abstract

Big data refers to huge datasets with a high number of objects and a high number of dimensions. Mining such datasets is beyond the capability of conventional data mining algorithms, including clustering algorithms, classification algorithms, and feature selection methods. Clustering, the process of dividing the data points of a dataset into groups (clusters) based on their similarities and dissimilarities, is an unsupervised learning method that discovers useful information and hidden patterns in raw data. K-means is an efficient clustering algorithm, but it suffers from some drawbacks: it tends to converge to a local optimum, its result depends on the initial values of the cluster centers, and it cannot determine the number of clusters. In this research, a new clustering method for big datasets is introduced, based on the Particle Swarm Optimization (PSO) algorithm. PSO is a heuristic algorithm with a strong ability to search the solution space and find the global optimum. The proposed method is a two-stage algorithm that first searches the solution space for a proper number of clusters and then searches for the positions of the centroids. Its performance is evaluated on 13 synthetic datasets and a biological microarray dataset. Finally, two real big mobility datasets are investigated and analyzed using the proposed clustering method.
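
To make the idea of searching for centroid positions with PSO concrete, the following is a minimal sketch and not the authors' implementation. It covers only a centroid search for a fixed number of clusters k and does not attempt the stage that selects k. The fitness function (sum of squared distances to the nearest centroid), the flattened particle encoding, the parameter values (w, c1, c2, swarm size, iterations), and the names sse and pso_centroids are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import random

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total

def pso_centroids(points, k, n_particles=20, n_iters=60,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k centroids with a basic global-best PSO (illustrative only)."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    n = k * dim  # each particle holds k centroids flattened into one vector

    def decode(x):
        # Split the flat vector back into k centroids of length dim.
        return [x[i * dim:(i + 1) * dim] for i in range(k)]

    # Random initial positions inside the bounding box of the data, zero velocity.
    pos = [[rng.uniform(lo[j % dim], hi[j % dim]) for j in range(n)]
           for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [x[:] for x in pos]
    pbest_fit = [sse(points, decode(x)) for x in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    initial_fit = gbest_fit

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
            fit = sse(points, decode(pos[i]))
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return decode(gbest), gbest_fit, initial_fit

# Toy usage: two synthetic 2-D blobs (made-up data, not from the paper).
data_rng = random.Random(1)
data = ([(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(30)]
        + [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(30)])
centroids, best_fit, initial_fit = pso_centroids(data, k=2)
print(centroids, best_fit)
```

Because the swarm keeps the best position seen so far, the returned fitness can never be worse than that of the best initial particle. Unlike K-means, the result here does not hinge on a single initial set of centers, although a plain PSO like this one carries no guarantee of reaching the global optimum.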

Keywords

Big data clustering, Mobility dataset, K-means, Swarm intelligence, Particle swarm optimization
