Hybrid of Active Learning and Dynamic Self-Training for Data Stream Classification

  • Year of publication: 1396 (Solar Hijri calendar)
  • Published in: International Journal of Communications and Information Technology, Volume: 9, Issue: 4
  • Dedicated COI code: JR_ITRC-9-4_005
  • Language of the paper: English
  • Views: 324

Authors

MohammadReza Keyvanpour

Mahnoosh Kholghi

Sogol Haghani

Abstract

Most data stream classification methods need many labeled samples to achieve reasonable results. In a real data stream environment, however, labeled samples are difficult and expensive to obtain, unlike unlabeled ones. Although active learning is one way to tackle this challenge, it ignores the unlabeled instances, whose use can strengthen supervised learning. This paper proposes a hybrid framework named “DSeSAL”, which combines active learning and dynamic self-training to gain the strengths of both. The framework also introduces variance-based self-training, which uses minimal variance as a confidence measure. Because an early mistake by the base classifier in self-training can reinforce itself by generating incorrectly labeled data, especially in the multi-class setting, a dynamic approach is adopted to avoid deterioration of classifier accuracy. The framework can also control the reduction in accuracy through a specified tolerance measure. To overcome the data stream challenges of infinite length and evolving nature, we use the chunking method along with a classifier ensemble: a classifier is trained on each chunk and, together with previous classifiers, forms an ensemble of M such classifiers. Experimental results on synthetic and real-world data show how the proposed framework performs in comparison with other approaches.
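The abstract does not give the details of DSeSAL, so the following is only a minimal sketch of how the pieces it names could fit together, not the authors' implementation. It assumes that "variance" means the variance of the ensemble members' class-probability estimates for an instance, that the highest-variance instances in a chunk are sent to the human labeler (active learning) while sufficiently low-variance ones are self-labeled, that the tolerance check compares accuracy before and after self-training, and that the ensemble drops its oldest member once it holds M classifiers. All function names, parameters, and thresholds are hypothetical.

```python
from collections import deque
from statistics import pvariance


def ensemble_variance(member_probs):
    """Mean per-class variance of the members' probability estimates
    for one instance (assumed confidence measure: lower = more confident)."""
    n_classes = len(member_probs[0])
    per_class = [pvariance([p[c] for p in member_probs]) for c in range(n_classes)]
    return sum(per_class) / n_classes


def pseudo_label(member_probs):
    """Class with the highest average probability across ensemble members."""
    n_classes = len(member_probs[0])
    avg = [sum(p[c] for p in member_probs) / len(member_probs) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])


def split_chunk(chunk_probs, query_budget, variance_threshold):
    """Partition the instance indices of one chunk into three groups:
    to_query (highest variance, sent to the labeler),
    to_self_label (variance at or below the threshold),
    left_unlabeled (everything else)."""
    scored = sorted(
        ((ensemble_variance(p), i) for i, p in enumerate(chunk_probs)),
        reverse=True,
    )
    to_query = sorted(i for _, i in scored[:query_budget])
    rest = scored[query_budget:]
    to_self_label = sorted(i for v, i in rest if v <= variance_threshold)
    left_unlabeled = sorted(i for v, i in rest if v > variance_threshold)
    return to_query, to_self_label, left_unlabeled


def within_tolerance(acc_before, acc_after, tolerance):
    """Hypothetical tolerance check: keep the self-trained classifier only if
    accuracy did not drop by more than `tolerance`."""
    return acc_after >= acc_before - tolerance


def new_ensemble(max_members):
    """Bounded ensemble: appending beyond `max_members` evicts the oldest
    classifier (an assumed replacement policy)."""
    return deque(maxlen=max_members)
```

In this sketch, each incoming chunk would be scored with `split_chunk`, the queried and self-labeled instances would be used to train a new classifier, and that classifier would be appended to the bounded ensemble only if `within_tolerance` holds on whatever validation data is available.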

Keywords

Computer Science, Data Mining, Semi-supervised learning, Classification, Data Stream.
