A Novel Two-Step Classification Approach for Runtime Performance Improvement of Duplicate Bug Report Detection

  • Publication year: 1402 (Solar Hijri calendar)
  • Published in: Journal of Computer and Knowledge Engineering, Volume 6, Issue 1
  • Dedicated COI code: JR_CKE-6-1_001
  • Paper language: English

Authors

Behzad Soleimani Neysiani

Department of Software Engineering, University of Kashan, Kashan, Iran.

Seyed Morteza Babamir

Department of Software Engineering, University of Kashan, Kashan, Iran.

Abstract

Duplicate Bug Report Detection (DBRD) is a well-known problem in software triage systems such as Bugzilla. There are two main approaches to this problem: information retrieval and machine learning. The latter achieves better validation performance. Duplicate detection requires feature extraction, which is time-consuming. Both approaches suffer from runtime issues because they must compare each new bug report against all bug reports in the repository, so feature extraction and duplicate detection take a long time. This study proposes a new two-step classification approach that reduces the search space of the bug repository in the first step and then performs duplicate detection using textual features. The Mozilla and Eclipse datasets are used for experimental evaluation. The results show an average validation performance of 87.70% for accuracy and 89.01% for F1-measure. Moreover, 95.85% and 87.65% of bug reports can be classified very quickly in step one for the Eclipse and Mozilla datasets, respectively; the remaining reports require textual feature extraction before they can be checked by the traditional DBRD approach. The proposed method achieves an average runtime improvement of 90%.
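
The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which features or classifiers are used in either step. The sketch assumes that step one is a cheap filter on categorical fields (the hypothetical `product` and `component` attributes) and that step two is a simple token-overlap (Jaccard) score standing in for the paper's textual features. All names and the threshold are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BugReport:
    report_id: int
    product: str    # hypothetical cheap categorical field
    component: str  # hypothetical cheap categorical field
    summary: str    # free text, expensive to process at scale


def cheap_filter(new: BugReport, repository: list[BugReport]) -> list[BugReport]:
    """Step one (assumed): shrink the search space without touching text."""
    return [
        r for r in repository
        if r.product == new.product and r.component == new.component
    ]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for real textual features."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def find_duplicates(
    new: BugReport, repository: list[BugReport], threshold: float = 0.5
) -> list[int]:
    """Step two: run the textual comparison only on the surviving candidates."""
    candidates = cheap_filter(new, repository)
    return [
        r.report_id for r in candidates
        if jaccard(new.summary, r.summary) >= threshold
    ]
```

The runtime benefit in this sketch comes from the same place the abstract describes: the expensive text comparison runs only on the reports that survive the first step, not on the whole repository.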

Keywords

Duplicate Detection, Bug Report, Machine learning, Runtime Performance, Search Space Reduction
