Practical Detection of Click Spams Using Efficient Classification-Based Algorithms

  • Year of publication: 1397 (Solar Hijri calendar)
  • Published in: International Journal of Information and Communication Technology, Volume 10, Issue 2
  • COI code: JR_ITRC-10-2_006
  • Article language: English

Authors

Mahdieh Fallah

Department of Computer Engineering, Yazd University, Yazd, Iran

Sajjad Zarifzadeh

Department of Computer Engineering, Yazd University, Yazd, Iran

Abstract

Most of today’s Internet services use user feedback (e.g., clicks) to improve the quality of their services. For example, search engines use click information as a key factor in document ranking. As a result, some websites cheat to obtain a higher rank by fraudulently attracting clicks to their pages. This phenomenon, known as “click spam”, is carried out by programs called “click bots”. Distinguishing bot-generated traffic from genuine user traffic is therefore critical to the viability of Internet services such as search engines. In this paper, we propose a novel classification-based system that identifies fraudulent clicks effectively and in a practical manner. We first model user sessions with three levels of features: session-based, user-based, and IP-based. We then classify sessions with two methods, a one-class and a two-class classifier, both based on the well-known k-nearest neighbor (k-NN) algorithm. Finally, we evaluate our methods on a real log of a Persian search engine. Experimental results show that the proposed algorithms detect fraudulent clicks with a precision of up to 96%, outperforming previous work by more than 5%.
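The abstract does not specify the features, distance metric, value of k, or decision threshold, so what follows is only a minimal Python sketch, under stated assumptions, of how two-class and one-class k-NN could label sessions described by session-, user-, and IP-level features. The four feature names, the toy data, k = 3, and the threshold of 20 are hypothetical. The one-class variant shown scores a session by its mean distance to the k nearest known-human sessions, which is one common one-class k-NN formulation and not necessarily the authors' own.

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical per-session feature vector concatenating three levels
# (illustrative assumptions, not the paper's actual features):
#   session-level: clicks in session, mean seconds between clicks
#   user-level:    sessions per day for this user
#   IP-level:      distinct users seen behind this IP
# A real system would normalise features first; that step is omitted here.
Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_two_class(train: List[Vector], labels: List[str], x: Vector, k: int = 3) -> str:
    """Two-class k-NN: majority label among the k nearest labelled sessions."""
    nearest = sorted(range(len(train)), key=lambda i: distance(train[i], x))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def knn_one_class_score(normal: List[Vector], x: Vector, k: int = 3) -> float:
    """One-class k-NN score: mean distance from x to its k nearest normal sessions."""
    dists = sorted(distance(v, x) for v in normal)
    return sum(dists[:k]) / k


def knn_one_class(normal: List[Vector], x: Vector, threshold: float, k: int = 3) -> str:
    """Flag x as a bot if it lies farther than `threshold` from known-human traffic."""
    return "bot" if knn_one_class_score(normal, x, k) > threshold else "human"


def precision(predicted: List[str], actual: List[str], positive: str = "bot") -> float:
    """Fraction of sessions predicted as `positive` that truly are positive."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    flagged = sum(p == positive for p in predicted)
    return tp / flagged if flagged else 0.0


if __name__ == "__main__":
    # Made-up toy sessions: [clicks, mean_gap_s, sessions_per_day, users_per_ip]
    humans = [[3, 40.0, 2, 1], [5, 35.0, 3, 2], [4, 50.0, 1, 1], [2, 60.0, 2, 1]]
    bots = [[40, 1.0, 30, 15], [55, 0.5, 25, 20], [35, 2.0, 40, 12]]
    train = humans + bots
    labels = ["human"] * len(humans) + ["bot"] * len(bots)

    test = [[4, 45.0, 2, 1], [50, 0.8, 35, 18]]
    truth = ["human", "bot"]

    two = [knn_two_class(train, labels, x) for x in test]
    one = [knn_one_class(humans, x, threshold=20.0) for x in test]
    print(two, one, precision(two, truth), precision(one, truth))
```

The two-class variant needs labelled bot sessions for training, whereas the one-class variant only needs examples of normal traffic, which can help when confirmed click-bot sessions are scarce.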

Keywords

bot, click spam, user session modeling, classification
