Improving the retrieval of documents relevant to the user's query expansion approach based on spell correction in the information retrieval system

  • Year of publication: 1396
  • Published in: Fourth International Congress on Technology, Communication and Knowledge
  • Dedicated COI code: ICTCK04_051
  • Article language: English

Authors

Maryam Houtinezhad

Department of Computer Engineering, Ferdows Branch, Islamic Azad University, Ferdows, Iran

Hamid Reza Ghaffary

Department of Computer Engineering, Ferdows Branch, Islamic Azad University, Ferdows, Iran

Abstract

As the volume of user documents on the global communications network grows daily, managing and controlling that information has become a challenge. Extracting useful knowledge from the heterogeneous environment of the Internet is essential. This paper proposes a query expansion approach in which user queries are divided into specific subsets according to an ontology. Information retrieval techniques can satisfy users' information needs within a large amount of data. Search engines are users' first choice for finding information, and the web crawler, a script that routinely scans the web, plays a key role in them. The query expansion method presented here combines vector space modeling with a statistical language model to improve the retrieval of related documents. In the first approach, the concept vector of the terms is extracted according to the ontology, and the conceptual similarity between the user's query and the documents is then calculated. In the second approach, the probabilistic similarity of query terms that may be misspelled is estimated, and the correct term replaces the misspelled one. The document collection used by the proposed method was compiled by the web crawler from various Wikipedia pages. The conceptual document retrieval results show that accuracy (89%), recall (50%), and average precision (38%) improved compared to other methods.
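
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it simplifies both steps: raw term counts stand in for the ontology-derived concept vectors, and unigram frequency over the collection stands in for the statistical language model. All function names (`cosine_similarity`, `edits1`, `correct`, `rank`) and the sample documents are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def correct(term, term_counts):
    """Replace an unknown term with its most frequent one-edit neighbour.

    Assumption: collection frequency approximates the probability that a
    candidate is the intended term.
    """
    if term in term_counts:
        return term
    candidates = sorted(w for w in edits1(term) if w in term_counts)
    if not candidates:
        return term  # nothing plausible found; keep the term as typed
    return max(candidates, key=lambda w: term_counts[w])

def rank(query, docs):
    """Correct the query terms, then rank documents by cosine similarity."""
    counts = Counter(t for text in docs.values() for t in text.lower().split())
    q_terms = [correct(t, counts) for t in query.lower().split()]
    q_vec = Counter(q_terms)
    scores = {
        name: cosine_similarity(q_vec, Counter(text.lower().split()))
        for name, text in docs.items()
    }
    return q_terms, sorted(scores, key=scores.get, reverse=True)

docs = {
    "a": "web crawler scans the web",
    "b": "query expansion with ontology",
}
terms, order = rank("web crawlr", docs)
print(terms, order)
```

In this toy collection the misspelled term `crawlr` has a single one-edit neighbour in the vocabulary, so the corrected query matches document `a` only. A fuller version would weight the candidates with a proper error model and build the vectors from ontology concepts rather than surface terms.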
