Confidence Measure Estimation for Open Information Extraction

  • Year of publication: 1397 (Solar Hijri)
  • Published in: Quarterly Journal of Information Systems and Telecommunication, Volume: 6, Issue: 1
  • Dedicated COI code: JR_JIST-6-1_001
  • Article language: English

Authors

Vahideh Reshadat

Malek-Ashtar University of Technology, Tehran, Iran

Maryam Hourali

Malek-Ashtar University of Technology, Tehran, Iran

Heshaam Faili

School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

Abstract

Earlier relation extraction approaches were relation-specific and supervised, yielding new instances of relations known a priori. While effective, this model is not applicable when the number of relations is large or when the relations are not known a priori. Open Information Extraction (OIE) is a relation-independent extraction paradigm designed to extract relations directly from massive and heterogeneous corpora such as the Web. One of the main challenges for an Open IE system is estimating the probability that an extracted relation is correct. A confidence measure indicates how likely an extracted relation is to be a correct instance of a relation among entities. This paper proposes a new method of confidence estimation for OIE called Relation Confidence Estimator for Open Information Extraction (RCE-OIE). It investigates incorporating a set of proposed features into the assignment of a confidence metric using logistic regression. These features draw on diverse lexical, syntactic and semantic knowledge, as well as extraction properties such as the number of distinct documents from which extractions are drawn, the number of relation arguments, and their types. We applied the proposed confidence measure to the extractions of Open IE systems and examined how it affects the performance of the results. Evaluations show that incorporating the designed features is promising: the accuracy of our method is higher than that of the base methods while keeping almost the same performance as them. We also demonstrate how semantic information such as coherence measures can be used in feature-based confidence estimation of Open Relation Extraction (ORE) to further improve performance.
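
To make the idea of feature-based confidence estimation concrete, the following is a minimal sketch of a logistic-regression confidence scorer. It is not the RCE-OIE implementation: the `Extraction` fields, the three features, and the tiny training set are all hypothetical and invented for illustration, loosely mirroring the extraction properties named in the abstract (number of distinct source documents, number of arguments, argument types). The lexical, syntactic and semantic features are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Extraction:
    """A toy Open IE extraction; all field names are hypothetical."""
    num_source_docs: int     # distinct documents the extraction was drawn from
    num_arguments: int       # number of relation arguments
    args_are_entities: bool  # stand-in for an argument-type feature


def featurize(e):
    # Bias term plus three illustrative extraction-property features.
    return [1.0,
            math.log1p(e.num_source_docs),
            float(e.num_arguments),
            1.0 if e.args_are_entities else 0.0]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def confidence(weights, e):
    """Estimated probability that the extraction is correct."""
    return sigmoid(sum(w * x for w, x in zip(weights, featurize(e))))


def train(examples, labels, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * len(featurize(examples[0]))
    n = len(examples)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for e, y in zip(examples, labels):
            err = confidence(weights, e) - y
            for j, xj in enumerate(featurize(e)):
                grad[j] += err * xj
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Made-up labeled extractions (1 = judged correct, 0 = judged incorrect).
EXAMPLES = [
    Extraction(12, 2, True), Extraction(8, 2, True), Extraction(5, 3, True),
    Extraction(1, 2, False), Extraction(1, 4, False), Extraction(2, 3, False),
]
LABELS = [1, 1, 1, 0, 0, 0]

weights = train(EXAMPLES, LABELS)
conf_hi = confidence(weights, Extraction(10, 2, True))
conf_lo = confidence(weights, Extraction(1, 3, False))
print(f"well-supported extraction: {conf_hi:.3f}")
print(f"weakly supported extraction: {conf_lo:.3f}")
```

In this sketch the score can be thresholded to trade precision against recall, which is one common way such a confidence measure is used when ranking extractions.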

Keywords

Information Extraction; Open Information Extraction; Relation Extraction; Knowledge Discovery; Fact Extraction
