Efficient GAN-based Method for Extractive Summarization

Publication year: 1401 (Solar Hijri calendar)
Published in: Journal of Electrical and Computer Engineering Innovations, Volume: 10, Issue: 2
Dedicated COI code: JR_JECEI-10-2_003
Article language: English
Views: 173

Authors

S.V. Moravvej

Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran.

M.J. Maleki Kahaki

Department of Electrical and Computer Engineering, University of Kashan, Kashan, Iran.

M. Salimi Sartakhti

Department of Electrical and Computer Engineering, Amirkabir University of Technology, Tehran, Iran.

M. Joodaki

Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran.

Abstract

Background and Objectives: Text summarization plays an essential role in reducing time and cost in many domains, such as medicine and engineering. Manual summarization, however, is time-consuming, so an automated summarization system is needed. Sentence selection is critical to summary quality. Summarization techniques introduced in recent years usually select sentences greedily, which lowers the quality of the summary. This paper presents a non-greedy method for selecting essential sentences from a text.

Methods: This paper presents GAN-AM, a method for extractive summarization based on a generative adversarial network and an attention mechanism. A generative adversarial network consists of two networks, a generator and a discriminator, whose parameters are independent of each other. First, sentence features are extracted in two ways: traditional features and embeddings. We extract 12 traditional features; some are computed from the words of a sentence and others from the sentence as a whole. For embedding, we use the well-known Skip-Gram model. The features are then fed to the generator as a condition, and the generator calculates the probability of each sentence appearing in the summary. A discriminator checks the summary produced by the generator and strengthens its performance. We introduce a new loss function for discriminator training that includes the generator output and the real and fake summaries of each document. During training and testing, each document enters the generator with different noise. This allows the generator to see many combinations of sentences that are suitable for quality summaries.

Results: We evaluate our method on the CNN/Daily Mail and Medical datasets. Summaries produced by the generator show that our model outperforms the compared methods on the ROUGE metric. We apply noise of different sizes to the generator to examine its effect on the model. The results indicate that the noise-free model performs poorly.

Conclusion: Unlike recent works, the generator in our method selects sentences non-greedily. Experimental results show that the generator with noise can produce summaries that are related to the main subject.
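
The idea that a generator conditioned on sentence features and fed with different noise yields different candidate summaries can be illustrated with a minimal sketch. This is not the GAN-AM architecture: a single linear layer with a sigmoid stands in for the generator, there is no discriminator or attention, and every name here (`sentence_probabilities`, `select_summary`, `candidate_summaries`, the weights, the per-sentence Gaussian noise, and the top-k selection rule) is a hypothetical choice made only for illustration.

```python
import math
import random


def sentence_probabilities(features, weights, noise, noise_weights):
    """Toy stand-in for a conditional generator (assumption: linear + sigmoid).

    features: one feature vector per sentence (the condition).
    noise: one noise vector per sentence.
    Returns one inclusion probability per sentence.
    """
    probs = []
    for vec, z in zip(features, noise):
        logit = sum(x * w for x, w in zip(vec, weights))
        logit += sum(n * w for n, w in zip(z, noise_weights))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


def select_summary(probs, k):
    """Return the indices of the k most probable sentences, in document order."""
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    return sorted(ranked[:k])


def candidate_summaries(features, weights, noise_weights, k, draws, seed=0):
    """Collect the distinct sentence combinations seen over several noise draws."""
    rng = random.Random(seed)
    candidates = set()
    for _ in range(draws):
        noise = [[rng.gauss(0.0, 1.0) for _ in noise_weights] for _ in features]
        probs = sentence_probabilities(features, weights, noise, noise_weights)
        candidates.add(tuple(select_summary(probs, k)))
    return candidates


if __name__ == "__main__":
    # Four sentences with two made-up features each (hypothetical values).
    feats = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.1, 0.1]]
    for combo in sorted(candidate_summaries(feats, [1.0, 0.5], [0.3, 0.3], k=2, draws=20)):
        print(combo)
```

With all-zero noise this toy model always returns the same sentence combination for a document, whereas repeated noise draws can surface several combinations. In the sketch, that is the only role noise plays.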

Keywords

Text summarization, non-greedily, Generative adversarial network, attention mechanism, extractive summarization
