Evaluating preprocessing by Turing Machine in text categorization

Year of publication: 1392
Document type: Conference paper
Language: English

The full text of this paper is available as a 6-page PDF.

National scientific document ID:

ICS12_203

Indexing date: 11 Mordad 1393

Abstract:

With the growth of the World Wide Web, text categorization has become a key technology for handling and organizing large numbers of documents. Automatic text categorization is a method for coping with massive amounts of data. The basic phases of text categorization are preprocessing, extracting relevant features by matching against the features in a database, and finally assigning a set of documents to predefined categories. In this article, we propose a new preprocessing method based on a Turing Machine. All four preprocessing steps (sentence segmentation, tokenization, stop-word removal, and word stemming) are performed by the Turing Machine. To assess the importance of Turing Machine preprocessing for the text classification problem, we applied the support vector machine paradigm to the Reuters and PAGOD datasets. Searching for the best document representation, we evaluated and analyzed several well-known feature reduction, feature subset selection, and term weighting methods. Experiments show that the proposed method is more accurate than the other methods.

Authors

Razieh Abbasi Ghalehtaki

Department of Computer Engineering, Hamedan Branch, Islamic Azad University, Science and Research Campus, Hamedan, Iran

Hassan Khotanlou

Department of Computer Engineering, Bu-Ali Sina University, Hamedan, Iran

Mansour Esmaeilpour

Department of Computer Engineering, Hamedan Branch, Islamic Azad University, Hamedan, Iran
