Comparison of Text Mining Tools, Techniques and Issues
- Year of publication: 1402 (Solar Hijri)
- Published in: First International Conference on Engineering and Information Technology
- Dedicated COI code: TETSC01_004
- Paper language: English
- View count: 189
Authors
Professor & Principal
Charan Singh University
Abstract
Online reviews on e-commerce websites are increasingly written by the consumers of the products themselves. More than 80 percent of the data in these reviews is unstructured. The reviews have become an important source of information for new customers researching products online, and this research often shapes the decision to purchase. In contrast to structured data, unstructured data such as text, speech, video and pictures do not come with a data model that enables a computer to use them directly. Computers can now interpret the knowledge encoded in unstructured data using methods from text analytics, image recognition and speech recognition, so unstructured data are increasingly used in decision-making processes. Yet although decisions are commonly based on unstructured data, data quality assessment methods for such data are lacking. While databases store only structured data, most data is unstructured, such as text documents, web pages and emails. Text mining is required when useful information must be extracted from large volumes of text. But where should one begin? What are the popular tools, which techniques are used, and what features do they offer? Because beginning is always the hardest part, this work explores the tools available for text mining to help new researchers and practitioners in the field.
Keywords
Text Mining, Text Analytics, Text Mining Tools, Techniques for Text Mining, Data Analysis, Quality of Unstructured Data