E-Mail Spam Detection Based on Part of Speech Tagging
- Publication year: 1394 (Solar Hijri calendar)
- Published in: Journal of Knowledge-Based Engineering and Innovation, Volume 1, Issue 3
- COI code: JR_JKBEI-1-3_006
- Article language: English
Authors
Department of Computer Engineering & IT, Shiraz University of Technology, Shiraz, Iran
Department of Computer Engineering & IT, Shiraz University of Technology, Shiraz, Iran
Department of Computer Engineering & IT, Shiraz University of Technology, Shiraz, Iran
Abstract
Ever since e-mail became a widely used communication tool, spam has been a problem associated with it. One of the most important ways to filter such junk e-mail is to detect it with data-mining techniques. This paper proposes a new approach based on word frequency: the keywords of a text are identified by how often they are repeated. The key sentences of an incoming e-mail, namely those that contain the keywords, are tagged, and the grammatical role (part of speech) of every word in each such sentence is determined. These roles are then assembled into a vector that is used to measure the similarity between received e-mails. The K-means algorithm is used to classify the received e-mails; it follows simple, easily understood rules, which is an advantage of the approach. The reported precision of the algorithm in identifying the e-mails is 83 percent.
Keywords
K-Mean algorithm, Spam e-mail, data mining, POS tagging, vector model
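The pipeline outlined in the abstract (frequency-based keywords, key sentences, part-of-speech roles collected into a vector, then K-means) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the vector is built, so the normalized tag-frequency vector, the tag set `TAGS`, the tiny `TOY_LEXICON` that stands in for a real tagger, the length-based stop-word filter, and the deterministic centroid initialization are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger (assumption).
TOY_LEXICON = {
    "win": "VERB", "claim": "VERB", "click": "VERB", "meet": "VERB",
    "free": "ADJ", "urgent": "ADJ", "monthly": "ADJ",
    "prize": "NOUN", "money": "NOUN", "report": "NOUN", "meeting": "NOUN",
}
TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]  # assumed tag set; unknown words map to OTHER


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keywords(text, top_n=3):
    """Pick the top_n most frequent words; short words are skipped as a crude stop-word filter."""
    counts = Counter(w for w in tokenize(text) if len(w) > 3)
    return {w for w, _ in counts.most_common(top_n)}


def key_sentences(text, keys):
    """Return the sentences that contain at least one keyword."""
    sentences = re.split(r"[.!?]+", text)
    return [s.strip() for s in sentences if keys & set(tokenize(s))]


def pos_vector(text, top_n=3):
    """Build a normalized tag-frequency vector over the key sentences (assumed representation)."""
    keys = keywords(text, top_n)
    tags = Counter(
        TOY_LEXICON.get(w, "OTHER")
        for sentence in key_sentences(text, keys)
        for w in tokenize(sentence)
    )
    total = sum(tags.values()) or 1
    return [tags[t] / total for t in TAGS]


def kmeans(vectors, k=2, iterations=10):
    """Plain K-means with squared Euclidean distance; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    emails = [
        "Win a free prize. Claim your free prize now. Click to win money.",
        "The monthly report is attached. We meet after the report meeting.",
    ]
    print(kmeans([pos_vector(e) for e in emails], k=2))
```

In practice the lexicon lookup would be replaced by a trained tagger, and the cluster labels would still have to be mapped to "spam" and "not spam" using labeled examples, since K-means itself only groups similar vectors.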