Benefiting from Structured Resources to Present a Computationally Efficient Word Embedding Method

Publication year: 1401 SH (2022)
Document type: Journal article
Language: English
Views: 203

The full text of this article is available as an 11-page PDF.

National scientific document ID: JR_JADM-10-4_005

Indexing date: 28 Azar 1401 SH (19 December 2022)

Abstract:

In recent years, new word embedding methods have markedly improved the accuracy of NLP tasks. A review of the progress of these methods shows that the complexity of the models and the number of their training parameters keep growing, so methodological innovation is needed in the design of new word embedding methods. Most current methods train the semantic vectors of words on a large corpus of unstructured data. This paper pursues the basic idea of exploiting the structure of structured data to derive embedding vectors, so that the need for high processing power, a large amount of processing memory, and long processing time is met instead by those structures and the conceptual knowledge that lies in them. For this purpose, a new embedding vector, Word2Node, is proposed. It uses a well-known structured resource, WordNet, as its training corpus, under the hypothesis that the graph structure of WordNet encodes valuable linguistic knowledge that should not be ignored and can be exploited to provide cost-effective, small-sized embedding vectors. The Node2Vec graph embedding method allows us to benefit from this powerful linguistic resource. Evaluation of this idea on two tasks, word similarity and text classification, shows that the method performs the same as or better than the word embedding method embedded in it (Word2Vec), while the required training data is reduced by about 50,000,000%. These results give a view of the capacity of structured data to improve the quality of existing embedding methods and the resulting vectors.
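The pipeline the abstract describes — treat WordNet as a graph, run Node2Vec's biased random walks over it, and feed the walks to a skip-gram (Word2Vec) trainer as if they were sentences — can be sketched in plain Python. The sketch below implements only the core walk-generation step of Node2Vec (Grover and Leskovec, 2016) over a hypothetical miniature graph; the synset names, edges, and parameter values are illustrative assumptions, not taken from the paper.

```python
import random

# Hypothetical miniature WordNet-like graph: nodes are synset names,
# undirected edges stand in for hypernym/hyponym links.
graph = {
    "dog.n.01": ["canine.n.02", "pet.n.01"],
    "canine.n.02": ["dog.n.01", "carnivore.n.01"],
    "pet.n.01": ["dog.n.01", "cat.n.01"],
    "cat.n.01": ["pet.n.01", "feline.n.01"],
    "carnivore.n.01": ["canine.n.02", "feline.n.01"],
    "feline.n.01": ["cat.n.01", "carnivore.n.01"],
}

def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """One Node2Vec-style biased random walk.

    Given the previous node t and current node cur, a candidate
    neighbour x of cur gets transition weight:
      1/p if x == t (return), 1 if x neighbours t, 1/q otherwise.
    With p = q = 1 this reduces to a uniform random walk.
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = graph[cur]
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step: uniform
            continue
        prev = walk[-2]
        weights = [
            1.0 / p if x == prev else (1.0 if x in graph[prev] else 1.0 / q)
            for x in nbrs
        ]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# A small corpus of walks; in the paper's pipeline such sequences
# would then train a skip-gram model to yield Word2Node vectors.
walks = [node2vec_walk(graph, n, 6) for n in graph for _ in range(3)]
```

Each walk is a sequence of synset names, so an off-the-shelf Word2Vec implementation can consume `walks` directly as its sentence corpus; the graph's adjacency plays the role that word co-occurrence plays in unstructured text.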

Authors

F. Jafarinejad

Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran.

References:

  • R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, ...
  • T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. ...
  • J. Pennington, R. Socher, and C. Manning, “GloVe: Global Vectors ...
  • J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: ...
  • Y. Liu et al., “RoBERTa: A Robustly Optimized BERT Pretraining ...
  • T. Brown et al., “Language Models are Few-Shot Learners,” arXiv:2005.14165, ...
  • Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, ...
  • R. A. Stein, P. A. Jaques, and J. F. Valiati, ...
  • Q. Chen and A. Crooks, “Analyzing the Vaccination debate in ...
  • A. Pimpalkar and J. R. Raj R, “MBiLSTMGloVe: Embedding GloVe ...
  • M. Molaei and D. Mohamadpur, “Distributed Online Pre-Processing Framework for ...
  • E. Manzini, J. Garrido-Aguirre, J. Fonollosa, and A. Perera-Lluna, “Mapping ...
  • A. Joshi, E. Fidalgo, E. Alegre, and L. Fernández-Robles, “DeepSumm: ...
  • T. Xian, Z. Li, C. Zhang, and H. Ma, “Dual ...
  • A. Shahini Shamsabadi, R. Ramezani, H. Khosravi Farsani, and M. ...
  • R. Navigli and S. P. Ponzetto, “BabelNet: Building a Very ...
  • S. Rothe and H. Schütze, “AutoExtend: Extending Word Embeddings to ...
  • A. T. Thibault Cordier, “Learning Word Representations by Embedding the ...
  • A. Kutuzov, M. Dorgham, O. Oliynyk, C. Biemann, and A. ...
  • J. Harvill, R. Girju, and M. Hasegawa-Johnson, “Syn2Vec: Synset Colexification ...
  • A. Budanitsky and G. Hirst, “Evaluating WordNet-Based Measures of Lexical ...
  • Z. Zhao, X. Chen, D. Wang, Y. Xuan, and G. ...
  • X. Wu, Y. Zheng, T. Ma, H. Ye, and L. ...
  • Q. Tian et al., “Lower Order Information Preserved Network Embedding ...
  • A. Amara, M. A. Hadj Taieb, and M. Ben Aouicha, ...
  • L. Moyano, “Learning Network Representations,” Eur. Phys. J. Spec. Top., ...
  • G. Alanis-Lobato, P. Mier, and M. A. Andrade-Navarro, “Efficient Embedding ...
  • J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, ...
  • A. Grover and J. Leskovec, “Node2vec: Scalable Feature Learning for ...
  • D. Wang, P. Cui, and W. Zhu, “Structural Deep Network ...
  • Z. Zhang, P. Cui, and W. Zhu, “Deep Learning on ...
  • S. Cao, W. Lu, and Q. Xu, “GraRep: Learning Graph ...
  • J. Chen, Z. Gong, W. Wang, W. Liu, and X. ...
  • A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. ...
  • B. Perozzi, R. Al-Rfou, and S. Skiena, “DeepWalk: Online Learning ...
  • T. Landauer, P. Foltz, and D. Laham, “An Introduction to ...
  • V. Sanh, L. Debut, J. Chaumond, and T. Wolf, “DistilBERT, ...
  • D. Soergel, WordNet. An Electronic Lexical Database. MIT Press, 1998 ...
  • A. Pal and D. Saha, “Word Sense Disambiguation: A Survey,” ...
  • A. Montejo-Ráez, E. Martínez-Cámara, M. T. Martín-Valdivia, and L. A. ...
  • O. El Midaoui, B. El Ghali, A. El Qadi, and ...
  • T. Hao, W. Xie, Q. Wu, H. Weng, and Y. ...
  • S. K. Ray, S. Singh, and B. P. Joshi, “A ...
  • J. Goikoetxea, A. Soroa, and E. Agirre, “Bilingual Embeddings with ...
  • D. Banik, A. Ekbal, P. Bhattacharyya, S. Bhattacharyya, and J. ...
  • L. Finkelstein et al., “Placing Search in Context: The Concept ...
  • R. Misra, “News Category Dataset.” 2018, doi: 10.13140/RG.2.2.20331.18729 ...
  • A. L. Maas, R. E. Daly, P. T. Pham, D. ...
  • H. Rubenstein and J. Goodenough, “Contextual Correlates of Synonymy,” Commun. ...
  • G. Cassani and A. Lopopolo, “Multimodal Distributional Semantics Models and ...
  • G. A. Miller and W. G. Charles, “Contextual Correlates of ...
  • F. Hill, R. Reichart, and A. Korhonen, “SimLex-999: Evaluating Semantic ...
  • I. Vulić et al., “Multi-SimLex: A Large-Scale Evaluation of Multilingual ...