From Algorithms to Academia: An Endeavor to Benchmark AI-Generated Scientific Papers against Human Standards

Publication year: 1404 (Solar Hijri calendar)
Document type: Journal article
Language: English

The full text of this paper is available for download as an 11-page PDF.

National scientific document ID:

JR_TABO-13-4_005

Indexing date: 24 Farvardin 1404

Abstract:

Objectives: The aim of this study is to quantitatively investigate the accuracy of text generated by AI large language models and to compare its readability and likelihood of being accepted to a scientific journal with those of human-authored papers on the same topics.
Methods: The study consisted of two papers written by ChatGPT, two papers written by Assistant by scite, and two papers written by humans. Six independent reviewers were blinded to the authorship of each paper and assigned a grade to each subsection on a scale of 1 to 4. Additionally, each reviewer was asked to guess whether the paper was written by a human or by AI and to explain their reasoning. The study authors also graded each AI-generated paper on the factual accuracy of its claims and citations.
Results: The human-written calcaneus fracture paper received the highest score (3.70/4), followed by the Assistant-written calcaneus fracture paper (3.02/4), the human-written ankle osteoarthritis paper (2.98/4), the ChatGPT calcaneus fracture paper (2.89/4), the ChatGPT ankle osteoarthritis paper (2.87/4), and the Assistant ankle osteoarthritis paper (2.78/4). The human calcaneus fracture paper received a statistically significantly higher rating than the ChatGPT calcaneus fracture paper (P = 0.028) and the Assistant calcaneus fracture paper (P = 0.043). For factual accuracy, the ChatGPT ankle osteoarthritis review was 100% accurate, the ChatGPT calcaneus fracture review 97.46%, the Assistant calcaneus fracture review 95.56%, and the Assistant ankle osteoarthritis review 94.98%. For citation accuracy, the ChatGPT ankle osteoarthritis paper was 90% accurate, the ChatGPT calcaneus fracture paper 69.23%, the Assistant calcaneus fracture paper 39.68%, and the Assistant ankle osteoarthritis paper 35.14%.
Conclusion: Through this paper we emphasize that while AI holds the promise of enhancing knowledge sharing, it must be used responsibly and in conjunction with comprehensive fact-checking procedures to maintain the integrity of scientific discourse.
Level of evidence: III
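The reported figures can be tabulated to make the contrast between factual accuracy and citation accuracy easier to see. The following is a minimal Python sketch, not the authors' analysis: the numbers are transcribed from the abstract, while the dictionary layout, the labels, and the `accuracy_gap` helper are illustrative assumptions.

```python
# Values transcribed from the abstract; names and layout are hypothetical.
# Each entry: (reviewer score out of 4, factual accuracy %, citation accuracy %).
# Human-written papers were not graded for accuracy, so those fields are None.
reported = {
    "Human, calcaneus fracture": (3.70, None, None),
    "Assistant, calcaneus fracture": (3.02, 95.56, 39.68),
    "Human, ankle osteoarthritis": (2.98, None, None),
    "ChatGPT, calcaneus fracture": (2.89, 97.46, 69.23),
    "ChatGPT, ankle osteoarthritis": (2.87, 100.0, 90.0),
    "Assistant, ankle osteoarthritis": (2.78, 94.98, 35.14),
}


def accuracy_gap(factual, citation):
    """Percentage-point gap between factual accuracy and citation accuracy."""
    return round(factual - citation, 2)


# Papers ordered from highest to lowest reviewer score.
ranked = sorted(reported, key=lambda name: reported[name][0], reverse=True)

# Gap per AI-generated paper (human papers are skipped).
gaps = {
    name: accuracy_gap(factual, citation)
    for name, (_, factual, citation) in reported.items()
    if factual is not None
}

for name in ranked:
    score = reported[name][0]
    gap = gaps.get(name)
    gap_text = "n/a" if gap is None else f"{gap:.2f} points"
    print(f"{name}: score {score:.2f}/4, factual minus citation accuracy: {gap_text}")
```

Under these assumptions, the gap is smallest for the ChatGPT ankle osteoarthritis paper and largest for the Assistant ankle osteoarthritis paper, which is consistent with the conclusion that fact-checking of citations remains necessary.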

Authors

Jackson Woodrow

Foot & Ankle Research and Innovation Lab (FARIL), Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

Nour Nassour

Foot & Ankle Research and Innovation Lab (FARIL), Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

John Kwon

Foot & Ankle Research and Innovation Lab (FARIL), Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

Soheil Ashkani-Esfahani

Foot & Ankle Research and Innovation Lab (FARIL), Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

Mitchel Harris

Foot & Ankle Research and Innovation Lab (FARIL), Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
