AI-generated ophthalmic references and abstracts: Proceed with caution


Although artificial intelligence (AI) can generate ideas and references, clinicians must thoroughly vet and fact-check any medical research content that AI produces.


While artificial intelligence (AI) is capable of generating ideas and references, clinicians should thoroughly vet and fact-check any medical research content that AI produces. (Image Credit: ©Supatman - Adobe.Stock.com)

Hong-Uyen Hua, MD, a recently graduated surgical retina fellow and first study author, reported that although artificial intelligence (AI) is capable of generating ideas and references, clinicians must go a step further and thoroughly vet and fact-check any medical research content that AI produces.1 Hua, senior author Danny Mammo, MD, and colleagues are from the Cole Eye Institute, Cleveland Clinic Foundation, Cleveland, Ohio.

Hua and colleagues pointed to the rapid growth in the popularity of AI chatbots and the significant implications this holds for patient education and academia. They also noted that the drawbacks of using these chatbots to generate abstracts and references have not been investigated thoroughly.

To remedy this, the research team conducted a cross-sectional comparative study to evaluate and compare the quality of ophthalmic scientific abstracts and references generated by earlier and updated versions of a popular AI chatbot.

The study used 2 versions of an AI chatbot to generate scientific abstracts and 10 references for clinical research questions across 7 ophthalmology subspecialties. Two of the authors graded the abstracts using modified DISCERN criteria and performance evaluation scores, and 2 AI output detectors also evaluated the abstracts. A hallucination rate, defined as the proportion of generated references that could not be verified, was calculated for each version of the chatbot and compared.
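The hallucination rate described above is simply the fraction of generated citations that cannot be verified against the literature. A minimal sketch of that calculation, using hypothetical reference data rather than the study's actual verification results, might look like this:

```python
# Sketch of a hallucination-rate calculation: the fraction of
# chatbot-generated references that could not be verified.
# The reference data below is hypothetical; the study generated
# 10 references per clinical question across 7 subspecialties.

def hallucination_rate(references: list[dict]) -> float:
    """Return the fraction of references marked as unverifiable."""
    if not references:
        return 0.0
    unverified = sum(1 for ref in references if not ref["verified"])
    return unverified / len(references)

# Hypothetical list of 10 references, 3 of which could not be verified
refs_v1 = [{"title": f"ref {i}", "verified": i >= 3} for i in range(10)]
print(f"{hallucination_rate(refs_v1):.0%}")  # prints "30%"
```

A rate near 30%, as reported for both chatbot versions, would mean roughly 3 of every 10 generated citations are fabricated.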

Results of the comparison

The investigators found that the “mean modified AI-DISCERN scores for the chatbot-generated abstracts were 35.9 and 38.1 out of a maximal score of 50 for the earlier and updated versions, respectively (P = 0.30). Based on the 2 AI output detectors, the mean fake scores, with a score of 100% meaning generated by AI, for the earlier and updated chatbot-generated abstracts were 65.4% and 10.8%, respectively (P = 0.01) for 1 detector and 69.5% and 42.7% (P = 0.17) for the second detector. The mean hallucination rates for nonverifiable references generated by the earlier and updated versions were 33% and 29% (P = 0.74).”

These results indicate that abstract quality was comparable between the 2 versions of the chatbot. The mean hallucination rate for citations was approximately 30% and was also comparable between versions.

Considering that both versions of the chatbot produced abstracts of average quality and hallucinated citations that appeared realistic, Hua and colleagues warned clinicians to be aware of the potential for factual errors or hallucinations. Any medical content produced by AI should be carefully vetted and fact-checked before it is used for health education or academic purposes.

Hua commented, “The idea for this study initially came while I was exploring generative AI chatbots and their possible applications in ophthalmology. I quickly realized that the chatbot was making up references—a term called ‘hallucinations’ in generative AI. On top of that, the chatbot was unable to distinguish nuances in the scientific literature (e.g. oral vs intravenous dosing of steroids in optic neuritis). Current AI detectors perform poorly in detecting AI-generated text, especially with the newer version of AI chatbots. The scientific community at large must be wary of the implications of using generative AI for research purposes.”

Reference
  1. Hua H-U, Kaakour A-H, Rachitskaya A, et al. Evaluation and comparison of ophthalmic scientific abstracts and references by current artificial intelligence chatbots. JAMA Ophthalmol. 2023. doi:10.1001/jamaophthalmol.2023.3119. Online ahead of print.

Hong-Uyen Hua, MD

E: honguyenhua@gmail.com

Hua recently completed a vitreoretinal surgery fellowship at the Cole Eye Institute, Cleveland Clinic Foundation, Cleveland, Ohio. She has no financial interest in this subject matter.
