TY - JOUR
T1 - Reviewer Experience Detecting and Judging Human Versus Artificial Intelligence Content
T2 - The Stroke Journal Essay Contest
AU - Silva, Gisele S.
AU - Khera, Rohan
AU - Schwamm, Lee H.
AU - Acampa, Maurizio
AU - Adelman, Eric E.
AU - Boltze, Johannes
AU - Broderick, Joseph P.
AU - Brodtmann, Amy
AU - Christensen, Hanne
AU - Dalli, Lachlan
AU - Duncan, Kelsey Rose
AU - Elgendy, Islam Y.
AU - Ergul, Adviye
AU - Goldstein, Larry B.
AU - Hinkle, Janice L.
AU - Johansen, Michelle C.
AU - Jood, Katarina
AU - Kasner, Scott E.
AU - Levine, Steven R.
AU - Li, Zixiao
AU - Lip, Gregory
AU - Marsh, Elisabeth B.
AU - Muir, Keith W.
AU - Ospel, Johanna Maria
AU - Pera, Joanna
AU - Quinn, Terence J.
AU - Räty, Silja
AU - Ranta, Anna
AU - Richards, Lorie Gage
AU - Romero, Jose Rafael
AU - Willey, Joshua Z.
AU - Hillis, Argye E.
AU - Veerbeek, Janne M.
N1 - Publisher Copyright:
© 2024 American Heart Association, Inc.
PY - 2024/10/1
Y1 - 2024/10/1
N2 - Artificial intelligence (AI) large language models (LLMs) now produce human-like general text and images. LLMs' ability to generate persuasive scientific essays that undergo evaluation under traditional peer review has not been systematically studied. To measure perceptions of quality and the nature of authorship, we conducted a competitive essay contest in 2024 with both human and AI participants. Human authors and 4 distinct LLMs generated essays on controversial topics in stroke care and outcomes research. A panel of Stroke Editorial Board members (mostly vascular neurologists), blinded to author identity and with varying levels of AI expertise, rated the essays for quality, persuasiveness, best in topic, and author type. Among 34 submissions (22 human and 12 LLM) scored by 38 reviewers, human and AI essays received mostly similar ratings, though AI essays were rated higher for composition quality. Author type was accurately identified only 50% of the time, with prior LLM experience associated with improved accuracy. In multivariable analyses adjusted for author attributes and essay quality, only persuasiveness was independently associated with odds of a reviewer assigning AI as author type (adjusted odds ratio, 1.53 [95% CI, 1.09-2.16]; P=0.01). In conclusion, a group of experienced editorial board members struggled to distinguish human versus AI authorship, with a bias against best in topic for essays judged to be AI generated. Scientific journals may benefit from educating reviewers on the types and uses of AI in scientific writing and developing thoughtful policies on the appropriate use of AI in authoring manuscripts.
KW - artificial intelligence
KW - neurologists
KW - peer review
KW - stroke
KW - writing
UR - http://www.scopus.com/inward/record.url?scp=85203202409&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85203202409&partnerID=8YFLogxK
U2 - 10.1161/STROKEAHA.124.045012
DO - 10.1161/STROKEAHA.124.045012
M3 - Review article
C2 - 39224979
AN - SCOPUS:85203202409
SN - 0039-2499
VL - 55
SP - 2573
EP - 2578
JO - Stroke
JF - Stroke
IS - 10
ER -