Association of reviewer experience with discriminating human-written versus ChatGPT-written abstracts
  Gabriel Levin (1), Rene Pareja (2), David Viveros-Carreño (3,4), Emmanuel Sanchez Diaz (5), Elise Mann Yates (6), Behrouz Zand (7) and Pedro T Ramirez (8)

    1. Division of Gynecologic Oncology, Jewish General Hospital, McGill University, Montreal, Quebec, Canada
    2. Gynecologic Oncology, Clinica ASTORGA, Medellin, and Instituto Nacional de Cancerología, Bogotá, Colombia
    3. Gynecologic Oncology, Instituto Nacional de Cancerología, Bogotá, Colombia
    4. Gynecologic Oncology, Clínica Universitaria Colombia and Clínica Los Nogales, Bogotá, Colombia
    5. Universidad Pontificia Bolivariana, Clinica Universitaria Bolivariana, Medellin, Colombia
    6. Obstetrics and Gynecology, Houston Methodist Hospital, Houston, Texas, USA
    7. Gynecologic Oncology, Houston Methodist, Shenandoah, Texas, USA
    8. Department of Obstetrics and Gynecology, Houston Methodist Hospital, Houston, Texas, USA

    Correspondence to Gabriel Levin, Department of Gynecologic Oncology, McGill University, Montreal, Canada; Gabriel.levin2@mail.mcgill.ca

    Abstract

    Objective To determine if reviewer experience impacts the ability to discriminate between human-written and ChatGPT-written abstracts.

    Methods Thirty reviewers (10 senior reviewers, 10 junior reviewers, and 10 residents) were asked to differentiate between 10 ChatGPT-written and 10 human-written (fabricated) abstracts. For the study, 10 gynecologic oncology abstracts were fabricated by the authors. For each human-written abstract we generated a matching ChatGPT abstract using the same title and the fabricated results of the corresponding human-written abstract. A web-based questionnaire was used to gather demographic data and to record the reviewers’ evaluations of the 20 abstracts. Comparative statistics and multivariable regression were used to identify factors associated with a higher correct identification rate.
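    As an illustration only (this is not the authors' code), the analysis described above could be set up along the following lines in Python; the long-format data layout and the column names (reviewer_id, experience, ai_familiarity, english_training, correct) are assumptions made for this sketch, with simulated values standing in for the study data.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated stand-in for the study data: 30 reviewers x 20 abstracts,
    # one row per abstract evaluation (correct = 1 if correctly identified).
    rng = np.random.default_rng(0)
    n_reviewers, n_abstracts = 30, 20
    evaluations = pd.DataFrame({
        "reviewer_id": np.repeat(np.arange(n_reviewers), n_abstracts),
        "experience": np.repeat(rng.choice(["resident", "junior", "senior"], n_reviewers), n_abstracts),
        "ai_familiarity": np.repeat(rng.integers(0, 2, n_reviewers), n_abstracts),
        "english_training": np.repeat(rng.integers(0, 2, n_reviewers), n_abstracts),
        "correct": rng.integers(0, 2, n_reviewers * n_abstracts),
    })

    # Per-reviewer correct identification rate (% of the 20 abstracts).
    per_reviewer = (
        evaluations
        .groupby(["reviewer_id", "experience", "ai_familiarity", "english_training"], as_index=False)["correct"]
        .mean()
    )
    per_reviewer["correct_rate"] = per_reviewer["correct"] * 100

    # Multivariable linear regression of the correct identification rate on
    # reviewer experience, AI familiarity, and training country (English vs not).
    model = smf.ols(
        "correct_rate ~ C(experience) + ai_familiarity + english_training",
        data=per_reviewer,
    ).fit()
    print(model.summary())

    In a model of this form, the β coefficients and 95% confidence intervals would be read from the fitted model's coefficient table (for example via model.params and model.conf_int()).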

    Results The 30 reviewers each evaluated 20 abstracts, giving a total of 600 abstract evaluations. The reviewers correctly identified 300/600 (50%) of the abstracts: 139/300 (46.3%) of the ChatGPT-generated abstracts and 161/300 (53.7%) of the human-written abstracts (p=0.07). Human-written abstracts had a higher correct identification rate (median (IQR) 56.7% (49.2–64.1%) vs 45.0% (43.2–48.3%), p=0.023). Senior reviewers had a higher correct identification rate (60%) than junior reviewers and residents (45% each; p=0.043 and p=0.002, respectively). In a linear regression model including the experience level of the reviewers, familiarity with artificial intelligence (AI), and the country in which the majority of medical training was completed (English speaking vs non-English speaking), the experience of the reviewer (β=10.2, 95% CI 1.8 to 18.7) and familiarity with AI (β=7.78, 95% CI 0.6 to 15.0) were independently associated with the correct identification rate (p=0.019 and p=0.035, respectively). In a correlation analysis, the number of publications by the reviewer was positively correlated with the correct identification rate (r(28)=0.61, p<0.001).
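    As a purely illustrative aside using simulated numbers (not the study data), the sketch below shows how a Pearson correlation between a reviewer's publication count and their correct identification rate could be computed, and why it is reported with 28 degrees of freedom (n − 2 for the 30 reviewers).

    import numpy as np
    from scipy import stats

    # Hypothetical per-reviewer values for 30 reviewers (simulated, for illustration only).
    rng = np.random.default_rng(1)
    n_publications = rng.poisson(20, size=30)
    correct_rate = 40 + 0.8 * n_publications + rng.normal(0, 5, size=30)  # in %

    # Pearson correlation; degrees of freedom for r are n - 2 = 28.
    r, p = stats.pearsonr(n_publications, correct_rate)
    dof = len(n_publications) - 2
    print(f"r({dof}) = {r:.2f}, p = {p:.3g}")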

    Conclusion Reviewers correctly identified 46.3% of the abstracts written by ChatGPT. The correct identification rate increased with reviewer seniority and publication experience.

    • Gynecologic Surgical Procedures

    Data availability statement

    Data are available upon reasonable request.

    Footnotes

    • X @RParejaGineOnco, @ZandBehrouz, @pedroramirezMD

    • Contributors GL: guarantor, conceptualization, data curation, methodology, investigation, formal analysis, writing - original draft, writing - review and editing, project administration. PTR: conceptualization, data curation, methodology, investigation, writing - original draft, writing - review, project administration. BZ: methodology, conceptualization, writing - original draft, writing - review and editing. RP: writing - original draft, writing - review and editing. ESD: methodology, writing - review and editing. DV-C: writing - review and editing. EMY: project administration.

    • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

    • Competing interests None declared.

    • Provenance and peer review Not commissioned; externally peer reviewed.

    • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.