The HN title is really pushing way beyond what the study can shoulder. The title from the results chapter sums it up better: "ChatGPT yields moderate accuracy approaching passing performance on USMLE"<p>The study was done on publicly available questions that have been used in USMLE exams. They used two physician reviewers to judge ChatGPT's textual answers. While really interesting, no exam was actually passed, and the authors never claim that.
The HN title is misleading. The study is titled:<p><i>Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models</i><p>The study did not attempt to get ChatGPT to pass any exam. Rather, ChatGPT was questioned on material from past exams. The authors are enthusiastic about ChatGPT's performance, but nowhere do they claim it "passed the US Medical Licensing Exam" as the title claims.<p>Indeed, the study concludes that ChatGPT's performance falls short of passing the exam:<p><i>ChatGPT yields moderate accuracy approaching passing performance on USMLE</i> (line 287 of the manuscript).<p>@zekrioca, please correct the title so that it more closely reflects the results of the linked study.
Medical student here. I’m impressed by ChatGPT’s ability to come close to passing Step 2/3. I tried plugging in my own question banks that I used for studying and it got many right. Even when it was wrong it was also “almost right” and its answer would not be disparaged on rounds.<p>However one thing to keep in mind: a bot passing these exams alone will not disrupt medicine. There are plenty of doctors today that could ace these exams. I don’t think that the ability to outsource medical decision making to a bot will change much, especially since 90% of medical decision making is already outsourced to a flowchart or an “expert attending” physician. Much of the value of doctors today comes from procedures, patient interaction, legal liability, and patient trust, which cannot yet be done by a bot.
Just because it can pass the test doesn't mean that its behavior will be consistently correct. Trusting ChatGPT in medical applications is the same as trusting it in programming applications: it will work for simple to intermediate tasks and may even look as though it's explaining itself properly, but for specialized or sufficiently complicated tasks, and even for simple ones, it can't be expected to produce correct results consistently.
"The most recent iteration of the GPT LLM (GPT3) achieved 46% accuracy with zero prompting [12], which marginally improved to 50% with further model training. Previous models, merely months prior, performed at 36.7% [13]. In this present study, ChatGPT performed at &gt;50% accuracy across all examinations, exceeding 60% in most analyses"<p>I have heard many times that ChatGPT is basically GPT3, but somehow it feels like it gives more correct answers. Now there's data to show it.
I'm not sure if this is more of a pro in the ChatGPT column or an indictment of the USMLE. Then again, the USMLE isn't known as an easy exam, so I guess ChatGPT really is that good now?