One of my family members is an anesthesiologist who runs oral and written board exams for physicians seeking to become or remain board certified. Over the holidays I sat him down with ChatGPT and ran through a dozen of the hardest board questions, including the opportunity to ask clarifying follow-ups. He said a couple of responses would probably have indicated something fishy was up, due to ChatGPT's repetitive use of certain language, but that overall it easily passed.
I don't think an exam suitable to assess a human is also suitable to assess a computer program, at least not any more. The only way a human can pass an exam like that is to have studied countless hours and to have the required experience. The exam is not the same as the required expertise, but it is a reliable enough proxy.

Also, for humans there have been bullshit artists who could pass it. The scene from Catch Me If You Can is based on a true story. But for computers, bullshitting is an art form. Passing the exam is no longer a good proxy for the required expertise. Another form of examination is now necessary for silicon applicants.
Since knowledge ≠ competence, I’m not sure what exactly we can make of this. Medical training involves a heavy layer of knowledge in interrelated disciplines, but there’s also a large procedural layer that entities without opposable thumbs, let alone hands, cannot perform. Hopefully by the time that aspect of medical practice automation arrives, we will have developed more standards around what level of safety is required before releasing it into the real world. As opposed to, say, full self-driving.
We need to stop using the word AI in contexts like this. It’s an LLM with a specialized layer trained on medical questions and answers.

Saying “AI” drops beneficial information in favor of simplicity, but we need that information in order to figure out what the data means.