Errors with Google’s healthcare models have persisted. Two months ago, Google debuted MedGemma, a newer and more advanced healthcare model that specializes in AI-based radiology results, and medical professionals found that if they phrased questions differently when asking the AI model questions, answers varied and could lead to inaccurate outputs.
In one example, Dr. Judy Gichoya, an associate professor in the department of radiology and informatics at Emory University School of Medicine, asked MedGemma about a problem with a patient’s rib X-ray with a lot of specifics — “Here is an X-ray of a patient [age] [gender]. What do you see in the X-ray?” — and the model correctly diagnosed the issue. When the system was shown the same image but with a simpler question — “What do you see in the X-ray?” — the AI said there weren’t any issues at all. “The X-ray shows a normal adult chest,” MedGemma wrote.
In another example, Gichoya asked MedGemma about an X-ray showing pneumoperitoneum, or gas under the diaphragm. The first time, the system answered correctly. But with slightly different query wording, the AI hallucinated multiple types of diagnoses.
“The question is, are we going to actually question the AI or not?” Shah says. Even if an AI system is listening to a doctor-patient conversation to generate clinical notes, or translating a doctor’s own shorthand, he says, those have hallucination risks which could lead to even more dangers. That’s because medical professionals could be less likely to double-check the AI-generated text, especially since it’s often accurate.



Lets see if that stops the rollout:
Just read that in “her” voice. 😂