Can we trust LLMs to provide appropriate medical advice?
The quick answer to this question may seem obvious.
Following the January launches of ChatGPT Health and Claude for Healthcare, one of the first studies to assess whether these dedicated healthcare large language models (LLMs) can provide medical advice highlighted serious concerns. In research published in Nature Medicine, the authors assessed whether ChatGPT Health could triage medical cases written by physicians – they found that it correctly triaged classical emergencies, such as stroke and anaphylaxis, but failed to triage other emergencies appropriately.1 It also over-triaged non-urgent cases. Overall, the study raised concerns about the potential consumer-scale deployment of AI triage systems.
What if we ‘think deeper’?
A JAMA editorial covered the triage study as part of a broader discussion on the use of AI tools to answer patients’ questions about their medical care.2 According to a healthcare strategy lead at OpenAI, ChatGPT has 800 million users each week and, of these, 1 in 4 seek health-related information.2 Despite this, we remain in the midst of an accessibility crisis in which patients routinely encounter barriers when seeking trusted content to help them make informed decisions about their healthcare. For practical examples of how Amiculum is helping to improve patient engagement, see our blogs on our recent activities at ISMPP meetings, such as using AI to translate complex medical information into public understanding3 and partnering with patients across age groups in research and publications.4
One message from the editorial is that LLMs hold promise for expanding access to medical expertise or, at the very least, for preparing patients to make the best use of visits with their physicians. The editorial notes that the rapid development of new generative AI applications for patient use may seem like science fiction, but points out that ChatGPT is still only 3 years old – it was launched in November 2022.5 However, the editorial’s broader themes are that chatbots are currently better suited to ‘low-stakes support’, such as explaining medical terms or preparing questions for a clinician, and that uploading sensitive medical data to LLMs remains a concern.
How can we learn to trust healthcare LLMs?
It’s important to acknowledge that the capabilities of LLMs continue to increase. Since the publication of the triage study, OpenAI has already announced multiple updates to the models used by ChatGPT (and therefore ChatGPT Health).6,7 However, access to more powerful LLMs may only be part of the solution.
Another recent study (also discussed in the JAMA editorial) assessed whether general-purpose LLMs could identify underlying conditions from medical scenarios presented to them by physicians or patients, and then provide an appropriate course of action.8 Among the findings, the authors identified various issues with human–LLM interactions, including users providing incomplete information, LLMs misinterpreting key details, and users failing to recognize relevant suggestions raised by the LLM. As users, we must understand how best to communicate with LLMs.
As discussed in a recent Amiculum insights podcast, we can help improve the accuracy of LLM responses by making reputable evidence sources more accessible to LLMs through generative engine optimization. Ensuring that our medical evidence is machine-readable means LLMs can more easily parse, extract and cite our content. However, although published scientific articles typically include metadata tagging for discoverability and search engine optimization, when we contacted 10 major publishers we found that most do not currently tag articles specifically for LLM discoverability.
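To make that idea concrete, here is a minimal sketch of what machine-readable tagging can look like. It builds a schema.org ScholarlyArticle record and serializes it as JSON-LD, one common format for structured metadata on web pages; the article details are entirely hypothetical placeholders, and this is an illustration of the general approach, not any publisher’s actual implementation.

```python
import json

# Hypothetical article metadata. The field names follow the public
# schema.org ScholarlyArticle vocabulary, which web crawlers and
# LLM data pipelines can parse; the values are placeholders.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "headline": "Example article title",
    "author": [{"@type": "Person", "name": "A. N. Author"}],
    "datePublished": "2026-01-15",
    "publisher": {"@type": "Organization", "name": "Example Publisher"},
    "identifier": {
        "@type": "PropertyValue",
        "propertyID": "DOI",
        "value": "10.0000/example-doi",
    },
    "abstract": "A short, plain-language summary that an LLM can parse and cite.",
    "isAccessibleForFree": True,
}

# Emit the metadata as a JSON-LD <script> block, which sits alongside
# the human-readable page content while remaining trivial to extract.
print(f'<script type="application/ld+json">\n{json.dumps(article, indent=2)}\n</script>')
```

JSON-LD is only one of several tagging mechanisms, but it illustrates the principle: structured, standardized metadata makes it far easier for an LLM to identify, extract and correctly attribute the underlying content.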
In summary, these studies highlight current concerns about the use of LLMs to provide medical advice. However, if we are to help patients make informed decisions about their healthcare, an obvious first step is to make our trusted evidence sources more accessible.
References
1. Ramaswamy A et al. ChatGPT Health performance in a structured test of triage recommendations. Nat Med 2026. doi: 10.1038/s41591-026-04297-7
2. Rubin R. Are AI tools ready to answer patients’ questions about their medical care? JAMA 2026. doi: 10.1001/jama.2026.1122
3. Amiculum. Is AI the answer? Translating complex medical information into public understanding. Available at: https://amiculum.biz/insights/is-ai-the-answer-translating-complex-medical-information-into-public-understanding/ (accessed March 2026)
4. Amiculum. From ‘why’ to ‘how’: patient partnership across age groups in research and publications. Available at: https://www.amiculum.biz/insights/from-why-to-how-patient-partnership-across-age-groups-in-research-and-publications/?page=1 (accessed March 2026)
5. OpenAI. Introducing ChatGPT. Available at: https://openai.com/index/chatgpt/ (accessed March 2026)
6. OpenAI. GPT‑5.3 Instant: Smoother, more useful everyday conversations. Available at: https://openai.com/index/gpt-5-3-instant/ (accessed March 2026)
7. OpenAI. Introducing GPT‑5.4. Available at: https://openai.com/index/introducing-gpt-5-4/ (accessed March 2026)
8. Bean AM et al. Reliability of LLMs as medical assistants for the general public: a randomized preregistered study. Nat Med 2026;32:609–15