How to choose and implement an LLM for your healthcare product

You have a healthcare product with users and momentum. Your CEO just asked about adding AI. Before you start shopping for an LLM, there are some critical considerations that go beyond model performance.

Working in highly regulated industries means AI implementation carries risks that other sectors don’t face. A hallucinated answer about medication dosages or leaked patient data can have serious consequences. Staying agile in regulated industries requires careful planning around security, cost, and risk management.

LLMs in healthcare: start with your use case

Different LLMs perform differently depending on how you plan to use them. A chatbot answering general questions about supplements has different requirements than a tool making clinical determinations for healthcare providers.

Understanding your specific use case shapes everything else. Are you building an end-user chat feature? Generating internal summaries for health professionals? Each scenario has different performance needs, security requirements, and cost implications. It’s important to clarify what you’ll be doing with an LLM before comparing models.

AI, HIPAA compliance and data protection

This isn’t an industry where you can turn to free LLMs. Free models train on your input, which means you’re contributing sensitive health information to their datasets. This violates patient privacy and regulatory requirements.

Paid LLMs typically promise not to train on your data, but you still need to review their terms of service carefully. Look for clear commitments around data protection and ownership. Pay attention to compliance with data regulations like HIPAA (US), CCPA (California), and GDPR (EU).

Beyond the LLM provider, consider your entire data pipeline. When users type into your chat interface, their data travels through your server, gets sent to the LLM, returns through your system, and gets displayed to the user. Each touchpoint is a potential exposure risk. Think through where you’re storing conversation logs, what gets passed through APIs, and whether you’re inadvertently creating a large repository of patient information in your database.

Protecting user data in HIPAA compliant environments extends to how you handle AI interactions. Tools like our Top Secret gem can help by scrubbing or anonymizing data before it reaches the AI, reducing the amount of sensitive information stored in your systems.
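To make the idea concrete, here's a minimal sketch of scrubbing obvious identifiers before text leaves your system. The patterns and placeholder labels are illustrative only, not exhaustive, and this is not the Top Secret gem's API:

```ruby
# Redact a few common PII shapes before sending text to an LLM.
# Patterns here are illustrative; real PHI detection needs far more coverage.
PII_PATTERNS = {
  email: /\b[\w.+-]+@[\w-]+\.[\w.-]+\b/,
  ssn:   /\b\d{3}-\d{2}-\d{4}\b/,
  phone: /\b\d{3}[-.]\d{3}[-.]\d{4}\b/
}.freeze

def scrub(text)
  PII_PATTERNS.reduce(text) do |scrubbed, (label, pattern)|
    scrubbed.gsub(pattern, "[REDACTED_#{label.to_s.upcase}]")
  end
end

puts scrub("Contact jane@example.com or 555-123-4567 about SSN 123-45-6789")
```

A real implementation would also cover names, addresses, dates of birth, and medical record numbers, and would run before anything is persisted to logs.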

The economics of tokens

LLMs charge based on tokens consumed in conversations, so a process that seems affordable for a few beta users can become expensive quickly when you scale to thousands of customers. Lengthy conversations with lots of data passed back and forth add up fast.

Unfortunately, you can’t easily model costs upfront because implementation needs to be iterative. The key is awareness and monitoring: build cost tracking into your testing process. As you near release, run empirical cost studies to understand what actual usage will look like. This helps you make informed decisions about which model to use and how to architect your implementation for cost efficiency.
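Even a back-of-the-envelope model can show how costs scale. This sketch assumes illustrative per-token prices; real rates vary by provider and model, so substitute current numbers from your vendor's pricing page:

```ruby
# Rough monthly cost estimate for an LLM-backed chat feature.
# Prices below are assumed for illustration, not any provider's actual rates.
INPUT_PRICE_PER_1K  = 0.003  # USD per 1,000 input tokens (assumed)
OUTPUT_PRICE_PER_1K = 0.015  # USD per 1,000 output tokens (assumed)

def monthly_cost(users:, chats_per_user:, input_tokens:, output_tokens:)
  per_chat = (input_tokens / 1000.0) * INPUT_PRICE_PER_1K +
             (output_tokens / 1000.0) * OUTPUT_PRICE_PER_1K
  users * chats_per_user * per_chat
end

# Same per-user usage: 50 beta users vs. 10,000 customers
puts monthly_cost(users: 50,     chats_per_user: 20,
                  input_tokens: 2_000, output_tokens: 500).round(2)
puts monthly_cost(users: 10_000, chats_per_user: 20,
                  input_tokens: 2_000, output_tokens: 500).round(2)
```

Note that input tokens grow with conversation length, since most chat APIs resend the prior turns on every request, so long conversations compound this estimate.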

Often, it’s a matter of trade-offs. You might sacrifice speed or the amount of information you provide to users in order to control costs.

Managing LLM hallucinations in healthcare

There’s no way around it: LLMs give false answers presented as truth. It’s not a matter of if, but when. In healthcare, this is particularly dangerous when discussing symptoms, medications, or treatment recommendations.

We worked with one of our clients, FrontrowMD, to build an AI-powered chatbot that helps end users learn about different supplements. Our process included implementing Retrieval-Augmented Generation (RAG): we fed the AI product and ingredient assessment data, clinician reviews, and internet searches against trusted health sources to ground and enrich its responses. To further safeguard against hallucination, we built guard rails around what the AI could discuss and implemented mechanisms to catch problematic responses.
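One guard-rail layer can be as simple as checking model output against topics the chatbot must not address. This is a minimal sketch of that single layer under assumed patterns; production systems combine several layers (prompt constraints, retrieval grounding, output classifiers):

```ruby
# Block responses that stray into clinical advice a supplement chatbot
# shouldn't give. The blocked-topic patterns are illustrative assumptions.
BLOCKED_TOPICS = [/\bdosage\b/i, /\bdiagnos/i, /\bprescri/i].freeze

def safe_response?(text)
  BLOCKED_TOPICS.none? { |pattern| text.match?(pattern) }
end

def deliver(response)
  return response if safe_response?(response)

  "I can't advise on that. Please consult a healthcare professional."
end

puts deliver("Vitamin D supports bone health.")
puts deliver("You should take a dosage of 50mg daily.")
```

Keyword checks alone are crude; in practice you'd pair them with retrieval grounding so answers are tied to the trusted sources you indexed.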

Another detection strategy? Task a second model with critiquing the first one’s answers. This can help with large volumes of content that might overwhelm human reviewers. Setting clear expectations for your users is a big help, too. This can be as simple as placing a warning on your UI letting people know they’re talking to an AI that might make mistakes.
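The two-model review pattern can be sketched like this. Here `ask_model` is a stand-in for your actual LLM client; its name, interface, and the stubbed behavior are assumptions for illustration:

```ruby
# Second-model critique: a "critic" call reviews the first model's draft
# before it reaches the user. `ask_model` is a hypothetical stub standing
# in for a real LLM API call.
def ask_model(prompt)
  # Placeholder logic so the sketch runs; replace with a real API call.
  prompt.include?("APPROVED or REJECTED") ? "APPROVED" : "Some draft answer."
end

def reviewed_answer(question)
  draft = ask_model(question)
  verdict = ask_model(<<~PROMPT)
    Review this healthcare chatbot answer for unsupported medical claims.
    Reply APPROVED or REJECTED only.
    Question: #{question}
    Answer: #{draft}
  PROMPT

  verdict.start_with?("APPROVED") ? draft : "Escalated for human review."
end

puts reviewed_answer("What is vitamin C?")
```

The critic call adds latency and token cost per response, so teams often apply it selectively, for example only to answers that touch higher-risk topics.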

Avoiding pitfalls and making a choice

The biggest mistake you can make is expecting too much too early from AI. This happens with any new technology, but it’s particularly risky when you’re implementing AI in healthcare. We recommend treating the LLM as a tool and assistant that requires human expertise and judgment to use correctly. Consider researching AI governance frameworks from organizations like the National Academy of Medicine, the AMA, and others to find one adaptable to your needs. And remember that AI won’t magically solve every problem or always give correct answers.

Ultimately, choosing an LLM for healthcare is about balancing security, economics, performance, and risk in a highly regulated environment.

Need help navigating AI implementation in healthcare? Our team has experience building secure, compliant applications for healthcare clients. Get in touch to discuss your project.


About thoughtbot

We've been helping engineering teams deliver exceptional products for over 20 years. Our designers, developers, and product managers work closely with teams to solve your toughest software challenges through collaborative design and development. Learn more about us.