Building HIPAA-Compliant AI Features: What the Tutorials Skip
Integrating AI into a healthcare product isn't just about plugging in an API. HIPAA has specific requirements around PHI, audit logging, and vendor agreements that most AI tutorials completely ignore.
When I joined Allia Health to build their AI-assisted clinical platform, I had built plenty of LLM features before. What I hadn't built was an LLM feature that needed to handle Protected Health Information (PHI) under HIPAA compliance.
The difference is significant. This isn't a post about security theatre — it's about the specific technical and operational requirements that HIPAA actually imposes, and how I navigated them while still shipping useful AI features.
This is engineering experience, not legal advice. Your compliance posture should be reviewed by a qualified healthcare compliance attorney and a HIPAA compliance officer before you put PHI anywhere near an AI system.
The First Question: Can You Send PHI to This API at All?
Before writing a single line of code, you need a Business Associate Agreement (BAA) with every vendor that will process PHI. A BAA is a legal contract in which the vendor agrees to handle PHI according to HIPAA requirements and accepts liability for breaches on their end.
For AI APIs specifically:
- OpenAI: BAA available under their Enterprise tier
- Anthropic: BAA available — contact their enterprise team
- AWS (Bedrock, Comprehend Medical): BAA covered under the standard AWS BAA
- Google Cloud (Vertex AI): BAA available under their healthcare offering
If a vendor doesn't offer a BAA, you cannot send them PHI. Full stop. This rules out a lot of consumer-tier AI products and many third-party AI wrappers.
At Allia Health we used Anthropic with a BAA in place, deployed behind AWS infrastructure which was also covered. Every vendor in the pipeline had signed agreements before we sent a single session transcript.
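One way to make the BAA requirement hard to violate in code is a vendor allowlist that every outbound PHI call must pass through. A minimal sketch, with the vendor names, `BAA_COVERED_VENDORS`, and `BAACoverageError` all illustrative rather than from any real codebase:

```python
# Vendors with a signed BAA on file; anything else is blocked by default.
# (Illustrative names; in practice, keep this list in config, not code.)
BAA_COVERED_VENDORS = {"anthropic", "aws-bedrock", "aws-comprehend-medical"}


class BAACoverageError(Exception):
    """Raised when PHI is about to be sent to a vendor without a signed BAA."""


def require_baa(vendor: str) -> None:
    # Fail closed: refuse to send PHI rather than assume coverage
    if vendor.lower() not in BAA_COVERED_VENDORS:
        raise BAACoverageError(f"No BAA on file for vendor: {vendor}")
```

Call `require_baa()` at the single choke point where PHI leaves your service, so a new integration can't quietly bypass the check.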
De-identify When You Can, Send the Minimum Necessary When You Must
Even with a BAA, you should apply the minimum necessary principle: only send the PHI required to accomplish the specific task.
For clinical note generation from session transcripts, the transcript necessarily contains PHI — you can't strip it and still generate a useful note. But for tasks like "classify this session type" or "suggest billing codes", you often can de-identify first:
```python
import re


# Basic de-identification for low-stakes classification tasks.
# For production, use a proper NER model or AWS Comprehend Medical.
def deidentify_for_classification(text: str) -> str:
    # Remove obvious identifiers for classification-only tasks.
    # Note: this is NOT sufficient for legal de-identification
    # under HIPAA Safe Harbor.
    patterns = [
        (r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', '[PHONE]'),  # Phone numbers
        (r'\b\d{1,2}/\d{1,2}/\d{2,4}\b', '[DATE]'),     # Dates
        (r'\b[A-Z][a-z]+ [A-Z][a-z]+\b', '[NAME]'),     # Simple name pattern
        (r'\b\d{3}-\d{2}-\d{4}\b', '[SSN]'),            # SSNs
    ]
    result = text
    for pattern, replacement in patterns:
        result = re.sub(pattern, replacement, result)
    return result
```

For rigorous de-identification that meets HIPAA Safe Harbor requirements (18 specific identifier types), use AWS Comprehend Medical or a dedicated PHI detection service. Regex is not enough.
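If you take the Comprehend Medical route, its `detect_phi` API returns entity offsets you can redact against. A hedged sketch: the `redact_entities` helper is mine, and the response shape is simplified to the `BeginOffset`/`EndOffset`/`Type` fields the API returns alongside others:

```python
def redact_entities(text: str, entities: list[dict]) -> str:
    """Replace each detected PHI span with its entity type, e.g. [NAME]."""
    # Redact from the end of the string so earlier offsets stay valid
    for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text


def deidentify_with_comprehend(text: str) -> str:
    import boto3  # local import so the pure helper above runs without AWS

    client = boto3.client("comprehendmedical")
    response = client.detect_phi(Text=text)
    return redact_entities(text, response["Entities"])
```

The helper is deliberately separate from the AWS call so you can unit-test the redaction logic against canned entity lists.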
Audit Logging is Not Optional
HIPAA requires an audit trail of who accessed PHI, when, and what they did with it. For AI features this means logging every LLM call that involved PHI.
The minimum you need to log:
```python
import structlog
from datetime import UTC, datetime

logger = structlog.get_logger()


# ClinicalNote and _call_llm are application-specific, defined elsewhere
async def generate_clinical_note(
    transcript: str,
    patient_id: str,
    clinician_id: str,
    session_id: str,
) -> ClinicalNote:
    # Log the access before making the call
    logger.info(
        "phi_access",
        event_type="llm_call",
        purpose="clinical_note_generation",
        patient_id=patient_id,      # Whose PHI
        clinician_id=clinician_id,  # Who accessed it
        session_id=session_id,
        timestamp=datetime.now(UTC).isoformat(),
        phi_categories=["session_transcript"],
        model="claude-3-5-sonnet-20241022",
        # Do NOT log the actual transcript content here
    )
    try:
        note = await _call_llm(transcript)
        logger.info(
            "phi_access_complete",
            event_type="llm_call_success",
            session_id=session_id,
            output_tokens=note.usage.output_tokens,
        )
        return note
    except Exception as e:
        logger.error(
            "phi_access_error",
            event_type="llm_call_failure",
            session_id=session_id,
            error_type=type(e).__name__,
        )
        raise
```

Critically: do not log PHI content itself. Log metadata about the access. Your audit logs need to be queryable (for breach investigations) but shouldn't themselves become a PHI liability.
Ship these logs to an immutable store — CloudWatch with log retention policies, or a dedicated SIEM. HIPAA requires you to retain audit logs for 6 years.
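With CloudWatch, the six-year retention can be enforced on the audit log group itself rather than remembered by policy. A sketch using boto3: the log group name is illustrative, and 2192 days is the CloudWatch retention setting closest to six years:

```python
# CloudWatch's closest supported retention setting to HIPAA's 6-year horizon
AUDIT_RETENTION_DAYS = 2192  # >= 6 * 365


def enforce_audit_retention(log_group: str = "/phi-audit/llm-calls") -> None:
    import boto3  # local import: nothing else in this module needs AWS

    boto3.client("logs").put_retention_policy(
        logGroupName=log_group,
        retentionInDays=AUDIT_RETENTION_DAYS,
    )
```

Run this as part of infrastructure setup so the retention policy is versioned alongside the code that writes the logs.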
Don't Store LLM Inputs and Outputs by Default
AI APIs don't store your inputs by default when you have a BAA, but your own application might. Review everywhere you log request/response data.
A pattern I use: a context manager that temporarily suppresses sensitive logging:
```python
from contextlib import contextmanager
from contextvars import ContextVar

# Checked by logging middleware; None means no PHI in the current context
phi_context_var: ContextVar[dict | None] = ContextVar("phi_context", default=None)


@contextmanager
def phi_context(session_id: str):
    """
    Context manager that marks the current execution context as handling PHI.
    Middleware and logging handlers check this to suppress content logging.
    """
    token = phi_context_var.set({"active": True, "session_id": session_id})
    try:
        yield
    finally:
        phi_context_var.reset(token)
```
```python
# Usage
async def handle_session(session: TherapySession):
    with phi_context(session.id):
        # Any logging inside here will be filtered by the middleware
        transcript = await transcribe(session.audio)
        note = await generate_clinical_note(
            transcript, session.patient_id, session.clinician_id, session.id
        )
```

Encryption: In Transit and At Rest
For PHI handled by AI features specifically:
- In transit: TLS 1.2 minimum, TLS 1.3 where supported. All LLM API calls go over HTTPS — verify your HTTP client isn't doing anything weird with SSL verification.
- At rest: Any intermediate storage of PHI (transcripts, generated notes before they're saved to the EHR) needs encryption. If you're using Redis as a cache, use Redis with encryption at rest (AWS ElastiCache supports this). Don't cache PHI in plaintext.
- In memory: You can't encrypt RAM, but minimise how long PHI lives in memory. Don't hold transcript strings in long-lived objects or module-level caches.
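For the in-transit point, the minimum TLS version can be pinned in the client rather than trusted to defaults. A sketch using the standard library's ssl module; passing the context to your HTTP client (e.g. httpx) is the assumed wiring:

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Default-verified context that refuses anything below TLS 1.2."""
    ctx = ssl.create_default_context()  # cert + hostname verification stay on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


# e.g. httpx.AsyncClient(verify=strict_tls_context())
```

Building on `create_default_context()` matters: constructing a bare `SSLContext` yourself silently drops certificate verification.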
The Operational Stuff That's Easy to Forget
Beyond the technical controls, HIPAA compliance involves operational requirements that engineering often doesn't own but absolutely needs to support:
Right of access: Patients have the right to access their PHI. If your AI feature generates notes or summaries, you need a way to export those per-patient.
Right to deletion: Under certain conditions, patients can request deletion of their PHI. Your data model needs to support this — make sure AI-generated content is linked to patient records and can be purged.
Breach notification: If PHI is exposed (including via an AI API breach), you have 60 days to notify affected individuals. You need to know, at query time, which patients' PHI was sent to which vendors. The audit logs make this possible.
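Given the audit schema above, answering "whose PHI went to this model" can be a single CloudWatch Logs Insights query. A sketch that just builds the query string; the field names match the earlier logging example, and the `filter`/`stats` syntax is standard Logs Insights:

```python
def breach_scope_query(model: str) -> str:
    """Logs Insights query: distinct patients whose PHI was sent to `model`."""
    return (
        f'filter event_type = "llm_call" and model = "{model}"'
        " | stats count(*) by patient_id"
    )
```

Run it against the audit log group over the exposure window to get the notification list directly from the logs.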
Building AI features in healthcare is genuinely interesting work. The constraints aren't obstacles — they force you to think carefully about data flows, minimal disclosure, and what you're actually storing and why. That discipline makes for better software even outside the compliance context.