Extracting Structured Data from Medical Referral Letters

May 13, 2026


How ADE extracts diagnoses, physician names, procedures, and clinical context from highly variable referral letter formats without specialty-specific templates.

Referral letters are among the least structured documents in clinical workflows; every physician writes them differently, and a referral from a GP to a cardiologist may be a formal one-page letter, a handwritten note, a dictated narrative, or a discharge summary excerpt. ADE handles all of these through a single pipeline without per-physician or per-specialty configuration.

Eolas Medical processes over 100,000 clinical guidelines with ADE to support 1.2 million monthly queries from healthcare professionals. Outside healthcare, the same architecture reduced manual document review time at a Tier-1 bank by 40-60% across multilingual client packages of 200-300 pages (bank case study).

What Makes Referral Letters Hard to Process

Unlike claim forms or intake questionnaires, referral letters have no standardised layout; the same clinical information appears in different positions, under different headings, and in different prose styles depending on the referring physician, specialty, institution, and jurisdiction. Template-based extraction fails because there is no consistent template to build from.

OCR-plus-LLM stacks flatten the letter to text and pass it to a language model. For short, well-formatted letters this works adequately.

For letters with embedded tables (current medications, lab results), handwritten annotations, institutional letterheads, or mixed typography, flattening loses the structural context that makes the clinical information interpretable.

What ADE Extracts from Referral Letters

The extraction schema for a referral letter defines the clinical fields the receiving team requires:

Parties. Referring physician name and contact, receiving physician or facility, patient name, date of birth, and patient identifier.
Clinical summary. Primary diagnosis or reason for referral, relevant history, current medications, and known allergies.
Requested action. Procedure or consultation type requested, urgency level, and preferred timeframe.
Supporting findings. Recent lab values, imaging results, and relevant test findings referenced in the letter.

Field descriptions in the schema guide the extraction model to locate these fields regardless of the prose style or document structure of the referring physician. A well-described reason_for_referral field extracts the correct content whether it appears under a "Referral Reason" heading, in an opening paragraph, or embedded in a narrative summary.
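As a rough illustration, a subset of the fields above could be written as a JSON Schema with a description on every field. This is a minimal sketch. It assumes ADE accepts a JSON Schema-style extraction schema. The field names (reason_for_referral, requested_consultation, and so on), the flat layout, and the helper function are hypothetical, so check the extraction schema documentation for the supported format.

```python
from typing import Any

# Hypothetical referral-letter schema in JSON Schema style.
# Field names and the flat layout are illustrative assumptions, not ADE's required shape.
REFERRAL_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "referring_physician": {
            "type": "string",
            "description": "Full name of the physician who wrote and signed the referral.",
        },
        "receiving_physician_or_facility": {
            "type": "string",
            "description": "Physician, clinic, or department the patient is being referred to.",
        },
        "patient_name": {
            "type": "string",
            "description": "Full name of the patient being referred.",
        },
        "patient_date_of_birth": {
            "type": "string",
            "description": "Patient date of birth exactly as written in the letter.",
        },
        "reason_for_referral": {
            "type": "string",
            "description": (
                "The clinical reason the patient is being referred. It may appear under a "
                "'Reason for referral' heading, in the opening paragraph, or inside a narrative summary."
            ),
        },
        "requested_consultation": {
            "type": "string",
            "description": "Procedure or consultation type the referring physician is asking for.",
        },
        "urgency": {
            "type": "string",
            "description": "Stated urgency of the referral, for example routine or urgent.",
        },
    },
}


def fields_missing_descriptions(schema: dict[str, Any]) -> list[str]:
    """Return top-level field names that have no description to guide extraction."""
    return [
        name
        for name, spec in schema["properties"].items()
        if not spec.get("description")
    ]


print(fields_missing_descriptions(REFERRAL_SCHEMA))  # []
```

A lint check like `fields_missing_descriptions` is useful because the description, rather than the field name, is what carries the hint about where the content usually appears.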

Handling Handwritten and Dictated Referrals

A significant portion of referral letters in active clinical use are handwritten or generated from physician dictation transcripts. ADE's Document Pre-Trained Transformer architecture treats documents as visual systems, parsing handwritten text using the same visual reasoning as printed text.

Scan quality and handwriting legibility affect extraction certainty, which is reflected in confidence scores that route uncertain extractions to administrative review.
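The routing step can be sketched as a threshold check over per-field confidence scores. Both the result shape and the 0.8 cut-off are assumptions made for illustration. The sketch assumes a dict mapping each field name to a value and a confidence. ADE's actual response format is defined in its API documentation, and a suitable threshold should come from validating against your own reviewed letters. The sample values are made up.

```python
from typing import Any, Optional

# Hypothetical cut-off; tune it against letters your team has already reviewed.
REVIEW_THRESHOLD = 0.8


def route_fields(
    fields: dict[str, dict[str, Any]], threshold: float = REVIEW_THRESHOLD
) -> tuple[dict[str, Any], list[str]]:
    """Split extracted fields into auto-accepted values and names that need review.

    Assumes each field looks like {"value": ..., "confidence": float or None}.
    A missing or None confidence is treated as unscored and sent to review.
    """
    accepted: dict[str, Any] = {}
    needs_review: list[str] = []
    for name, field in fields.items():
        confidence: Optional[float] = field.get("confidence")
        if confidence is not None and confidence >= threshold:
            accepted[name] = field.get("value")
        else:
            needs_review.append(name)
    return accepted, needs_review


# Made-up extraction output from a partly illegible handwritten note.
extracted = {
    "patient_name": {"value": "J. Smith", "confidence": 0.97},
    "reason_for_referral": {"value": "Exertional chest pain", "confidence": 0.91},
    "current_medications": {"value": "bisoprolol 2.5mg od?", "confidence": 0.42},
}
accepted, needs_review = route_fields(extracted)
print(needs_review)  # ['current_medications']
```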

Traceability for Clinical Workflows

Every field ADE extracts includes chunk_references linking it to the parsed chunk that sourced the value, with page number and bounding-box coordinates. A clinician reviewing an extracted diagnosis can navigate to the exact paragraph in the referral letter where it appeared.

This grounding supports clinical audit requirements and reduces the risk of acting on misattributed clinical data.
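To show where a value came from, a review tool follows a field's chunk references back to the parsed chunks. The response layout in this sketch is a hypothetical simplification for illustration, not ADE's documented schema. It assumes each field carries a chunk_references list of chunk IDs, and each chunk carries a page number and a normalized bounding box.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SourceLocation:
    chunk_id: str
    page: int
    box: tuple[float, float, float, float]  # (left, top, right, bottom), assumed normalized 0-1


def locate_field(result: dict[str, Any], field_name: str) -> list[SourceLocation]:
    """Resolve a field's chunk_references to page numbers and bounding boxes.

    Assumes result["fields"][name]["chunk_references"] holds chunk IDs and
    result["chunks"] is a list of {"id", "page", "box"} dicts.
    """
    chunks_by_id = {chunk["id"]: chunk for chunk in result["chunks"]}
    locations: list[SourceLocation] = []
    for chunk_id in result["fields"][field_name].get("chunk_references", []):
        chunk = chunks_by_id.get(chunk_id)
        if chunk is None:
            continue  # dangling reference: nothing to highlight for the reviewer
        locations.append(SourceLocation(chunk_id, chunk["page"], tuple(chunk["box"])))
    return locations


result = {
    "fields": {
        "primary_diagnosis": {"value": "Atrial fibrillation", "chunk_references": ["c-7"]}
    },
    "chunks": [
        {"id": "c-3", "page": 1, "box": [0.08, 0.10, 0.92, 0.18]},
        {"id": "c-7", "page": 1, "box": [0.08, 0.41, 0.92, 0.55]},
    ],
}
for loc in locate_field(result, "primary_diagnosis"):
    print(loc.page, loc.box)  # 1 (0.08, 0.41, 0.92, 0.55)
```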

HIPAA and Data Security

Referral letters contain protected health information (PHI) and are subject to HIPAA. Zero Data Retention ensures letter content is processed in memory and not stored on LandingAI infrastructure. Business Associate Agreement (BAA) availability and SOC 2 Type II certification are documented on the LandingAI Trust Center.

VPC deployment is available for health systems requiring that clinical correspondence not transit third-party infrastructure.

FAQ

How does ADE extract structured fields from a referral letter with no headings or structure? The extraction schema operates on the full parsed Markdown of the letter. Field descriptions in the schema guide the extraction model to the right content in unstructured prose. For example, consider a reason_for_referral field whose description says it holds the clinical reason the patient is being referred. That field extracts the correct text even when the reason appears in a flowing narrative with no section label.

See the extraction schema documentation for guidance on writing field descriptions.

Can ADE extract medications listed as a table in the referral letter? Yes. Tables in the referral letter are preserved as structured table chunks in the parsed output, with each row and cell grounded to its page location.

A current_medications array field in the schema extracts medication names and dosages from tabular or list formats. Confidence scores for table-extracted fields currently return null because table confidence scoring is still an experimental feature in active development.
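A current_medications field could be declared as an array of objects, one per row of the medication table or list. This is a hedged sketch. The item properties (name, dose, frequency), the nullable-type syntax, and the shape of the extracted value are all assumptions. Confidence scores for table-extracted fields currently come back null, so the helper works on values only and does not threshold confidences.

```python
from typing import Any

# Hypothetical array field: one object per medication row in the letter.
CURRENT_MEDICATIONS_FIELD: dict[str, Any] = {
    "type": "array",
    "description": (
        "Every medication the patient is currently taking, whether listed in a table, "
        "a bulleted list, or mentioned in the narrative."
    ),
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Drug name as written, brand or generic."},
            "dose": {"type": ["string", "null"], "description": "Dose with units, e.g. '2.5 mg'."},
            "frequency": {"type": ["string", "null"], "description": "How often it is taken."},
        },
        "required": ["name"],
    },
}


def medication_lines(items: list[dict[str, Any]]) -> list[str]:
    """Turn extracted medication objects into display lines, skipping rows with no name."""
    lines: list[str] = []
    for item in items:
        name = (item.get("name") or "").strip()
        if not name:
            continue  # an empty table row carries nothing a clinician can act on
        extras = [item[key] for key in ("dose", "frequency") if item.get(key)]
        lines.append(" ".join([name, *extras]))
    return lines


# Made-up rows as they might come back from a medication table.
rows = [
    {"name": "Bisoprolol", "dose": "2.5 mg", "frequency": "once daily"},
    {"name": "Atorvastatin", "dose": "40 mg", "frequency": None},
    {"name": "", "dose": None, "frequency": None},
]
print(medication_lines(rows))  # ['Bisoprolol 2.5 mg once daily', 'Atorvastatin 40 mg']
```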

Does ADE understand clinical terminology and abbreviations? ADE's extraction layer relies on the schema's field descriptions to understand what to extract. Good field descriptions specify clinical context, for example "the ICD code or plain-language diagnosis stated as the primary reason for referral". Descriptions like this produce accurate extractions across the range of abbreviations and terminology styles different physicians use.

How does ADE handle a referral letter that was photographed rather than scanned? ADE accepts image formats through the same Parse API as PDFs. Photo quality and angle affect extraction certainty; low-certainty fields surface in confidence scores that route them to manual review.

Heavily distorted or out-of-focus photographs produce lower-quality parsed output regardless of the extraction platform used.

Is there a minimum structure a referral letter needs to enable extraction? No minimum structure is required. ADE parses unstructured prose letters the same way as structured ones.

Very short, informal referrals of only a few sentences may return null for fields the letter does not address. With explicit null handling in the schema, these nulls are represented as absent rather than unknown.
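One way to make "not addressed" explicit is to declare optional fields as nullable and then report which of them came back null. The nullable-type convention, the field names, and the flat result dict are assumptions for this sketch. A null here means the letter is silent on the field, which is different from a low-confidence value that needs review.

```python
from typing import Any

# Hypothetical optional fields: null means "the letter does not say".
OPTIONAL_FIELDS: dict[str, dict[str, Any]] = {
    "known_allergies": {
        "type": ["array", "null"],
        "items": {"type": "string"},
        "description": "Allergies stated in the letter; null if allergies are not mentioned.",
    },
    "preferred_timeframe": {
        "type": ["string", "null"],
        "description": "Timeframe requested for the appointment; null if none is stated.",
    },
    "recent_lab_values": {
        "type": ["array", "null"],
        "items": {"type": "string"},
        "description": "Lab results quoted in the letter; null if none are given.",
    },
}


def absent_fields(values: dict[str, Any], schema: dict[str, dict[str, Any]]) -> list[str]:
    """List optional fields the letter did not address (value is None or missing)."""
    return [name for name in schema if values.get(name) is None]


# Made-up extraction from a three-sentence referral.
short_note = {"known_allergies": ["penicillin"], "preferred_timeframe": None}
print(absent_fields(short_note, OPTIONAL_FIELDS))  # ['preferred_timeframe', 'recent_lab_values']
```

The helper keeps None distinct from an empty list. That lets a letter stating "no known drug allergies" (an empty list) be recorded differently from a letter that never mentions allergies (null).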