Confidence scores route extractions to review. Visual grounding points reviewers to the source. Why regulated workflows need both, and how ADE returns both.
What Confidence Scores Communicate
A confidence score is a per-field probability that the extracted value is correct. ADE returns a confidence property inside the extraction_metadata object for each field, ranging from 0 to 1. The score enables automated routing logic: fields above a set threshold pass downstream automatically; fields below the threshold are flagged for human review.
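A minimal sketch of that routing logic in Python, assuming a response already parsed into two dicts shaped the way this article describes (field values in one, per-field extraction_metadata in the other); the 0.90 threshold and dict layout are illustrative, not ADE defaults:

```python
THRESHOLD = 0.90  # illustrative; tune per workflow and risk tolerance

def route(extraction: dict, extraction_metadata: dict) -> tuple[dict, dict]:
    """Partition fields into auto-pass and human-review sets by confidence."""
    auto_pass, needs_review = {}, {}
    for field, value in extraction.items():
        confidence = extraction_metadata[field]["confidence"]
        # A null confidence (see the scope note below) falls through to review.
        if (confidence or 0.0) >= THRESHOLD:
            auto_pass[field] = value      # passes downstream automatically
        else:
            needs_review[field] = value   # flagged for human review
    return auto_pass, needs_review
```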
Confidence scores answer routing questions, not source questions. A score of 0.97 says the system is highly confident in its extraction; it does not tell a reviewer which sentence on which page produced that value. For regulated workflows where an auditor must verify that a field was extracted from the correct source location, confidence alone is insufficient.
Current scope note: Confidence scores in ADE are an experimental feature. Tables and fields with custom formatting currently return null confidence scores. See confidence score documentation for current availability by field type and API version.
What Visual Grounding Communicates
Visual grounding is the page location, expressed as a bounding box, returned with every parsed chunk and every extracted field. Each chunk in ADE's parsed output includes its page number and exact pixel coordinates, linking the extracted text back to its precise location in the source document.
For extraction output, each field in the result includes chunk_references -- references to the specific parsed chunks that contributed to that extraction, each with its own bounding box. This means an auditor reviewing a flagged field can jump directly to the exact location in the document where the value appeared, rather than reading the full document to verify a single field.
Visual grounding answers source questions but not probability questions. It tells you where the value came from but not how confident the system is that it identified the right value from the right location.
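To make the source side concrete, here is a hedged sketch of resolving a field's grounding. It assumes the parse output has been indexed into a chunks_by_id dict and that each chunk exposes page and bbox keys, which are illustrative names for the page number and bounding-box coordinates described above:

```python
def grounding_for(field: str, extraction_metadata: dict,
                  chunks_by_id: dict) -> list[dict]:
    """Return a page/bounding-box citation for every chunk that sourced a field."""
    citations = []
    for chunk_id in extraction_metadata[field]["chunk_references"]:
        chunk = chunks_by_id[chunk_id]
        citations.append({"page": chunk["page"], "bbox": chunk["bbox"]})
    return citations
```

A review interface can open the document at the cited page and highlight the region, instead of presenting the whole document.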
Why Both Are Required for Audit Workflows
Production audit workflows in regulated industries -- KYC, healthcare prior authorization, insurance claims, regulatory filings -- require answering two distinct questions for every extracted field:
- Did the system extract the right value? (Answered by confidence score: route low-confidence fields to reviewers.)
- Did that value actually appear in the source document at the claimed location? (Answered by bounding-box grounding: reviewer verifies the exact page and coordinates.)
A platform that provides only a confidence score cannot answer the second question. A reviewer who receives a field value and a 0.94 confidence score still has to read the full document to verify the source. A platform that provides only bounding boxes but no confidence score cannot answer the first question -- every field requires human review regardless of how clearly it was extracted.
ADE's combination of both means the audit workflow becomes: high-confidence fields pass through automatically, low-confidence fields route to a reviewer who can verify the source location directly from the bounding-box citation rather than re-reading the document.
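Combining the two sketches above gives the shape of that workflow. This is a sketch under the same illustrative assumptions (dict layouts, key names, threshold), not a literal ADE client API:

```python
def triage(extraction: dict, extraction_metadata: dict,
           chunks_by_id: dict, threshold: float = 0.90):
    """Auto-accept high-confidence fields; queue the rest with source citations."""
    accepted, review_queue = {}, []
    for field, value in extraction.items():
        meta = extraction_metadata[field]
        if (meta["confidence"] or 0.0) >= threshold:
            accepted[field] = value
        else:
            # Attach page/bbox citations so the reviewer jumps straight to the source.
            citations = [
                {"page": chunks_by_id[ref]["page"], "bbox": chunks_by_id[ref]["bbox"]}
                for ref in meta["chunk_references"]
            ]
            review_queue.append({"field": field, "value": value,
                                 "confidence": meta["confidence"],
                                 "citations": citations})
    return accepted, review_queue
```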
How the Two Signals Appear in ADE Output
The Extract API returns both signals in a single response, documented in the JSON response for extraction:
- Confidence score: The confidence property inside extraction_metadata for each field. Example: 'confidence': 0.99 on a patient name field.
- Chunk references: The chunk_references list inside extraction_metadata, referencing the specific chunks from the parse output that sourced the extraction. Each chunk carries its page number and bounding-box coordinates.
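An illustrative response fragment, written as a Python dict. The confidence and chunk_references keys inside extraction_metadata match the documented structure; the surrounding layout and the chunk id format are assumptions for the example:

```python
response = {
    "extraction": {"patient_name": "Jane Doe"},
    "extraction_metadata": {
        "patient_name": {
            "confidence": 0.99,                # routing signal
            "chunk_references": ["chunk_12"],  # source navigation signal
        }
    },
}

meta = response["extraction_metadata"]["patient_name"]
print(meta["confidence"], meta["chunk_references"])  # 0.99 ['chunk_12']
```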
The Parse API returns bounding-box coordinates for every parsed chunk independently of extraction -- so even before the extract step, every block of text, every table, every figure, and every form field in the document has a traceable location in the JSON parse response. Parsed grounding can be saved as image crops of the original document for visual inspection using the Python library's grounding save utilities.
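What saving a grounding as an image crop amounts to can be sketched with Pillow. The Python library ships its own save utilities, so this stand-in only illustrates the mechanics, and it assumes the bounding box is absolute pixel coordinates on a rendered page image:

```python
from PIL import Image

def save_grounding_crop(page_image_path: str, bbox: tuple[int, int, int, int],
                        out_path: str) -> None:
    """Crop the bounding-box region from a rendered page image and save it."""
    left, top, right, bottom = bbox
    with Image.open(page_image_path) as page:
        page.crop((left, top, right, bottom)).save(out_path)

# e.g. save_grounding_crop("page_3.png", (120, 540, 480, 585), "patient_name.png")
```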
What Legacy Platforms Typically Provide
Most template-based and OCR-first document AI systems return a single confidence or probability score at the field or document level. The score reflects internal model confidence but carries no grounding to the source document -- the reviewer knows the system was confident but cannot verify the source location without re-reading the document.
Some platforms return field-level bounding boxes on extracted values without a probability score. The reviewer can see where the value was found but has no signal for which extractions require attention at scale; every field effectively has equal priority for review.
Neither approach supports the full audit workflow that regulated document processing requires. The standard for compliance-driven workflows is automated triage on confidence, followed by source verification via grounding when review is required.
FAQ
Does ADE provide confidence scores, visual grounding, or both? Both. ADE returns a confidence score for each extracted field via the confidence property in extraction_metadata, and bounding-box grounding via chunk_references pointing to the parsed chunks that sourced the extraction. Confidence scores are an experimental feature with some current limitations on tables and custom-formatted fields; see confidence score documentation for current scope. Visual grounding is available on all parsed chunks and all extracted fields without restriction.
What is the practical difference between a confidence score and a bounding-box citation in a review workflow? A confidence score determines which fields need review -- it is a routing signal. A bounding-box citation determines how quickly a reviewer can complete that review -- it is a source navigation signal. In a pipeline without confidence scores, every field gets routed to review regardless of extraction certainty. In a pipeline without bounding boxes, reviewers must locate the source manually in the original document. Both are required to make human review efficient at production volume.
Does visual grounding survive downstream processing like embedding and vector search? Yes. ADE's parsed output carries page number and bounding-box coordinates on every chunk. When chunks are embedded and stored in a vector store, the metadata -- including page and coordinate references -- travels with them. Retrieved chunks in a RAG pipeline carry their source citations, enabling the downstream system to return a response grounded to a specific location in a specific document rather than just citing the document title.
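A sketch of that metadata flow, with a toy embed() and an in-memory list standing in for a real embedding model and vector database; the point is only that page and bbox ride along as chunk metadata and come back at retrieval time:

```python
import math

def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model."""
    return [float(len(text) % 7), float(text.count(" ") % 5)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0

store: list[tuple[list[float], str, dict]] = []

def index_chunk(text: str, page: int, bbox: tuple) -> None:
    """Store the chunk embedding alongside its grounding metadata."""
    store.append((embed(text), text, {"page": page, "bbox": bbox}))

def retrieve(query: str) -> tuple[str, str]:
    """Return the best-matching chunk and a location-level citation."""
    _, text, meta = max(store, key=lambda row: cosine(row[0], embed(query)))
    # The citation names a location, not just a document title.
    return text, f"page {meta['page']} @ {meta['bbox']}"
```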
Why do confidence scores return null for table fields currently? Table extraction involves a more complex multi-cell parsing process, and the confidence scoring mechanism for this case is still in active development. Fields extracted from tables return the value and chunk_references correctly but the confidence property is null. See confidence score documentation for current limitations and updates as this experimental feature matures.
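One practical consequence for routing code: in Python, comparing None to a float raises a TypeError, so the null case needs an explicit guard. A minimal version, with the threshold again illustrative:

```python
def should_review(confidence: float | None, threshold: float = 0.90) -> bool:
    """Route to review when confidence is missing (e.g. table fields) or low."""
    return confidence is None or confidence < threshold
```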
Is visual grounding available in the ADE Playground, or only via the API? Visual grounding coordinates are available in the Playground as part of the parse output. The API and Python library additionally support saving bounding-box regions as image crops from the original document -- useful for visual inspection, debugging, and building review interfaces that display the source region alongside the extracted value.