ADE Extract API: Schema-Driven Extraction for Long, Complex Documents

LandingAI Team

TL;DR

Production document extraction systems fail predictably. They fail when the files get long, the schema gets large, tables span pages, field labels shift across suppliers, and nobody can verify where a value came from.

The extraction APIs in Agentic Document Extraction (ADE) are built for that production reality. Each API works at a different phase in the pipeline:

  • Build time: Use the Build Extract Schema API to generate or refine a master schema from representative documents.
  • Inference time: Use the Extract API to apply the schema to a document. The API returns structured data with chunk reference IDs linking every extracted value back to the parsed document.

There's no need to manually split documents or maintain brittle templates. You also get a clear audit trail when something needs verification.

The Problem: Why Extraction Breaks in Production

A procurement team processes invoices from 30 suppliers. Each one has different formatting — different field names, different table layouts, different date formats. One labels a column "Payment Terms," another calls it "Net 30," another buries it in a footnote. The extraction returns data, but the pipeline still breaks because every supplier is a special case.

The extraction APIs in ADE are designed for that reality.

Agentic Extraction Capabilities

The ADE Extract APIs are built for the complex documents that would otherwise break extraction in production. They preserve the full structure, handle vendor variation, and process at scale without the code stitching and manual workarounds that most pipelines require.

  • Infinite Schema (10+ Levels Deep)
    Most extraction APIs flatten or truncate past 2-3 levels of nesting. The ADE Extract API returns a full document hierarchy as nested JSON — invoices, contracts, financials with all sub-structures intact. No data loss from deep structures.

  • Master Schema
    Most schemas built from one sample break on the next vendor's format. The ADE Build Extract Schema API ingests multiple documents and produces one unified schema that handles field and layout variation from day one. One schema, all document variation.

  • Cross-Page Table Reconstruction
    In other pipelines, tables spanning page breaks return as disconnected fragments. The ADE Extract API reconstructs multi-page tables as one array — headers, rows, and columns intact. Zero post-processing stitching.

  • Long Document Support
    Most tools degrade past 50-100 pages, forcing chunking and overlap reconciliation. One API call processes the full document, with no splitting and no stitching. Full document, single call.

  • Schema Drift Detection
    Vendor format changes silently break pipelines, and you find out in production. Run the ADE Build Extract Schema API against updated documents, together with your existing schema, to surface new or changed fields before they reach your pipeline. Version forward, not debug backward. Catch breaks before production.

  • Semantic Field Matching
    Exact string matching fails when vendors label the same field differently. The ADE Extract API matches on meaning: "Amount Due," "Grand Total," and "Balance Owed" all resolve to a single, consistent field in your schema, no matter which label a vendor uses.
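
To make the nesting and table claims concrete, here is a minimal sketch of what a nested invoice schema could look like, written as a JSON-Schema-style Python dict. Every field name is hypothetical, and the `nesting_depth` helper is only an illustration for inspecting your own schemas, not part of ADE. Note that `line_items` is declared once as a single array, which is the shape a table spanning several pages would be returned in.

```python
# Hypothetical invoice schema; field names are illustrative, not taken from ADE.
invoice_schema = {
    "type": "object",
    "properties": {
        "supplier": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"},
                        "country": {"type": "string"},
                    },
                },
            },
        },
        "amount_due": {"type": "number"},
        # One array for the whole table, however many pages it spans.
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "item": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                },
            },
        },
    },
}


def nesting_depth(node: dict) -> int:
    """Count nested object/array levels; a scalar field counts as depth 0."""
    if node.get("type") == "object":
        children = node.get("properties", {}).values()
        return 1 + max((nesting_depth(child) for child in children), default=0)
    if node.get("type") == "array":
        return 1 + nesting_depth(node.get("items", {}))
    return 0


depth = nesting_depth(invoice_schema)
```

For this sample, `depth` is 3: the root object, then `supplier` or `line_items`, then `address` or the row object.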

One Workflow, Two Phases

ADE handles schema building and extraction as two separate steps, each with its own API. You define the schema once, then extract against it repeatedly.

Here are the APIs for each phase of an extraction workflow, from building a schema on representative documents through to extraction:

Phase | Endpoint | What you send | What you get back
Build time | POST /v1/ade/extract/build-schema | Parsed documents, a prompt, an existing schema, or any combination | An extraction schema and metadata
Inference time | POST /v1/ade/extract | One parsed Markdown document and a schema | The extracted key-value pairs and metadata
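
As a rough sketch, the two calls could be assembled in Python like this. Only the endpoint paths come from the table above. The base URL is a placeholder, and the JSON body, the field names (`markdowns`, `prompt`, `schema`, `markdown`), and the bearer-token header are assumptions for illustration; the real API may expect a different encoding, such as multipart form data, so check the ADE API reference before relying on any of it. The code builds the requests without sending them.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder, not the real ADE host


def build_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Assemble a POST request with a JSON body (assumed encoding)."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def build_schema_request(api_key, parsed_markdowns=None, prompt=None, existing_schema=None):
    """Build time: send parsed documents, a prompt, an existing schema, or any mix."""
    payload = {}
    if parsed_markdowns is not None:
        payload["markdowns"] = parsed_markdowns  # hypothetical field name
    if prompt is not None:
        payload["prompt"] = prompt  # hypothetical field name
    if existing_schema is not None:
        payload["schema"] = existing_schema  # hypothetical field name
    return build_request("/v1/ade/extract/build-schema", api_key, payload)


def extract_request(api_key, markdown, schema):
    """Inference time: send one parsed Markdown document and a schema."""
    payload = {"markdown": markdown, "schema": schema}  # hypothetical field names
    return build_request("/v1/ade/extract", api_key, payload)


build_req = build_schema_request("YOUR_API_KEY", parsed_markdowns=["# Invoice A", "# Invoice B"])
extract_req = extract_request("YOUR_API_KEY", "# Invoice C", {"type": "object"})
```

Passing either request to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payloads easy to inspect and log.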

Example: A Procurement Team Processing 30 Suppliers

Here's an example of how one workflow plays out end-to-end.

Build time. A procurement team feeds a representative batch of invoices into the ADE Build Extract Schema API. The schema builder generates a master schema across all 30 suppliers, mapping variations like "Payment Terms," "Net Days," and unlabeled footnotes to unified fields through alternative labels. The API sets formatting guidance so dates, currencies, and quantities normalize, regardless of how each supplier formats them.
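
A fragment of such a master schema might look like the sketch below. It assumes alternative labels and formatting guidance are carried in each field's `description`, which is a standard JSON Schema keyword. Whether ADE stores them there or in a dedicated property is not stated here, and the `field` helper and all field names are hypothetical.

```python
def field(json_type: str, meaning: str, alt_labels=(), format_hint: str = "") -> dict:
    """Build one schema field whose description carries labels and format guidance."""
    parts = [meaning]
    if alt_labels:
        parts.append("Also labeled: " + ", ".join(alt_labels) + ".")
    if format_hint:
        parts.append("Format: " + format_hint + ".")
    return {"type": json_type, "description": " ".join(parts)}


# Hypothetical fragment of a master schema covering several suppliers.
master_schema = {
    "type": "object",
    "properties": {
        "payment_terms": field(
            "string",
            "Payment terms agreed with the supplier.",
            alt_labels=["Payment Terms", "Net Days"],
            format_hint="Net N, for example Net 30",
        ),
        "invoice_date": field(
            "string",
            "Date the invoice was issued.",
            format_hint="ISO 8601 date, YYYY-MM-DD",
        ),
        "amount_due": field(
            "number",
            "Total payable amount.",
            alt_labels=["Amount Due", "Grand Total", "Balance Owed"],
        ),
    },
}
```

Generating descriptions from one helper keeps the wording uniform across fields, which makes later schema versions easier to compare.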

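Inference time. New invoices arrive, and the team runs each parsed document through the ADE Extract API with the master schema. For a 40-page contract whose line items span five pages, the line items come back as one unified table.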
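
Because each extracted value carries chunk reference IDs, an audit step can be a simple lookup from value to source text. The sketch below assumes a response with an `extraction` object and an `extraction_metadata` object whose entries list `references`. Those key names, the chunk IDs, and the sample data are hypothetical, and only top-level fields are handled.

```python
def audit_trail(response: dict, chunks_by_id: dict) -> dict:
    """Pair each top-level extracted value with the parsed chunks it references."""
    trail = {}
    metadata = response.get("extraction_metadata", {})
    for name, value in response.get("extraction", {}).items():
        ref_ids = metadata.get(name, {}).get("references", [])
        trail[name] = {
            "value": value,
            # Text of every referenced chunk found in the parsed document.
            "sources": [chunks_by_id[ref] for ref in ref_ids if ref in chunks_by_id],
            # References that point at no known chunk deserve a manual look.
            "missing_refs": [ref for ref in ref_ids if ref not in chunks_by_id],
        }
    return trail


# Hypothetical response and parsed chunks, for illustration only.
response = {
    "extraction": {"invoice_number": "INV-001", "amount_due": 1250.0},
    "extraction_metadata": {
        "invoice_number": {"references": ["chunk-1"]},
        "amount_due": {"references": ["chunk-7", "chunk-9"]},
    },
}
chunks_by_id = {
    "chunk-1": "Invoice No. INV-001",
    "chunk-7": "Grand Total: $1,250.00",
}

trail = audit_trail(response, chunks_by_id)
```

Here `trail["amount_due"]` pairs the value with the "Grand Total" chunk and flags `chunk-9` as unresolved.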

Six months later. A new supplier shows up. The team feeds the following into the ADE Build Extract Schema API: the new invoices, their existing schema, and a prompt to identify what changed. The API proposes updates. They review, approve, and version. The pipeline keeps running.
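
The review step is easier when the proposed schema is diffed against the current one. The following is a minimal local sketch of such a diff over JSON-Schema-style dicts. It is not an ADE feature, and the sample schemas are hypothetical.

```python
def flatten(schema: dict, prefix: str = "") -> dict:
    """Return {path: type} for every leaf field; arrays add '[]' to the path."""
    out = {}
    node_type = schema.get("type")
    if node_type == "object":
        for name, child in schema.get("properties", {}).items():
            out.update(flatten(child, f"{prefix}.{name}" if prefix else name))
    elif node_type == "array":
        out.update(flatten(schema.get("items", {}), prefix + "[]"))
    else:
        out[prefix] = node_type
    return out


def diff_schemas(current: dict, proposed: dict) -> dict:
    """List leaf fields that were added, removed, or changed type."""
    old, new = flatten(current), flatten(proposed)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical current schema and proposed update.
current = {
    "type": "object",
    "properties": {
        "invoice_date": {"type": "string"},
        "total": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {"type": "object", "properties": {"sku": {"type": "string"}}},
        },
    },
}
proposed = {
    "type": "object",
    "properties": {
        "invoice_date": {"type": "string"},
        "total": {"type": "number"},
        "tax_id": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"sku": {"type": "string"}, "discount": {"type": "number"}},
            },
        },
    },
}

changes = diff_schemas(current, proposed)
```

For these samples, `changes` reports `line_items[].discount` and `tax_id` as added, nothing removed, and `total` as retyped. A reviewer can approve that summary before the new schema version goes live.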

Try Your Documents Today

Test Your Document in the Playground

Try out your document on va.landing.ai for quick validation and testing.