Versioning Extraction Schemas in Production Document Pipelines

How to version and manage extraction schemas in production document pipelines, enabling updates to extraction logic without breaking downstream consumers.

Extraction schemas are production configuration artifacts. A schema change that adds a required field, modifies a field type, or changes an enum constraint can break downstream consumers if deployed without coordination.

Managing schema changes through versioning, staged rollout, and explicit migration prevents production incidents in document workflows.

Why Schema Versioning Matters

A production pipeline has at least two consumers of the extraction output: the code that calls the Extract API and the downstream system that receives the structured data. Schema versioning creates a coordination mechanism: downstream consumers know which schema version produced an output, can handle both old and new formats during a migration window, and can confirm readiness before the new schema is fully promoted.
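As an illustration of that coordination, a consumer can dispatch on the schema version stamped on each output and accept both formats during a migration window. This is a minimal sketch under stated assumptions: the `ExtractionRecord` envelope, the `invoice_number` and `total` fields, and a v2 major bump that changed `total` from a string to a number are all hypothetical and do not come from ADE.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ExtractionRecord:
    """Hypothetical envelope: extracted fields plus the schema version that produced them."""
    doc_type: str
    schema_version: str  # semantic version string, e.g. "2.1.0"
    fields: Dict[str, Any]


def _from_v1(fields: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical v1 emitted "total" as a formatted string such as "1,234.50".
    return {"invoice_number": fields["invoice_number"],
            "total": float(fields["total"].replace(",", ""))}


def _from_v2(fields: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical v2 (a major bump: type change) emits "total" as a number.
    return {"invoice_number": fields["invoice_number"], "total": float(fields["total"])}


# One handler per supported major version; minor versions stay compatible within a major.
HANDLERS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {"1": _from_v1, "2": _from_v2}


def consume(record: ExtractionRecord) -> Dict[str, Any]:
    """Normalise an output from either schema version into one internal shape."""
    major = record.schema_version.split(".")[0]
    handler = HANDLERS.get(major)
    if handler is None:
        # Fail loudly rather than guess at an unknown format.
        raise ValueError(f"unsupported schema version: {record.schema_version}")
    return handler(record.fields)
```

For example, `consume(ExtractionRecord("invoice", "1.4.0", {"invoice_number": "INV-7", "total": "1,234.50"}))` returns `{"invoice_number": "INV-7", "total": 1234.5}`. Once every producer emits v2, the v1 handler can be deleted.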

Storing Schemas as Versioned Code

The extraction schema is a JSON Schema file that should be stored and version-controlled alongside application code, not constructed at runtime. Schemas can be validated interactively against sample documents in the Schema Wizard Playground before committing to production. Key practices:

  • One schema file per document type. A separate versioned file for each document type makes changes explicit and traceable.
  • Semantic versioning. A nullable field addition is a minor change; a field type change or required field addition is major.
  • Schema registry. For organisations with multiple teams sharing schemas, a central registry with tagged releases enables consistent adoption.
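The semantic-versioning rule above can be checked mechanically in CI by diffing the old and new schema files. The sketch below is one possible approach, not an ADE feature. It assumes nullability is expressed as a JSON Schema type union containing `"null"`. It also treats field removal and enum changes as major, which is a conservative assumption beyond the rule stated above.

```python
from typing import Any, Dict, Set


def _types(prop: Dict[str, Any]) -> Set[str]:
    """Normalise a JSON Schema "type" keyword (string or list) to a set."""
    t = prop.get("type", [])
    return {t} if isinstance(t, str) else set(t)


def classify_change(old: Dict[str, Any], new: Dict[str, Any]) -> str:
    """Suggest a semver bump ("major", "minor" or "patch") between two object schemas."""
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})

    # Removing a field (assumption: breaking) or newly requiring any field is major.
    if set(old_props) - set(new_props):
        return "major"
    if set(new.get("required", [])) - set(old.get("required", [])):
        return "major"

    bump = "patch"  # e.g. description-only edits
    for name, prop in new_props.items():
        before = old_props.get(name)
        if before is None:
            # An added field is minor only if it is optional and accepts null.
            if "null" not in _types(prop):
                return "major"
            bump = "minor"
        elif _types(before) != _types(prop) or before.get("enum") != prop.get("enum"):
            # Type changes are major; enum changes are treated as major too (conservative).
            return "major"
    return bump
```

A check like this only suggests a bump; a reviewer still decides whether a change is safe for every consumer.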

The Safe Migration Pattern

Schema changes that affect downstream consumers follow a four-step migration:

  1. Add new fields as nullable. Deploy the updated schema with new fields marked nullable; existing consumers receive null for the new fields and continue operating.
  2. Confirm extraction. Verify the new nullable fields extract correctly from production documents.
  3. Update downstream consumers. Update all consumers that need the new fields to handle them.
  4. Promote to required. After confirming adoption, update the schema to mark the fields required.
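Steps 1, 2, and 4 can be expressed as small, reviewable transformations over the schema file. Step 3 is a change to consumer code, so it has no schema operation. This is a minimal sketch, assuming JSON Schema dicts with nullability written as a `[type, "null"]` union. The helper names and the `po_number` field in the usage note are hypothetical.

```python
import copy
from typing import Any, Dict, List


def add_nullable_field(schema: Dict[str, Any], name: str, json_type: str,
                       description: str) -> Dict[str, Any]:
    """Step 1: add a field that accepts null and is not listed as required."""
    new = copy.deepcopy(schema)  # never mutate the schema currently in production
    new.setdefault("properties", {})[name] = {
        "type": [json_type, "null"],
        "description": description,
    }
    return new


def fill_rate(outputs: List[Dict[str, Any]], name: str) -> float:
    """Step 2: share of production outputs in which the new field was extracted (non-null)."""
    if not outputs:
        return 0.0
    return sum(o.get(name) is not None for o in outputs) / len(outputs)


def promote_to_required(schema: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Step 4: once every consumer handles the field, mark it required."""
    new = copy.deepcopy(schema)
    if name not in new.get("properties", {}):
        raise KeyError(f"cannot require unknown field: {name}")
    required = new.setdefault("required", [])
    if name not in required:
        required.append(name)
    return new
```

Usage: `v1_1 = add_nullable_field(v1, "po_number", "string", "Purchase order number")` ships first; `promote_to_required(v1_1, "po_number")` becomes the next major version only after `fill_rate` looks healthy and consumers are updated.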

Model Version Pinning as Schema Dependency

The extraction model version is a dependency of the schema: a schema designed against extract-20260314 may behave differently against a future version. Production pipelines should therefore pin the model version explicitly in each API call.

See extraction model versions for the changelog and testing guidance.
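One way to keep that dependency explicit is to store the model version next to the schema version and refuse to build a request without it. The sketch below is hypothetical throughout. The `PinnedSchema` record and `build_extract_params` are illustrative. The `"schema"` and `"model"` keys are placeholders, not confirmed Extract API parameter names. The `extract-YYYYMMDD` pattern is only inferred from the version names in this article.

```python
import json
import re
from dataclasses import dataclass
from typing import Any, Dict

# Assumption: dated model names like those in this article (e.g. "extract-20260314").
_PINNED_MODEL = re.compile(r"^extract-\d{8}$")


@dataclass(frozen=True)
class PinnedSchema:
    """A schema version together with the model version it was validated against."""
    doc_type: str
    schema_version: str
    model_version: str
    schema: Dict[str, Any]


def build_extract_params(pinned: PinnedSchema) -> Dict[str, str]:
    """Assemble request parameters with the model version stated explicitly.

    The keys "schema" and "model" are placeholders; check the API reference for real names.
    """
    if not _PINNED_MODEL.match(pinned.model_version):
        raise ValueError(
            f"{pinned.doc_type}@{pinned.schema_version} must pin a dated model version, "
            f"got {pinned.model_version!r}"
        )
    return {"schema": json.dumps(pinned.schema), "model": pinned.model_version}
```

Keeping both versions in one record means a model upgrade shows up in code review as a change to that record, just like a schema change.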

FAQ

Is it safe to update an extraction schema in production without staging? No. Schema changes that add required fields, change types, or modify enum constraints can silently break downstream consumers; the safe pattern is to add new fields as nullable first, deploy extraction, confirm downstream consumers handle the new fields, then promote to required.

How should schemas be shared across multiple teams or services? Store schema files in a shared code repository with versioned tags; teams reference a specific tagged version so they know exactly which schema produced an output and can upgrade on a coordinated timeline. See extraction schema best practices for design guidance.
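A consuming service can enforce "reference a specific version" by accepting only exact references such as `invoice@2.1.0` and rejecting aliases or ranges. This is a minimal sketch. The reference syntax and the `schemas/<doc_type>/<version>.json` repository layout are hypothetical conventions, and teams could equally pin via git tags alone.

```python
import json
import re
from pathlib import Path
from typing import Any, Dict

# Exact references only, e.g. "invoice@2.1.0"; aliases such as "invoice@latest" are rejected.
_REF = re.compile(r"^(?P<doc_type>[a-z0-9_-]+)@(?P<version>\d+\.\d+\.\d+)$")


def load_schema(repo_root: Path, ref: str) -> Dict[str, Any]:
    """Load one pinned schema version from a checkout of the shared schema repository.

    Hypothetical layout: <repo_root>/schemas/<doc_type>/<version>.json
    """
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"expected <doc_type>@<major.minor.patch>, got {ref!r}")
    path = repo_root / "schemas" / match["doc_type"] / f"{match['version']}.json"
    return json.loads(path.read_text(encoding="utf-8"))
```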

Does changing the extraction model version require schema changes? Not necessarily, but model version changes can affect confidence score distributions and null return rates. Treat a model version upgrade as a migration event: test the existing schema against the new version on a held-out document set before promoting.

See extraction model versions.
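For the held-out test, one simple signal is a per-field comparison of null rates between the pinned model and the candidate model on the same documents. The sketch below covers only null rates, since the shape of confidence scores is not described here. The function names and the 2-percentage-point default tolerance are hypothetical choices.

```python
from typing import Any, Dict, List


def null_rates(outputs: List[Dict[str, Any]], fields: List[str]) -> Dict[str, float]:
    """Fraction of documents in which each field came back null (or absent)."""
    if not outputs:
        return {f: 0.0 for f in fields}
    return {f: sum(o.get(f) is None for o in outputs) / len(outputs) for f in fields}


def null_rate_regressions(current: List[Dict[str, Any]], candidate: List[Dict[str, Any]],
                          fields: List[str], tolerance: float = 0.02) -> List[str]:
    """Fields whose null rate rose by more than `tolerance` under the candidate model.

    `current` and `candidate` must be outputs for the same held-out documents, in order.
    """
    if len(current) != len(candidate):
        raise ValueError("both runs must cover the same held-out documents")
    before = null_rates(current, fields)
    after = null_rates(candidate, fields)
    return sorted(f for f in fields if after[f] - before[f] > tolerance)
```

An empty result does not prove the upgrade is safe, but a non-empty one is a clear reason to hold the promotion.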

What is the correct handling for a field changing from string to typed object? Add the new typed object field under a new name while keeping the old string field deprecated, update consumers to read the new field, confirm adoption, then remove the old field in a subsequent schema version. This avoids a breaking change at any point in the migration.
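During the transition window, a consumer can prefer the new typed field and fall back to the deprecated string. This is a minimal sketch with hypothetical field names (`address` becoming `billing_address`). The `deprecated` keyword is a standard annotation in JSON Schema 2019-09 and later; whether ADE reads it is not stated here, so treat it as documentation for humans.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Transitional schema properties: the old string field stays, marked deprecated,
# while the new typed object is added under a new name.
TRANSITION_PROPERTIES: Dict[str, Any] = {
    "address": {"type": ["string", "null"], "deprecated": True,
                "description": "Deprecated: use billing_address."},
    "billing_address": {
        "type": ["object", "null"],
        "properties": {
            "street": {"type": ["string", "null"]},
            "city": {"type": ["string", "null"]},
            "postal_code": {"type": ["string", "null"]},
        },
    },
}


@dataclass
class Address:
    street: Optional[str] = None
    city: Optional[str] = None
    postal_code: Optional[str] = None
    raw: Optional[str] = None  # set only when falling back to the deprecated string


def read_address(fields: Dict[str, Any]) -> Optional[Address]:
    """Prefer the new typed field; fall back to the deprecated string during migration."""
    typed = fields.get("billing_address")
    if typed is not None:
        return Address(typed.get("street"), typed.get("city"), typed.get("postal_code"))
    legacy = fields.get("address")
    if legacy is not None:
        return Address(raw=legacy)
    return None
```

Once logs show the fallback branch is never taken, the `address` property and the fallback can be removed together in the next major version.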

How do nullable fields work in ADE's extraction model? Using the extract-20251024 model, fields not found in the document return explicit null values rather than omitted keys; see extraction model versions for null handling by version. Explicit nulls are the foundation of safe schema migration: new nullable fields return null on existing documents without breaking downstream consumers.
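The distinction between an explicit null and an omitted key can be made visible on the consumer side. The sketch below uses hypothetical helper names. It reports keys that are absent rather than null, which may indicate an output produced with an older schema or model version. It also normalises any output so that every schema property has a key.

```python
from typing import Any, Dict, List


def missing_keys(fields: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Schema properties absent from an output (as opposed to present with a null value).

    With an explicit-null model this should be empty; a non-empty result may mean the
    output was produced with an older schema or model version.
    """
    return sorted(set(schema.get("properties", {})) - set(fields))


def normalize(fields: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Give every schema property a key, mapping absent keys to None.

    Keys not declared in the schema are dropped.
    """
    return {name: fields.get(name) for name in schema.get("properties", {})}
```

For example, with properties `invoice_number` and `po_number`, `normalize({"invoice_number": "INV-1"}, schema)` returns `{"invoice_number": "INV-1", "po_number": None}`, while `missing_keys` on the same input returns `["po_number"]`.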