Retention Policies in Document Processing Systems

May 13, 2026


How Zero Data Retention eliminates LandingAI-side source document retention and how to design customer-side data lifecycle policies for document AI pipelines.

Enterprise document AI workflows raise two distinct retention questions: how long source documents are retained, and how long extracted data is retained. The Zero Data Retention (ZDR) option in LandingAI's Agentic Document Extraction (ADE) eliminates LandingAI-side source document retention entirely; customer-side retention policy governs everything else.

At production scale, a global Tier-1 bank reduced manual document review time by 40-60% using ADE on multilingual KYC packages of 200 to 300 pages (bank case study).

The Two Retention Scopes

Document extraction pipelines have two retention scopes requiring separate policy decisions:

Source document retention. How long original PDFs, images, or scanned documents are stored, and where. Storage is handled by the customer's own infrastructure, and the retention period is governed by applicable records retention regulations (for example GDPR Article 5, HIPAA, or financial services record-keeping rules).
Extracted data retention. How long the structured data extracted from documents is retained in downstream systems. This is governed by the same regulatory frameworks as the source documents, plus any operational data retention policies.

ADE's processing step sits between these two scopes. With ZDR enabled, the source document and its parsed output are never written to LandingAI storage; processing occurs in memory and extracted output is returned to the calling application.

Zero Data Retention: What It Covers

ZDR covers three things:

Source documents. Processed in memory and not stored by LandingAI or its sub-processors.
Parsed output. Not stored after the API response is returned.
Model training. Customer documents are not used for model training.

ZDR does not eliminate retention obligations on the customer side; the calling application remains responsible for its own retention and deletion policies.

For Parse Jobs with ZDR, parsed results are written to a customer-provided presigned URL in customer-controlled storage; the customer controls retention from the moment output is written.
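Because the customer controls retention once output lands in its own storage, the retention period can be enforced at the storage layer. The sketch below assumes the customer-controlled storage is an Amazon S3 bucket, which the text above does not state; the function name, prefix, and 30-day period are hypothetical. It only builds the lifecycle configuration structure that S3 accepts and does not call any service.

```python
def build_expiration_rule(prefix: str, retain_days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under `prefix`.

    Assumption: parsed output is written under a dedicated key prefix in an
    S3 bucket owned by the customer.
    """
    if retain_days < 1:
        raise ValueError("retain_days must be at least 1")
    rule_id = "expire-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},      # applies only to this prefix
                "Status": "Enabled",
                "Expiration": {"Days": retain_days},
            }
        ]
    }


# Hypothetical prefix and retention period, for illustration only.
config = build_expiration_rule("parse-output/", 30)
```

With boto3, a structure like this could be applied through put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config). A fixed expiration rule does not account for litigation holds, which would need separate handling.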

Designing Customer-Side Retention for Extracted Data

Extracted data inherits the sensitivity category of its source documents. Financial data extracted from bank statements is subject to the same retention rules as the bank statements themselves, and PHI extracted from medical records is subject to HIPAA retention requirements.

Practical retention design for extracted data:

Tag extracted data records with the source document reference and extraction timestamp at write time.
Apply the same retention schedule to extracted data as to the corresponding source documents.
For data derived from documents subject to litigation holds, ensure the hold applies to both the source document and the derived extracted data.
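The three practices above can be sketched as a small data model. This is a minimal illustration under stated assumptions, not a prescribed schema: the class names, field names, and the idea of a per-document retain_until date are all hypothetical. The extracted record carries only a reference and a timestamp, and the deletion decision is always resolved through the source document, so the schedule and any litigation hold apply to both.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SourceDocument:
    """Hypothetical customer-side record for a stored source document."""
    document_id: str
    retain_until: datetime    # end of the applicable retention schedule
    legal_hold: bool = False  # True while a litigation hold is active


@dataclass
class ExtractedRecord:
    """Extracted data tagged with its provenance at write time."""
    source_document_id: str   # reference back to the source document
    extracted_at: datetime    # extraction timestamp
    fields: dict


def tag_extraction(source: SourceDocument, fields: dict,
                   now: Optional[datetime] = None) -> ExtractedRecord:
    """Attach the source reference and extraction timestamp to extracted fields."""
    return ExtractedRecord(
        source_document_id=source.document_id,
        extracted_at=now or datetime.now(timezone.utc),
        fields=fields,
    )


def is_due_for_deletion(record: ExtractedRecord, sources: dict,
                        now: datetime) -> bool:
    """Derived data follows the schedule and hold status of its source."""
    source = sources[record.source_document_id]
    if source.legal_hold:
        return False  # the hold covers the derived data as well
    return now >= source.retain_until
```

Resolving the schedule through the source document at check time, rather than copying it onto each record, means a hold placed later still protects data that was extracted earlier.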

Compliance Documentation

SOC 2 Type II certification, HIPAA BAA availability, and VPC deployment options are documented at the Trust Center. For institutions requiring that source documents never transit third-party infrastructure, VPC deployment runs ADE within the customer's own cloud environment.

See the security and privacy documentation for the full compliance posture.

FAQ

Does Zero Data Retention mean LandingAI never sees the document content?

With ZDR enabled, documents are processed in memory on LandingAI infrastructure, and content is not written to any storage system at LandingAI or its sub-processors. For institutions requiring that document content never transit third-party infrastructure at all, VPC deployment is the appropriate path; see the enterprise contact page to discuss VPC deployment options.

Who is responsible for sub-processor ZDR in a VPC deployment?

In a VPC deployment, ADE runs within the customer's own cloud environment. LandingAI is not responsible for ZDR related to the customer's own infrastructure or any sub-processors the customer integrates; that responsibility rests with the customer. See the ZDR documentation for the full scope and responsibility boundaries.

Does extracted data inherit the same regulatory retention schedule as source documents?

From a regulatory compliance perspective, extracted data containing regulated information (PHI, PII, financial records) is generally treated as a derivative of the source document and is subject to the same retention and deletion obligations. Organisations should confirm their specific schedule with legal counsel.

Can Zero Data Retention be enabled selectively for specific document types in the same account?

ZDR is an organisation-level setting; it is not configurable at the individual request level within a single organisation. Organisations requiring ZDR for some document types and not others should use separate LandingAI organisations; see the ZDR documentation for current configuration options.

How should extracted data be handled when a source document is subject to a deletion request?

Deletion of the source document does not automatically delete extracted data in downstream systems. Organisations processing personal data must implement a deletion propagation workflow that traces source document deletion to all downstream records derived from it, using the document reference stored alongside the extracted data at write time.
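A deletion propagation workflow of this kind can be sketched as follows. The InMemoryStore class stands in for real downstream systems (a warehouse, a search index, and so on) and is purely hypothetical, as are all names here. The only assumption is that every downstream record carries the source document reference written at extraction time.

```python
class InMemoryStore:
    """Hypothetical downstream system holding records derived from documents."""

    def __init__(self, name):
        self.name = name
        self.records = {}  # record_id -> {"source_document_id": ..., "data": ...}

    def put(self, record_id, source_document_id, data):
        self.records[record_id] = {
            "source_document_id": source_document_id,
            "data": data,
        }

    def delete_by_source(self, source_document_id):
        """Delete all records derived from one source; return their IDs."""
        doomed = [
            record_id
            for record_id, record in self.records.items()
            if record["source_document_id"] == source_document_id
        ]
        for record_id in doomed:
            del self.records[record_id]
        return doomed


def propagate_deletion(source_document_id, stores):
    """Remove derived records from every store and return an audit trail."""
    return {
        store.name: store.delete_by_source(source_document_id)
        for store in stores
    }


warehouse = InMemoryStore("warehouse")
search_index = InMemoryStore("search_index")
warehouse.put("r1", "doc-1", {"name": "example"})
warehouse.put("r2", "doc-2", {"name": "other"})
search_index.put("i1", "doc-1", {"text": "example"})

audit = propagate_deletion("doc-1", [warehouse, search_index])
```

Returning the deleted record IDs per store gives a simple audit trail that can be kept as evidence that the deletion request reached each downstream system.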