Introducing Parse Jobs API for ADE: The Heavy-Duty API for Large Files

Ava Xia

Imagine your application needs to process data from a massive 1,000-page contract or a 1 GB engineering report scanned into a single PDF. The upload completes, but then the parsing step grinds everything to a halt: it blocks your app, slows users down, and backs up your processing queue.

Developers working with real-world documents know this pain all too well. Large files can easily overwhelm synchronous workflows, causing timeouts and bottlenecks that ripple through your entire pipeline.

That’s why we built the Parse Jobs API for LandingAI’s Agentic Document Extraction (ADE). It’s a purpose-built, asynchronous API designed specifically to handle large, complex documents without compromising throughput or reliability.

Instead of waiting on a single, long-running call, you simply submit your document, immediately receive a job_id, and then asynchronously poll the job status until your results are ready. By default, the API uses our latest parsing model (DPT-2), but you can pin a specific model snapshot for reproducibility.

The Parse Jobs API is the ideal solution for:

  • Parsing large PDFs (up to 1,000 pages).
  • Handling high-volume batch ingestion with ease.
  • Securing your data with enterprise-grade Zero Data Retention (ZDR) options.

For the complete workflow, visit the GitHub repository: Parse_Jobs_API_for_Large_Files.

What is Job-Based Parsing?

To better understand how the ADE Parse Jobs API differs from our standard parsing approach, let’s use a coffee shop analogy.

  • Standard parsing: You go to the counter, order, and wait right there until the barista hands you your drink. You can’t do anything else. If there’s a long line or a complex order, you’re stuck waiting.
  • Job-based parsing: You order at the counter, and they give you a pager. You’re now free to find a seat or chat with a friend. When your coffee is ready, the pager buzzes, and you go pick it up.

The ADE Parse Jobs API works like the pager system. You submit a job (your “order”), and the API immediately gives you a job_id (your “pager”). Your application is now free to do other things. You can use the job_id to check the status of your request later and retrieve the results once it’s “ready.”

Why Do You Need It?

The standard ADE Parse API is great for small files. For an 800-page technical manual or a 500 MB scanned agreement, use the ADE Parse Jobs API, which is built for these heavy-duty documents.

Here’s an overview of the differences between our two parse APIs.

Feature          Parse API            ADE Parse Jobs API
Max File Size    50 MB                1 GB
Max Pages        50                   1,000
Response         Immediate results    Immediate job ID
Best For         Small documents      Large documents
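
As a rough illustration of how the limits in this table could drive routing in your own code, here is a minimal sketch. The function name `choose_parse_api` and the return labels are hypothetical, and the sketch assumes binary megabytes (1 MB = 1024 × 1024 bytes), which the table does not specify.

```python
# Limits taken from the comparison table. The unit is an assumption:
# binary megabytes, since the table does not say which unit applies.
MB = 1024 * 1024
PARSE_LIMITS = {"max_bytes": 50 * MB, "max_pages": 50}
PARSE_JOBS_LIMITS = {"max_bytes": 1024 * MB, "max_pages": 1000}


def choose_parse_api(size_bytes: int, page_count: int) -> str:
    """Return 'parse' or 'parse_jobs', or raise if neither limit set fits."""

    def fits(limits: dict) -> bool:
        return (size_bytes <= limits["max_bytes"]
                and page_count <= limits["max_pages"])

    if fits(PARSE_LIMITS):
        return "parse"
    if fits(PARSE_JOBS_LIMITS):
        return "parse_jobs"
    raise ValueError("Document exceeds the limits listed for both APIs")


print(choose_parse_api(500 * MB, 800))
```

A document only has to exceed one of the two Parse API limits (size or pages) to be routed to the Parse Jobs API.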

How to Use the ADE Parse Jobs API

The standard workflow is a simple three-step process: submit, monitor, and retrieve.

If you’re interested in seeing how the entire process comes together, take a look at the full workflow on GitHub: Parse_Jobs_API_for_Large_Files.

Step 1: Submit the Document

First, you send your document to the API. This is a POST request to the /v1/ade/parse/jobs endpoint. The key here is that the API doesn’t make you wait for the processing to finish. Instead, it immediately accepts the job and returns a unique job_id.

It’s crucial to save this job_id, as you’ll need it for the next steps.

import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}
url = 'https://api.va.landing.ai/v1/ade/parse/jobs'

# Upload a document; the with block closes the file once the request is sent
with open('your_document.pdf', 'rb') as document:
    files = {'document': document}
    response = requests.post(url, files=files, headers=headers)

print(response.json())  # The response will give you the job_id
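
Because the job_id is the only handle you have on a submitted job, it can help to persist it right away. The following is one possible sketch that uses a small JSON file as the store. The helper names `save_job_id` and `load_job_id` and the file layout are hypothetical. In production you would more likely use a database.

```python
import json
from pathlib import Path


def save_job_id(store_path, document_name, job_id):
    """Record job_id under document_name in a small JSON file."""
    path = Path(store_path)
    jobs = json.loads(path.read_text()) if path.exists() else {}
    jobs[document_name] = job_id
    path.write_text(json.dumps(jobs, indent=2))
    return jobs


def load_job_id(store_path, document_name):
    """Return the saved job_id for document_name, or None if unknown."""
    path = Path(store_path)
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(document_name)
```

With this in place, a later process can call `load_job_id('jobs.json', 'your_document.pdf')` to resume monitoring even after a restart.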

Step 2: Monitor Job Status

Now that you have a job_id, you can periodically check on its status. This is done by making a GET request to the /v1/ade/parse/jobs/{job_id} endpoint. A good practice is to poll every 15-30 seconds.

The response will tell you the job’s status (e.g., processing, completed, or failed) and its progress as a percentage.

import requests

def check_job_status(job_id, api_key):
    """Return the job's status response as a dict, or None on error."""
    url = f'https://api.va.landing.ai/v1/ade/parse/jobs/{job_id}'
    headers = {'Authorization': f'Bearer {api_key}'}

    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            data = response.json()
            status = data.get('status')
            # Treats progress as a fraction (0 to 1) and scales it to a percentage
            progress = data.get('progress', 0) * 100
            print(f"Status: {status} | Progress: {progress:.0f}%")
            return data
        else:
            print(f"❌ Error checking status: {response.status_code}")
            return None
    except Exception as e:
        print(f"❌ Error: {e}")
        return None
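
A single status check is rarely enough, so the check is usually wrapped in a loop. The sketch below shows one way to poll at a fixed interval until the job reaches `completed` or `failed`. The name `wait_for_job`, its parameters, and the one-hour default timeout are assumptions for illustration. The status strings are the ones mentioned above.

```python
import time


def wait_for_job(get_status, interval_seconds=20, timeout_seconds=3600,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the job completes, fails, or times out.

    get_status is any zero-argument callable that returns the status
    response as a dict, or None when a check failed transiently.
    """
    deadline = clock() + timeout_seconds
    while True:
        data = get_status()
        status = data.get("status") if data else None
        if status == "completed":
            return data
        if status == "failed":
            raise RuntimeError(f"Parse job failed: {data}")
        if clock() >= deadline:
            raise TimeoutError("Parse job did not finish before the timeout")
        # Still processing (or the check failed): wait, then try again.
        sleep(interval_seconds)
```

To use it, pass a callable that performs the GET request from this step and returns the decoded response. The default 20-second interval sits inside the suggested 15-30 second range, and `sleep` and `clock` are injectable so the loop can be tested without waiting.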

Step 3: Retrieve the Results

Once the job status is completed, it’s time to get your parsed data. The API provides results in two ways to optimize performance:

  • For small files (< 1 MB): The final markdown content is included directly in the status response object, inside the data field.
  • For large files (≥ 1 MB): The status response will contain a temporary, secure output_url. You simply make a GET request to this URL to download a JSON file containing the results.

Think of it like getting mail. A small letter fits in your mailbox (data field), but for a large package, you get a pickup slip (output_url).

Here’s a sample request:

import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}
job_id = 'YOUR_JOB_ID'  # the job_id returned in Step 1
url = f'https://api.va.landing.ai/v1/ade/parse/jobs/{job_id}'

response = requests.get(url, headers=headers)
response_data = response.json()

# Check if job is completed
if response_data.get('status') == 'completed':
    # For small files (< 1 MB), the Markdown content is available in data
    if response_data.get('data') and response_data['data'].get('markdown'):
        markdown_content = response_data['data']['markdown']
    # For large files (>= 1 MB), output_url is available instead
    elif response_data.get('output_url'):
        print("Download the JSON results from `output_url`.")
    else:
        print("No Markdown content or `output_url` found in the completed job response.")
else:
    print(f"\nJob status: {response_data.get('status', 'unknown')}.")
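
To handle both delivery paths in one place, a small helper can return the inline `data` when it is present and otherwise download the file behind `output_url`. This is only a sketch: the names `get_parse_result` and `download_json` are hypothetical, and it assumes the `output_url` can be fetched with a plain GET and no Authorization header. The layout of the downloaded JSON is not described here, so the helper returns it unmodified.

```python
import json
from urllib.request import urlopen


def download_json(url):
    """Fetch url with the standard library and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def get_parse_result(job_response, download=download_json):
    """Return the parsed result from a completed job response."""
    status = job_response.get("status")
    if status != "completed":
        raise ValueError(f"Job is not completed (status: {status})")
    data = job_response.get("data")
    if data:
        # Small result: delivered inline in the status response.
        return data
    output_url = job_response.get("output_url")
    if output_url:
        # Large result: download the JSON file from the temporary URL.
        return download(output_url)
    raise ValueError("Completed job has neither data nor output_url")
```

The `download` parameter lets you swap in your own HTTP client, or a stub in tests.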

Zero Data Retention (ZDR)

For organizations operating under stringent security and compliance mandates, including HIPAA/BAA scenarios, ADE offers a Zero Data Retention (ZDR) feature. 

If ZDR is enabled for your account, you must include the following parameters when using the ADE Parse Jobs API:

  • document_url: Your document must be accessible via a public URL, such as a pre-signed URL from your cloud storage like S3, Azure Blob, or GCS. This allows ADE to fetch the document directly from your storage.
  • output_save_url: To protect sensitive data, ADE requires a pre-signed output URL from your cloud storage. This temporary, secure URL lets the API write parsed results directly to your storage, so no parsed content is ever stored or returned by LandingAI. 

When you have ZDR enabled and you include these parameters, LandingAI’s systems will fetch the document, process it, and upload the final result directly to your cloud storage. This eliminates the need for you to poll for status or handle the download yourself.

Here’s a sample request:

import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}
url = 'https://api.va.landing.ai/v1/ade/parse/jobs'

# Prepare the request payload
# generate_presigned_url(...) is a placeholder for your cloud storage SDK call
output_save_url = generate_presigned_url(...)
# A (None, value) tuple sends the value as a plain multipart form field,
# not as a file upload
files = {
    'document_url': (None, 'https://...'),
    'output_save_url': (None, output_save_url),
}

response = requests.post(url, files=files, headers=headers)
print(response.json())
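
Since a ZDR request depends on both URLs being present, it can be worth validating them before you submit. The helper below is a hypothetical sketch, not part of the ADE API. It assumes both values should be absolute https URLs, which matches typical pre-signed URLs but is not stated as a requirement above.

```python
from urllib.parse import urlparse


def build_zdr_fields(document_url, output_save_url):
    """Validate the two ZDR URLs and return them as request fields."""
    fields = {
        "document_url": document_url,
        "output_save_url": output_save_url,
    }
    for name, value in fields.items():
        parsed = urlparse(value or "")
        # Assumption: pre-signed URLs are absolute and served over https.
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError(f"{name} must be an absolute https URL, got {value!r}")
    return fields
```

Failing fast here avoids submitting a job that cannot fetch its input or write its output.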

Conclusion

With LandingAI’s ADE Parse Jobs API, you don’t need to manage your own parallelism or job orchestration, because the server does it for you. Each document request runs asynchronously, so your application stays responsive as volume grows. There are no thread pools or message queues to maintain, only a three-step workflow handled by LandingAI’s infrastructure:

  1. Submit a document for parsing — you’ll immediately receive a job ID.
  2. Poll the job status until it’s marked complete.
  3. Retrieve the parsed result — either as inline Markdown or through an output_url for larger files.

And when Zero Data Retention (ZDR) is enabled, you can achieve enterprise-grade privacy and compliance while maintaining the same frictionless developer experience. The ADE Parse Jobs API lets you scale securely, efficiently, and without ever having to configure parallelism yourself.

Ready to get started? Check out the API references, and pull patterns from our GitHub repo. For custom pricing and deployment solutions tailored to your enterprise needs, don’t hesitate to talk to Enterprise Sales.