Breakthrough Table Extraction with Document Pre-trained Transformer

TL;DR

LandingAI’s latest Document Pre‑trained Transformer (DPT‑2) parses large, complex tables without hallucinations or misalignment. Complexities such as merged cells, multi‑level headers, and nested structures are handled by predicting the table’s layout and then extracting each cell individually, linking every result back to its location on the page. This cell‑level grounding reduces errors and speeds up extraction. In this post, we’ll explore why table extraction is so hard, what makes DPT‑2 different, and how you can start using it today.

Introduction

Try to copy a large, complex table from a PDF into Excel and you’ll likely find the rows and columns all jumbled. Tables compress multi‑dimensional data into grids, yet most OCR tools flatten them into text because that’s how they’re designed. Real documents complicate things further with irregular layouts, nested cells, and mixed typography. When financial analysts or doctors need exact values from reports, even a small misalignment can break downstream calculations. Large language models can appear helpful in this context, but they’re built to process linear sequences and often hallucinate or misinterpret visual structures. That’s why LandingAI built DPT‑2, a specialised vision model that sees a table’s geometry, understands merged cells, and preserves every relationship between rows and columns. By integrating this model into an agentic workflow, ADE (Agentic Document Extraction) orchestrates multiple steps to deliver accurate, traceable results.

Table structure prediction & cell‑level grounding

Figure 1: The DPT‑2 model first extracts the table structure: rows, columns, merged cells, and so on.

Figure 2: Complex tables are broken down into smaller sections, making it easier to extract the relevant information from each part independently.

DPT‑2 introduces table structure prediction, which means it doesn’t just read the numbers – it maps the table layout itself. The model identifies where rows, columns, and merged cells begin and end, then breaks the table into smaller, manageable regions. By understanding the layout first, DPT‑2 reduces common errors like shifted cells and hallucinated rows. Once the structure is known, each cell’s contents are extracted and paired with a bounding box. This cell‑level visual grounding lets you trace any value back to its exact location on the page. And because each region is processed independently, the system can parallelise extraction, making it faster than previous versions. Even when table borders are faint, blurred, or missing entirely, the model still predicts the structure reliably.
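To make cell‑level grounding concrete, here is a minimal sketch of the idea. The field names below are hypothetical, not ADE’s actual schema: a grounded cell pairs its value with a normalized bounding box, which can then be mapped back to pixel coordinates on a rendered page.

```python
from dataclasses import dataclass


@dataclass
class CellGrounding:
    """Hypothetical record pairing a cell value with its page location."""
    value: str
    page: int
    # Normalized box: (left, top, right, bottom) as fractions of page size
    box: tuple[float, float, float, float]


def to_pixels(cell: CellGrounding, page_width: int, page_height: int) -> tuple[int, int, int, int]:
    """Convert a normalized box to pixel coordinates on a rendered page."""
    left, top, right, bottom = cell.box
    return (round(left * page_width), round(top * page_height),
            round(right * page_width), round(bottom * page_height))


glucose = CellGrounding(value="95 mg/dL", page=0, box=(0.42, 0.31, 0.55, 0.33))
print(to_pixels(glucose, page_width=1000, page_height=1400))
```

This is what makes traceability cheap: once every extracted value carries a box like this, highlighting the source cell is just a coordinate transform.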

Example: Large table extraction

A financial analyst might receive a PDF sales report with thousands of cells. In the past you might have tried copying and pasting row by row, only to spend hours cleaning up misaligned data. With the DPT‑2 model, you can parse the entire page in seconds and receive an HTML table that preserves every cell. From there it’s a one‑click import into Excel or Google Sheets, and your data is ready for pivot tables and charts.
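If you’d rather script the import than click through a spreadsheet UI, the returned HTML table can be flattened into CSV rows with nothing but the standard library. The HTML snippet below is an illustrative stand‑in, not actual ADE output:

```python
import csv
import io
from html.parser import HTMLParser


class TableFlattener(HTMLParser):
    """Collect an HTML table's cell text into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html: str) -> str:
    """Flatten a single HTML table into a CSV string."""
    parser = TableFlattener()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


sample = "<table><tr><th>Region</th><th>Q1</th></tr><tr><td>EMEA</td><td>1,204</td></tr></table>"
print(html_table_to_csv(sample))
```

Because the structure arrives intact from the model, the post‑processing stays this simple; there is no misalignment to clean up first.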

Example: Patient lab report

Figure 3: An example lab report uploaded to ADE playground where the model retrieves the exact value from the table and visually grounds the glucose value back to the source PDF.

Healthcare providers often need to pull specific lab results from a patient’s report. Consider a comprehensive metabolic panel containing dozens of metrics. Using ADE’s online playground, you can drag and drop the PDF and instantly view the reconstructed table. A quick question like “What is the patient’s glucose level?” returns the correct value and highlights the exact cell it came from. This grounding builds trust and eliminates guesswork, especially when decisions depend on precise values.

Example: Scanned or handwritten tables

Figure 4: ADE Playground showing an input scanned PDF consisting of a table with handwritten values. The model outputs a reconstructed structured representation which is being previewed.

Many real‑world documents, such as construction timesheets or historical records, are scanned or handwritten. These tables often include multi‑level headers and inconsistent spacing. DPT‑2 is robust to these imperfections: it can detect the table’s layout even when lines are faint or skewed. The model then outputs a structured spreadsheet with clearly typed headers and aligned rows, preserving all the underlying data. In practice this means you can digitise old forms without manual re‑entry and still trust that every number is where it belongs.

Technical walkthrough: getting started

Ready to try DPT‑2 yourself? The easiest way is through the landingai‑ade Python SDK. After installing the library from PyPI, you can parse documents using the new model:

from landingai_ade import LandingAIADE

# Instantiate the ADE client with your API key (you can also set this as an environment variable)
client = LandingAIADE(api_key="<your_API_key>")

# Parse a local file with the latest DPT‑2 model. To parse a hosted file,
# pass its URL via the `document_url` parameter instead.
response = client.parse(
    document="/path/to/your/document.pdf",
    model="dpt-2-latest",
)

# Extract tables from the response
for chunk in response.chunks:
    if chunk.type == "table":
        # Each table is provided as Markdown or HTML
        table_markup = chunk.markdown

# Access cell‑level bounding boxes for visual grounding
for identifier, grounding in response.grounding.items():
    if grounding.type == "tableCell":
        print(identifier, grounding.page, grounding.box)

This snippet parses a local PDF with DPT‑2, collects the markup for each table, and prints the page number and bounding box of every table cell. The response.chunks list contains Markdown or HTML for each visual element, and the grounding dictionary maps identifiers to page numbers and bounding boxes. For large documents, use the split="page" option to process page by page. When you’re ready for production, the API supports asynchronous processing and schema‑driven extraction as well.
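Since each table chunk arrives as Markdown (via chunk.markdown above), a short stdlib‑only parser can turn it into rows ready for a spreadsheet or database. This sketch assumes a simple pipe‑delimited table with no escaped pipes inside cells:

```python
def markdown_table_rows(md: str) -> list[list[str]]:
    """Parse a pipe-delimited Markdown table into a list of rows,
    skipping the header separator line (e.g. |---|---|)."""
    rows = []
    for line in md.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the alignment/separator row, which contains only -, :, and spaces
        if all(c and set(c) <= set("-: ") for c in cells):
            continue
        rows.append(cells)
    return rows


table = """
| Test    | Result | Units  |
|---------|--------|--------|
| Glucose | 95     | mg/dL  |
| Sodium  | 140    | mmol/L |
"""
print(markdown_table_rows(table))
```

From here, pairing each row with the tableCell groundings printed earlier gives you both the values and their provenance in one pass.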

Conclusion

Extracting tables from real documents has long been a pain because layouts are unpredictable, and generic models treat everything as plain text. LandingAI’s DPT‑2 changes this by learning the structure of tables themselves, segmenting them into rows, columns, and merged cells, and grounding each value back to its origin. The result is faster, more accurate extraction and confidence that your data is trustworthy. Whether you’re analysing financial reports, pulling lab results, or digitising handwritten records, ADE can save you hours of cleanup while preserving the context of your data. Give the new model a try through the SDK or playground, and start building agentic workflows that extract meaning from documents containing more than just text.