Turbocharge Document Understanding Apps with Agentic Document Extraction

Ankit Khare April 22, 2025

Overview: What You’ll Learn

Building chat-apps using Agentic Document Extraction Python library
Features of Agentic Document Extraction Python library
Parse PDFs at scale with a single function call—no manual chunking required
Speed up your existing “Chat with PDF” setup through built-in concurrency
Pinpoint every chunk’s bounding box for highlight overlays or snippet extraction
Explore advanced options (visual debugging, parallel processing, cropping)
Decide when you’d still prefer raw REST calls in specialized scenarios

Complete code for the tutorial is available on GitHub — follow along and run the example app yourself👨🏼‍💻.

Figure 1: A simple “Chat with PDF” app. The user has uploaded a PDF and the Agentic Document Extraction Python library has precomputed structured data for each page.

Figure 2: The app’s response to a question, with visually grounded answers. The relevant excerpt on the PDF is highlighted, and the answer is cited with its page reference.

Figure 3: The app also displays the reasoning (chain-of-thought) used to generate the answer, giving transparency into how the conclusion was reached.

1. Introduction

In our previous blog post, we built a “Multi-PDF Chat with Research Papers” application by calling the Agentic Document Extraction API directly with REST. We manually:

Split PDFs into chunks
Managed concurrency and retries
Stitched partial JSON responses
Calculated bounding-box transformations

While this taught us the fundamentals, it was also quite labor-intensive—especially for large or numerous PDFs. Today, we’ll see how we upgraded the same application with the official Agentic Document Extraction Python library, which elegantly wraps this functionality (and more) in just a few function calls. We removed hundreds of lines of “infrastructure” code, gained extra features like bounding-box snippet images and visual debugging, and now parse multi-hundred-page PDFs with minimal effort.

2. Features of Agentic Document Extraction Python Library

Automatic Large File Handling
- Instead of manually chunking a 200-page PDF, you just call parse_documents([pdf_file]). The library splits, merges, and returns a single, consolidated result.
Built-In Parallelism
- Tune concurrency by setting BATCH_SIZE × MAX_WORKERS via .env. No manual ThreadPoolExecutor or concurrency loops needed.
Structured Chunks with Groundings
- Each chunk includes bounding box references (grounding) that pinpoint the exact location of the extracted text. Perfect for highlight overlays.
Retry & Error Handling
- Automatic exponential backoff for 429 (rate limit) or 5xx responses—no custom logic required. If a page can’t be parsed, it’s returned as an “error chunk” instead of crashing your app.
Extras:
- Cropping bounding box regions to disk (grounding_save_dir).
- Visualizing chunk bounding boxes in annotated images (viz_parsed_document).
- Parsing multiple files or even remote URLs in a single function call.

In other words, the library frees us from having to babysit the PDF → JSON pipeline, so we can focus on building features instead of writing boiler plate code.

3. Upgrading Our “Chat with PDF” App

3.1 Before: Manual REST

Recall that we originally:

Split each PDF into multiple 2-page buffers to fit the REST endpoint’s limit.
Parallelized calls with a custom thread pool.
Reconstructed partial JSON results for each chunk.
Transformed bounding box coordinates from PDF to image space.

This worked, but it was cumbersome—especially for 50+ page documents.

3.2 After: One-Call Parsing

Here’s the core of our new approach, replacing all of that chunking logic:


        from agentic_doc.parse import parse_documents



        def parse_pdf_with_agentic(file_stream):

            """

            Uses the agentic-doc SDK to parse a PDF in memory.

            Returns a page-based map of {page_number: [chunks]} for further processing.

            """



            # parse_documents() can handle file-like objects or file paths/URLs

            results = parse_documents([file_stream])

            parsed_doc = results[0]



            page_map = {}

            for chunk in parsed_doc.chunks:

                for grounding in chunk.grounding:

                    # grounding.page is 0-based, let's use 1-based indexing

                    page_idx = grounding.page + 1

                    page_map.setdefault(page_idx, [])



                    # Convert (l, t, r, b) → (x, y, width, height)

                    box = grounding.box

                    x1, y1 = box.l, box.t

                    w, h = box.r - box.l, box.b - box.t



                    page_map[page_idx].append({

                        "bboxes": [[x1, y1, w, h]],

                        "captions": [chunk.text],  # the extracted text

                        # You can store chunk.chunk_type or chunk.chunk_id if needed

                    })



            return page_map

That’s it. No explicit concurrency, no partial chunk merges—the library handles all that under the hood.

4. Working with the SDK’s Advanced Features

4.1 Parallelism & Error Handling

Parallelism is set via environment variables:


         BATCH_SIZE=4 # parse up to 4 files at once
        

        MAX_WORKERS=5 # each file gets up to 5 parallel requests

Retries for rate-limits or network errors are automatic. Configure them in .env:


        MAX_RETRIES=80

        MAX_RETRY_WAIT_TIME=30

        RETRY_LOGGING_STYLE=inline_block

If the library can’t parse certain pages, you’ll get chunk_type=error in the result. This means you can skip or handle them gracefully in your app.

4.2 Cropping Groundings to Disk

If you want small PNG images of each chunk’s bounding box, just specify grounding_save_dir:


        results = parse_documents(
        

            [pdf_file],
        

            grounding_save_dir="my_crops"
        

        )
        

        for chunk in results[0].chunks:
        

           for grounding in chunk.grounding:
        

              if grounding.image_path:
        

                 print("Saved snippet to:", grounding.image_path)

Now each bounding box region is extracted as a separate image. This is super helpful for debugging table extraction or verifying text alignment.

4.3 Visualization

Instead of manually writing bounding-box overlay code, you can generate annotated pages with one function:


    from agentic_doc.utils import viz_parsed_document

    from agentic_doc.config import VisualizationConfig, ChunkType



    images = viz_parsed_document(

        "some_doc.pdf",

        results[0],  # The ParsedDocument

        output_dir= "viz_output",

        viz_config=VisualizationConfig(

            thickness=2,

            font_scale=0.8,

            # Color certain chunk types differently

            color_map={

                ChunkType.TITLE: (255, 0, 0),   # red

                ChunkType.TABLE: (0, 255, 0),  # green

            }

        )

    )

You’ll get one annotated image per PDF page, with color-coded boxes and chunk labels. Perfect for visually confirming the extraction accuracy.

4.4 Batch Parsing

Need to parse many PDFs in a batch? Use parse_and_save_documents(), which:

Processes each file in parallel (up to your BATCH_SIZE).
Writes the results to JSON on disk, automatically naming them.


    from agentic_doc.parse import parse_and_save_documents



    pdf_list = ["doc1.pdf", "doc2.pdf", "https://example.com/doc3.pdf"]

    json_paths = parse_and_save_documents(pdf_list, result_save_dir="parsed_results/")

    print("Saved JSON to:", json_paths)

5. Integrating with Our “Chat with PDF” Flow

We still do the same question-answer approach:

Extract PDF text + bounding box data with agentic-doc.
Feed this structured JSON to GPT, instructing it to return “best chunks” that support its answer.
Highlight those bounding boxes on each page.

But now we’ve eliminated all the custom chunk management code—the library has done it for us. This dramatically reduces the chance of subtle concurrency bugs or partial merges failing.

6. When Might I Still Use Raw REST?

While the SDK is recommended for nearly all Python-based projects, there are a few cases where you might still prefer direct REST calls:

Non-Python Ecosystem: If your app is written in Node.js, Go, or another language, you won’t have the Python SDK.
Ultra-Lightweight Environments: For extremely resource-constrained serverless functions (where you can’t install additional libraries), a simple requests.post() might suffice.
Custom Experimental Endpoints: If you need to try brand-new or experimental parameters not yet exposed by the SDK. In that scenario, you might manually call the endpoint with the advanced parameter set.

That said, for any Python project—especially if you value concurrency and ease of code maintenance—the SDK is a clear winner.

7. Results and Performance Gains

In our tests, replacing the old REST approach in the same “Chat with PDF” app:

Shrank our “PDF parsing” code by ~70%.
Reduced parse time for large PDFs (100+ pages) by up to 50% due to built-in concurrency that’s smarter about partial merges and retry.
Improved reliability. With automatic backoff and consistent error reporting (including error chunks), we saw far fewer “failed parse” issues.

We also love the built-in grounding_save_dir option for debugging tricky layout documents. Whenever a user complains “it missed this figure,” we quickly check the snippet images to see how it was interpreted.

8. Conclusion & Next Steps

With a single library, you can parse visually complex documents, produce visually grounded answers, and free yourself from the overhead of raw REST calls. We hope this saves you a bunch of headaches and a lot of time while developing your applications. If you have questions or feedback, feel free to reach out—happy to help! Also, be a part of the best Visual AI discord community and hangout with us there.

Important Resources

Sign up for the platform: Landing AI Platform

Example App GitHub Repo: Full source code for the Streamlit QA app

Agentic Document Extraction Python Library Docs: Repository that contains python library code

Get your API key: Steps to get your LandingAI API key

Rest API doc: Documentation for REST API