TL;DR
Hey, just a test.
Most companies pay $12.88 to process a single invoice. The best pay $2.78 (Ardent Partners, 2025). The difference isn’t better software — it’s a better workflow.
Here’s what that workflow looks like: Landing AI’s Agentic Document Extraction (ADE) reads your invoices — regardless of format, vendor, or language — and turns them into structured data (that’s real IDP — intelligent document processing for you in 2026). Then rules match each invoice to the right purchase order, flag the ones that don’t fit, and push clean matches toward payment. Your ERP stays exactly where it is. You’re not replacing anything. You’re adding an automation layer on top.
Nitish Dandu built this as an open-source system — email intake, LandingAI ADE for extraction, dual-tolerance PO matching ($1 absolute or 2% relative), exception routing by status, and Stripe-powered payments. Every pattern in this article maps to working code you can read, fork, and deploy.
The Premise
Accounts payable hasn’t changed much in 20 years. An invoice arrives — by email, by mail, sometimes by fax. Someone opens it, reads it, types the data into an ERP, looks up the purchase order, checks the amounts, flags anything that doesn’t match, and eventually schedules payment. 68% of organizations still do this manually (DocuClipper, 2025).
The industry’s answer has been to sell all-in-one AP suites. Rip out what you have, migrate to a new platform, retrain your team. But that’s a massive, expensive bet — and it ignores the fact that most ERPs work fine as systems of record. The problem isn’t where the data lives. It’s what happens between receiving an invoice and paying it.
Meanwhile, a parallel shift has been happening in document AI. Intelligent document processing now extracts structured, validated data from messy invoices at 93–99% accuracy — up from 70–80% just a few years ago with template-based OCR. More importantly, engines like Landing AI’s ADE can handle invoice formats they’ve never seen before — different vendors, different layouts, different languages — without vendor-specific configuration. That opens a different path: instead of replacing your stack, you add an intelligent extraction and workflow layer on top of it.
This article walks through that path step by step — from IDP extraction to PO matching to exception routing to payment processing. Every pattern maps to Nitish Dandu’s open-source implementation, so you can see how each piece works in real code, not in a vendor diagram.
Why Do Most AP Automation Projects Fail Before They Start?
Only 32.6% of invoices are processed without human intervention across all industries. Best-in-class teams reach 49.2% (Ardent Partners, 2025). The gap isn’t extraction accuracy — it’s everything that happens after the invoice is read.
Most companies fall into the suite-replacement trap. They buy a platform that promises end-to-end AP automation, then spend months migrating data, retraining staff, and fighting integration issues with their ERP. Meanwhile, the actual bottleneck — matching invoices to POs, routing exceptions, and tracking payment readiness — stays manual.
The real problem: AP breaks at the workflow layer, not just the capture layer. OCR mostly reads simple invoices fine — but it struggles with complex layouts, spatial localization, and providing chunk locations where it found each output. What’s missing is structured extraction that handles format variety, automated matching, and exception routing that works with your existing systems.
Here’s what the cost gap actually looks like:
Cost Per Invoice: Manual vs. Automated Manual $12.88 Industry Avg $9.40 Best-in-Class $2.78 Source: Ardent Partners, “AP Metrics That Matter” 2025
According to Ardent Partners’ 2025 benchmarking data, the best-in-class cost per invoice of $2.78 represents a 78% reduction from the $12.88 industry average, driven primarily by touchless processing rates above 49% and automated matching workflows (Ardent Partners, 2025).
The fix isn’t buying a bigger platform. It’s layering automation onto your existing stack where it matters most.
What is IDP and Why Does LandingAI Make This Possible?

Modern IDP can achieve roughly 99% field-level extraction accuracy, compared to 85–95% for template-based OCR (Parseur, 2026). But the accuracy number alone doesn’t tell you why traditional OCR fails at invoice processing.
The real problem is format chaos. Invoices come in every layout imaginable. One vendor sends clean PDFs from accounting software. Another sends scanned images of handwritten forms. A third sends faxed documents that have been photocopied multiple times. Some label the invoice number as “Invoice #”, others call it “Inv No.” or “Factura.” Line items might be formatted as tables, as lists, or as unstructured text blocks. Traditional OCR tools might work on one vendor’s format and completely miss critical fields on another’s. They choke on multi-line addresses, on tax calculations that span rows, on tables with merged cells.
This is the problem Landing AI’s Document Processing Platform solves — and it’s why Nitish’s entire system stands on it. Without LandingAI, the downstream automation won’t scale in production workloads.
From the build: LandingAI processes invoices from vendors the system has never seen before and extracts the data correctly. It recognizes field relationships across wildly different templates without any vendor-specific configuration. That zero-shot capability is what makes the automation layer viable — you don’t need to retrain or reconfigure when a new vendor shows up.
In Nitish’s implementation, LandingAI ADE runs a two-stage pipeline:
- Parse — the document is converted into a normalized representation that preserves layout, regardless of whether the input is a clean PDF, a scan, or a photographed document
- Extract — 20+ labeled fields (vendor name, tax ID, line items, totals, PO references, bank details) are pulled into a Pydantic-validated schema
The parse step is what makes the extraction reliable. It normalizes document structure before field extraction begins, so the extract step works from a consistent representation regardless of how messy the original input was.
Every downstream step — vendor matching, PO matching, exception routing, payment processing — depends on getting clean, typed data out of these messy documents. If the extraction layer can’t handle format variety, the entire pipeline breaks. That’s why the choice of IDP engine isn’t a nice-to-have. It’s the foundation.
The Five-Step AP Automation Architecture
Average invoice processing time is 9.2 days across all industries. Best-in-class teams do it in 3.1 days. Manual-heavy teams take 17.4 days (Ardent Partners, 2025). The architecture below — pulled directly from Nitish’s tmc-v1 — targets the best-in-class tier.
Invoice Processing Time (Days) Manual Industry Avg Best-in-Class 17.4 days 9.2 days 3.1 days Source: Ardent Partners, “AP Metrics That Matter” 2025
Here are the five steps, each mapped to working code:
Step 1: Email Intake and Attachment Filtering
A background job polls IMAP every 30 seconds (email_client.py). It filters by sender whitelist, validates file types, caps attachments at 20MB, and skips duplicates. Files are stored in a date/vendor directory structure via storage_local.py.
This isn’t glamorous. But reliable intake is the foundation. If invoices get lost or duplicated at ingestion, nothing downstream can save you.
Step 2: Two-Stage IDP Extraction
LandingAI ADE first parses the document into a normalized representation, then extracts 20+ fields into Pydantic-validated JSON (ocr_landingai.py). The schema in invoice_schema.py covers supplier info, buyer details, financial totals, line items, and PO references. All fields are optional — because OCR will sometimes miss data, and you don’t want a pipeline that crashes on partial extractions.
Step 3: Vendor Matching
The system resolves each invoice to an existing vendor by tax ID or name (case-insensitive). If it can’t match, the invoice gets flagged. If the extracted supplier name conflicts with the resolved vendor, it’s tagged vendor_mismatch. This catches a common real-world problem: invoices that arrive from a subsidiary but reference a parent company.
Step 4: Purchase Order Matching
This is where most AP automation stalls. Nitish’s po_matching.py uses a dual-tolerance algorithm: an invoice matches a PO if the amount difference is within $1 absolute or 2% relative — whichever is larger. It also validates currency and vendor, picks the best match when multiples exist, and assigns a confidence score (1.0 = exact match, 0.0 = threshold match). More on this in the next section.
Step 5: Exception Routing and Payment Processing
Invoices that can’t match get routed by status: unmatched, vendor_mismatch, or needs_review. Matched invoices move to payment processing via Stripe (payments.py), with row-level locking, idempotency keys, and automatic status rollback on failure.
The entire pipeline runs without human intervention for clean invoices. Exceptions surface for review. That’s the goal: touchless processing where possible, clear escalation where necessary.
How Does Purchase Order Matching Actually Work?

14% of invoices require exception handling, and 53% of AP teams say exceptions are their biggest operational challenge (Ardent Partners, 2025). The matching algorithm is what determines whether an invoice flows through or creates work.
Here’s the logic from po_matching.py:
- Look up PO by the
po_numberextracted from the invoice - Filter to POs with status
openorpartially_received - For each candidate, check: does the currency match? Does the vendor match?
- Calculate the absolute amount difference
- Accept the match if
difference ≤ max($1.00, 2% × invoice_total) - If multiple POs match, pick the one with the smallest difference
- Assign confidence:
1.0 - min(1.0, difference / max(1.0, invoice_total))
What does this look like in practice? A $1,000 invoice matches any PO between $980 and $1,020. A $100 invoice matches between $98 and $102. The dual-tolerance approach handles both rounding differences on small invoices and minor adjustments on large ones.
From the implementation: The dual-tolerance design ($1 absolute + 2% relative) emerged from testing real invoices where small rounding differences on low-value invoices and shipping adjustments on high-value invoices each caused false exceptions with a single threshold.
When matching fails, invoices are categorized:
unmatched— no PO found or PO number missingvendor_mismatch— supplier on the invoice doesn’t match the PO vendorneeds_review— ambiguous match requiring human judgment
Invoice Exception Types (% of Exceptions) Unmatched PO (53%) Vendor Mismatch (28%) Needs Review (19%) Source: Typical distribution based on Ardent Partners exception benchmarks, 2025
Each exception type has a clear resolution path. That’s the point — you’re not just flagging problems, you’re categorizing them so the right person can fix them fast.
Three-way matching (adding goods receipt verification) is a natural extension. Nitish’s architecture separates matching logic from the rest of the pipeline, so you can add receipt matching without touching extraction or payments.
What Should You Measure from Day One?
AP automation delivers 200–600% ROI in year one for teams processing 1,000+ invoices monthly, with payback typically in 4–8 months (APQC, 2025). But you won’t know if you’re getting there without tracking the right numbers.
Four metrics matter:
- Field accuracy — what percentage of extracted fields are correct without manual editing?
- Auto-match rate — what percentage of invoices match a PO and vendor without human intervention?
- Exception rate — what percentage need manual review, and why?
- Manual touch time — how many minutes does a human spend per invoice?
Nitish’s dashboard already tracks these. It shows daily invoice counts, total amounts, auto-match percentages, and exception counts out of the box.
Here’s a concrete cost model. If your team processes 1,000 invoices per month at 12 minutes each with a blended cost of $35/hour, that’s roughly $7,000/month in handling effort. Organizations using IDP-based automation report per-invoice costs dropping to $1–$5 (Parseur, 2026). Even a 50% reduction in manual touch time saves $3,500/month.
Don’t trust vendor demos. Trust your own data.
How to Run a 14-Day Pilot

AI-driven IDP processes invoices in 1–2 seconds per document, compared to 10–30 minutes manually (Parseur, 2026). A two-week pilot on 200–500 real invoices is enough to validate whether the approach works for your operation.
Here’s the playbook:
- Select 200–500 historical invoices across your messiest vendors and formats. Don’t cherry-pick clean ones — you need to stress-test edge cases.
- Define your extraction schema and acceptance criteria before testing. What fields must be correct? What match rate is acceptable? Write it down.
- Run the pipeline and measure the four metrics. Nitish’s system instruments these automatically.
- Validate one complete path from email intake through PO matching to payment readiness. End-to-end, not just extraction.
- Group exceptions by cause. Fix the largest failure mode first, then re-run.
What does “good” look like? If you’re hitting 90%+ field accuracy and 40%+ auto-match rate on real data, you’re ahead of the industry average. If you’re below that, tune your schema and matching tolerances before scaling.
Many automation projects look great on 10 demo invoices and collapse on real data. That’s why you test with your documents, your vendors, your formats. Two weeks. Real invoices. Measured results.
Build or Buy? A Decision Framework
75% of AP departments now use some form of AI in their processes (Ardent Partners, 2025). The question isn’t whether to automate — it’s how.
Buy a suite when:
- Your invoice formats are standard and low-volume
- You want fixed workflows with minimal engineering
- Your ERP vendor offers a tightly integrated AP module
Build on infrastructure when:
- Your invoice formats are messy and varied
- Your matching logic is specific to your business (custom tolerances, multi-currency, line-item matching)
- Your compliance requirements demand full auditability of extraction and matching decisions
- Your integration landscape is non-trivial (multiple ERPs, custom approval chains)
The key distinction: IDP isn’t the whole AP product. It’s the document intelligence layer that makes unstructured intake dependable enough for everything after it to run automatically. Nitish’s architecture shows this cleanly — extraction, business rules, and system actions are separated so each can change independently.
AP Automation Adoption (2025) 68% still manually key invoices into ERP 32.6% touchless rate 75% use some form of AI Sources: Ardent Partners 2025, DocuClipper/HighRadius 2025
79% of companies experienced attempted or actual payment fraud in 2024, with 63% targeted by business email compromise (AFP, 2025). That’s another reason to own your pipeline — you control the validation, the audit trail, and the exception routing. When your payment logic is a black box inside a vendor’s suite, you’re trusting their security posture, not yours.
What This Comes Down To
- Don ’t replace your finance stack. Layer IDP for extraction, add matching rules, route exceptions, and connect to payments. Nitish’s open-source implementation proves the pattern works.
- Measure four things from day one: field accuracy, auto-match rate, exception rate, and manual touch time. If those numbers don’t improve, the automation isn’t delivering.
- Start with 200–500 real invoices over 14 days. Your own data will tell you more than any vendor demo.
The AP automation market is projected to reach $11.17 billion by 2030 (Mordor Intelligence, 2025). But market size doesn’t matter for your team. What matters is whether invoices move from inbox to payment with less manual work and more control. This architecture makes that possible without starting over. Run a metric-driven 14-day pilot. Decide based on what you measure, not what you’re told. For you developers out there, inspect the source code and test ADE on your invoice mix. Find Nitish’s repo here.
Frequently Asked Questions
What is intelligent document processing?
IDP uses AI to extract structured, validated data from unstructured documents like PDFs and scans. Unlike basic OCR, which outputs raw text, IDP understands field relationships and produces schema-ready JSON. Modern IDP can even achieve ~99% field-level accuracy vs. 85–95% for template-based OCR (Parseur, 2026).
Can I automate AP without replacing my ERP?
Yes. The approach Nitish built layers IDP and workflow automation on top of your existing stack. Your ERP stays the system of record — IDP handles unstructured intake, and the workflow layer manages matching, exceptions, and payment readiness. 68% of organizations still manually key invoices into ERPs that could accept structured data instead (DocuClipper, 2025).
How does purchase order matching work in practice?
Nitish’s implementation uses a dual-tolerance algorithm: invoices match POs when the amount difference is within $1 absolute or 2% relative. It validates currency and vendor, picks the best match when multiples exist, and assigns a confidence score. 14% of invoices still require exception handling even with automation (Ardent Partners, 2025).
How much does AP automation cost per invoice?
Best-in-class automated teams achieve $2.78 per invoice vs. $12.88 industry average (Ardent Partners, 2025). Building on open-source infrastructure like Nitish’s tmc-v1 eliminates per-seat licensing, making the economics even more favorable for high-volume operations.
