From 40 Hours to 40 Minutes: AI Document Processing for Finance Teams

The Pain

A logistics company processed 800+ invoices per month across 120 vendors. Each invoice required:

Download the PDF from email
Manually key data into their ERP
Match against purchase orders
Flag discrepancies for review
Route for approval

Two full-time finance staff spent ~40 hours per week on this. Errors averaged 4.3% — mostly fat-finger typos during data entry. Each error cost 2-3 hours to reconcile.

"Our best people were doing data entry. It was soul-crushing and expensive."

The Solution

We built a document processing pipeline using Claude's vision capabilities, a Python orchestrator, and their existing ERP's API.

How It Works

Step 1: Ingestion

Invoices arrive via email. A mail rule forwards them to a processing inbox. The agent picks them up every 15 minutes.

# Simplified extraction flow
async def process_invoice(pdf_path: str) -> InvoiceData:
    # Extract text + table data using Claude vision
    response = await claude.messages.create(
        model="claude-sonnet-4-20250514",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": pdf_to_base64(pdf_path)},
                {"type": "text", "text": EXTRACTION_PROMPT}
            ]
        }]
    )
    return parse_structured_output(response)

Step 2: Extraction

Claude reads each PDF visually (not OCR — actual vision) and extracts:

Field	Example
Vendor name	Maersk Line
Invoice number	INV-2026-0847
Date	2026-02-10
Line items	3x Container shipping, Rotterdam → NYC
Subtotal	$12,450.00
Tax	$0.00 (B2B exempt)
Total	$12,450.00
Payment terms	Net 30

Step 3: PO Matching

The agent queries the ERP for matching purchase orders using vendor name + amount range (±5%). When a match is found, it cross-references line items.

Three possible outcomes:

Exact match → auto-approve, queue for payment
Partial match → flag specific discrepancies for human review
No match → escalate to finance manager

Step 4: Anomaly Detection

Before routing for approval, the agent checks for:

Price increases > 10% vs. last invoice from same vendor
Duplicate invoice numbers
Unusual payment terms
Line items not on the original PO

⚠ ANOMALY DETECTED
Vendor: GlobalFreight Inc.
Issue: Unit price for "Pallet Storage" increased 23% vs. last invoice
Previous: $45/pallet/day → Current: $55.35/pallet/day
Action: Flagged for finance manager review

Accuracy

We ran the system in parallel with manual processing for 30 days:

Metric	Manual	AI Agent
Data extraction accuracy	95.7%	99.2%
PO match rate	91%	94%
Processing time per invoice	3.2 min	0.3 min
False positive anomalies	N/A	2.1%

The 0.8% of extraction errors were edge cases: handwritten notes on invoices, unusual PDF layouts, and scanned documents at extreme angles. We added a confidence score — anything below 95% gets human review.

The Stack

Incoming Email
  └── Python Orchestrator (runs on AWS Lambda)
        ├── Claude API (vision extraction)
        ├── ERP API (PO lookup + data entry)
        ├── Anomaly rules engine
        └── Slack notifications
              ├── #finance-approvals (matched invoices)
              └── #finance-review (anomalies + low confidence)

Total infrastructure cost: ~$180/month (Claude API + Lambda compute).

Impact

Before	After
40 hrs/week on invoice processing	3 hrs/week (review + exceptions)
2 FTEs dedicated to data entry	0.2 FTE (oversight only)
4.3% error rate	0.8% error rate
$0 — no anomaly detection	$22K saved in Year 1 from caught overcharges

The two finance staff now spend their time on analysis, vendor negotiations, and cash flow forecasting — work that actually moves the business forward.

Why This Works

Three principles made this project succeed:

Vision > OCR — Claude reads PDFs like a human reads them. No brittle template matching. New vendor? New format? It just works.
Human-in-the-loop — the agent doesn't make payment decisions. It prepares everything perfectly so humans make fast, confident decisions.
Confidence scoring — the system knows when it's unsure. That honesty is what makes it trustworthy.