FINANCE · 12 MIN READ

Invoice automation: PDF → Xero with n8n and AI OCR

TL;DR · AP AUTOMATION

A reliable invoice workflow is inbox intake → PDF extraction → OCR → validation → Xero draft bill → approval. AI can get you to 98-99% practical accuracy on clean supplier PDFs, but only if you add deterministic checks for supplier, GST, totals, currency, duplicates, and approval limits.

Invoice automation fails when teams treat OCR as the product. OCR is one step. The product is a controlled accounts payable process that turns supplier documents into draft bills with evidence attached and humans reviewing the exceptions.

The workflow

  1. Watch a dedicated AP inbox for supplier emails.
  2. Extract PDF attachments and reject image-only spam or unsupported formats.
  3. Hash the file and check for duplicates before spending OCR budget.
  4. Run OCR and return structured JSON: supplier, ABN, invoice number, issue date, due date, line items, totals, tax, currency, and bank details.
  5. Validate totals, tax, supplier match, purchase order, and approval policy.
  6. Create a draft bill in Xero with the source PDF attached.
  7. Route exceptions to Slack, email, or the finance queue.
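Step 3 is the cheapest control in the whole pipeline, so it is worth showing. The following is a minimal sketch, assuming the attachment is already available as raw bytes; the function names and the in-memory `seen_hashes` set are illustrative stand-ins for whatever persistent store the workflow uses.

```python
import hashlib


def pdf_fingerprint(pdf_bytes: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact file contents."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def is_duplicate(pdf_bytes: bytes, seen_hashes: set[str]) -> bool:
    """Record the file's hash and report whether it was already seen.

    `seen_hashes` is a hypothetical stand-in for a database table or
    key-value store; a set keeps the sketch runnable.
    """
    digest = pdf_fingerprint(pdf_bytes)
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False
```

A byte-level hash only catches the same file sent twice. A supplier who regenerates the PDF produces a new hash, which is why the invoice-number-plus-supplier uniqueness check is still needed later in validation.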

OCR choice

Engine | Best for | Risk
Google Document AI | High-volume structured invoice extraction | Needs processor tuning
GPT-4 Vision style model | Messy invoices and reasoning over notes | Needs strict JSON validation
Hybrid | Stable extraction plus exception reasoning | More moving parts

The hybrid pattern is best for most SMBs above a few hundred invoices per month. Use Document AI for baseline extraction. Send low-confidence or inconsistent records to a vision model with the PDF image and validation errors. Never let the model post straight to Xero without checks.
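The routing decision in the hybrid pattern can be sketched in a few lines. This is an illustration under assumptions: the `Extraction` shape, the step names, and the 0.90 threshold are hypothetical and would be tuned per engine and per supplier mix.

```python
from dataclasses import dataclass, field


@dataclass
class Extraction:
    confidence: float  # engine-reported confidence, 0.0 to 1.0
    validation_errors: list[str] = field(default_factory=list)


def next_step(extraction: Extraction, min_confidence: float = 0.90) -> str:
    """Decide where a baseline extraction goes next.

    Low-confidence or inconsistent records go to the vision model along
    with their validation errors. Neither branch posts to Xero directly;
    both still pass through the deterministic checks.
    """
    if extraction.confidence < min_confidence or extraction.validation_errors:
        return "vision_review"
    return "validate_and_draft"
```

The point of keeping this as plain code rather than a prompt is that the escalation rule stays deterministic and auditable.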

Validation rules that matter

  • Total must equal subtotal plus tax within a small rounding tolerance.
  • Supplier ABN or tax ID must match the Xero contact or approved vendor table.
  • Invoice number plus supplier must be unique.
  • Currency must match supplier policy or purchase order.
  • Bank details changes must require human approval.
  • Invoices above the approval threshold must stay draft.

CONTROL

Bank detail changes are the highest-risk field. Treat them as fraud indicators, not data entry. Store the extracted value, alert finance, and require manual verification before payment.
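The rules above translate almost one-to-one into deterministic checks. The sketch below assumes simplified `Invoice` and `Supplier` records and a two-cent rounding tolerance; all names, the flag strings, and the tolerance are illustrative, not a fixed schema.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Invoice:
    supplier_abn: str
    invoice_number: str
    currency: str
    subtotal: Decimal
    tax: Decimal
    total: Decimal
    bank_account: str


@dataclass
class Supplier:
    abn: str
    currency: str
    bank_account: str


def find_flags(invoice: Invoice, supplier: Supplier,
               seen_keys: set[tuple[str, str]], approval_limit: Decimal,
               tolerance: Decimal = Decimal("0.02")) -> list[str]:
    """Return the list of failed checks; an empty list means no exceptions."""
    flags = []
    if abs(invoice.subtotal + invoice.tax - invoice.total) > tolerance:
        flags.append("totals_mismatch")
    if invoice.supplier_abn != supplier.abn:
        flags.append("supplier_mismatch")
    if (invoice.supplier_abn, invoice.invoice_number) in seen_keys:
        flags.append("duplicate_invoice")
    if invoice.currency != supplier.currency:
        flags.append("currency_mismatch")
    if invoice.bank_account != supplier.bank_account:
        flags.append("bank_details_changed")  # always needs a human
    if invoice.total > approval_limit:
        flags.append("above_approval_limit")  # bill must stay draft
    return flags
```

Money is held as `Decimal` rather than float so the tolerance comparison is exact. Any flag routes the record to the exception queue, and `bank_details_changed` should additionally alert finance.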

Posting to Xero

n8n can create the Xero bill as draft, set contact, dates, line items, account codes, tax types, tracking categories, and attach the original PDF. Use draft status by default. Approved status should require an explicit finance rule, not just OCR confidence.
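For teams calling the API directly instead of the n8n node, the draft bill body looks roughly like the sketch below. The field names follow the Xero Accounting API's invoice resource as we understand it (`ACCPAY` is the bill type); the helper name and the shape of the `lines` input are hypothetical, and account codes and tax types must come from your own Xero configuration. Attaching the PDF is a separate request and is not shown.

```python
from datetime import date


def build_draft_bill(contact_id: str, invoice_number: str, issue_date: date,
                     due_date: date, currency: str, lines: list[dict]) -> dict:
    """Build the JSON body for a draft bill. Status is hard-coded to DRAFT."""
    return {
        "Type": "ACCPAY",      # accounts payable bill
        "Status": "DRAFT",     # never AUTHORISED from OCR output alone
        "Contact": {"ContactID": contact_id},
        "InvoiceNumber": invoice_number,
        "Date": issue_date.isoformat(),
        "DueDate": due_date.isoformat(),
        "CurrencyCode": currency,
        "LineAmountTypes": "Exclusive",  # line amounts exclude tax
        "LineItems": [
            {
                "Description": line["description"],
                "Quantity": line["quantity"],
                "UnitAmount": line["unit_amount"],
                "AccountCode": line["account_code"],
                "TaxType": line["tax_type"],
            }
            for line in lines
        ],
    }
```

Hard-coding the status in the builder, rather than passing it as a parameter, makes it harder for a later change to slip an approved bill through by accident.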

Cost versus manual entry

Manual invoice entry usually costs 3-8 minutes per invoice once email sorting, lookups, and correction are counted. At 500 invoices per month, that is 25-67 hours of finance time. A workflow with OCR, validation, and Xero posting often pays back before the second month when exception rates stay under 15%.
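The hours figure is simple arithmetic, shown here so you can plug in your own volume. The function name is illustrative.

```python
def monthly_hours(invoices_per_month: int, minutes_per_invoice: float) -> float:
    """Finance hours spent per month on manual entry."""
    return invoices_per_month * minutes_per_invoice / 60


low = monthly_hours(500, 3)   # best case: 3 minutes per invoice
high = monthly_hours(500, 8)  # worst case: 8 minutes per invoice
print(f"{low:.0f}-{high:.0f} hours per month")  # prints "25-67 hours per month"
```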

The four edge cases that break naive workflows

Every invoice automation we audit hits the same four edge cases in the first 30 days. Anticipate them once and you avoid the long tail of small finance fires later.

1 · Multi-page invoices with continuation totals

Some suppliers (telcos, freight forwarders, software vendors) put the actual bill on page 3 of 12, with the first two pages being a portfolio statement. OCR a multi-page PDF as a single image and the model blends figures from unrelated pages into a hallucinated total. Fix: page-classify before extraction. Use a cheap model pass to label each page as cover, line-items, summary, or terms, and feed only the pages that carry invoice data into the structured-extraction step. This single change lifted accuracy from 87% to 98% on a freight-forwarding client's bills.
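The page filter itself is small once a classifier exists. In this sketch the classifier is passed in as a callable so any cheap model (or a keyword heuristic) can be swapped in; treating `line-items` and `summary` as the extractable labels is our assumption, and the function name is illustrative.

```python
from typing import Callable

# Assumption: only these labels carry figures worth extracting.
EXTRACTABLE_LABELS = frozenset({"line-items", "summary"})


def select_pages(page_texts: list[str], classify: Callable[[str], str],
                 keep: frozenset[str] = EXTRACTABLE_LABELS) -> list[int]:
    """Return zero-based indexes of pages to send to structured extraction.

    `classify` maps one page's text to one of: cover, line-items,
    summary, terms.
    """
    return [i for i, text in enumerate(page_texts) if classify(text) in keep]
```

If `select_pages` returns an empty list, treat the document as an exception rather than falling back to whole-document extraction.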

2 · GST registrations that change mid-year

When a supplier crosses the GST threshold and re-invoices a previously zero-GST charge, naive workflows either reject the invoice (because the ABN-GST mapping is stale) or post it with the wrong tax treatment. Fix: refresh the supplier's GST status from the ATO ABN Lookup API on every invoice, not just on supplier creation. The lookup is free and adds 180ms.
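Once the status has been refreshed, the tax decision should depend on the invoice date, not on today's date. The sketch below assumes the lookup result has already been reduced to a single registration date; `GstStatus`, `tax_treatment`, and the returned labels are hypothetical, and the lookup call itself is deliberately not shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class GstStatus:
    registered_from: Optional[date]  # None means not registered for GST


def tax_treatment(invoice_date: date, invoice_tax_cents: int,
                  status: GstStatus) -> str:
    """Pick a tax treatment from a freshly fetched GST status."""
    registered = (status.registered_from is not None
                  and status.registered_from <= invoice_date)
    if invoice_tax_cents > 0:
        # GST charged by a supplier not registered on that date is an exception.
        return "gst" if registered else "review"
    return "gst_free"
```

Map the returned label to the matching Xero tax type in your own configuration.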

3 · Foreign-currency invoices with implicit conversion

A USD invoice from a US vendor with no explicit AUD conversion is the most common silent failure mode. If the workflow does not carry the document currency through to Xero, the USD amounts land on a bill that is treated as AUD, and your finance team finds a bill that doesn't reconcile against the bank statement. Fix: detect the document currency at the extraction step and route non-base-currency invoices to a separate workflow that pulls the day's RBA reference rate and records the currency code and exchange rate explicitly on the Xero bill.
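The detection and conversion step can be sketched as follows. It assumes AUD as the base currency and a rate quoted as foreign units per 1 AUD; the function and route names are illustrative, and fetching the rate is not shown.

```python
from decimal import Decimal, ROUND_HALF_UP

BASE_CURRENCY = "AUD"  # assumption: the Xero organisation's base currency


def route_by_currency(document_currency: str) -> str:
    """Send non-base-currency invoices down a separate branch."""
    if document_currency.strip().upper() == BASE_CURRENCY:
        return "standard"
    return "foreign_currency"


def to_base_amount(foreign_total: Decimal, foreign_per_aud: Decimal) -> Decimal:
    """Convert a foreign total to AUD, rounded to cents.

    `foreign_per_aud` is the number of foreign units that 1 AUD buys.
    """
    aud = foreign_total / foreign_per_aud
    return aud.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, a USD 650.00 invoice at a rate of 0.65 USD per AUD converts to A$1,000.00. Store the rate and its date alongside the bill so the conversion can be audited later.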

4 · Statement-not-an-invoice

Suppliers email monthly statements that look like invoices. They have totals, supplier details, dates — but they are summaries, not billable documents. Posting them creates duplicates. Fix: a small classifier (regex on the subject line and the document title field is usually enough) that routes anything tagged STATEMENT, SUMMARY, or ACCOUNT SUMMARY to a separate folder for human review, never to Xero.
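A regex version of that classifier might look like the sketch below. The pattern covers only the three tags named above; the function name is illustrative, and the keyword list would need extending for your own suppliers.

```python
import re

# Matches STATEMENT, SUMMARY, or ACCOUNT SUMMARY as whole words, any case.
STATEMENT_PATTERN = re.compile(
    r"\b(?:ACCOUNT\s+SUMMARY|STATEMENT|SUMMARY)\b", re.IGNORECASE
)


def is_statement(subject: str, document_title: str) -> bool:
    """True when the email subject or the document title looks like a statement."""
    return bool(STATEMENT_PATTERN.search(subject)
                or STATEMENT_PATTERN.search(document_title))
```

The check is deliberately over-eager: a genuine invoice titled "Invoice Summary" will also be held for human review, which is cheaper than a duplicate bill in Xero.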

Real numbers from one production build

These figures come from a NexFlow Spark engagement we shipped in Feb 2026 for a 14-person construction firm processing roughly 320 supplier invoices per month. Numbers, not vibes:

Metric | Before (manual) | After (n8n + Document AI + Xero)
Median time per invoice | 6.2 min | 38 sec (auto) · 4.1 min (exception)
Field-level accuracy | 96.4% (human keying) | 98.7% (auto) · effectively 100% on exceptions
Exception rate | n/a | 11.8% routed to finance for review
Monthly finance hours on AP | 33 hrs | 7 hrs
Average time from receipt to draft bill | 2.4 days | 14 minutes
Cost per invoice (OCR + LLM) | | A$0.041 (Document AI) + A$0.012 (GPT-4o-mini for exceptions)

26 hours per month back to the finance team. At an A$45/hr fully-loaded cost that is about A$1,170 per month, so the A$2,400 build paid back in roughly two months on time savings alone, and the workflow now costs the firm about A$17/month to run. The most-cited qualitative benefit was different: month-end close moved from a 3-day scramble to half a day, because draft bills were already in Xero the day they arrived.
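The hours and running-cost figures follow directly from the table. The sketch below assumes the A$0.012 LLM figure is an average per invoice across all 320 invoices, which is the reading that reproduces the A$17 monthly cost; the function name is illustrative.

```python
def monthly_run_cost(invoices_per_month: int, ocr_cost: float, llm_cost: float) -> float:
    """Monthly OCR plus LLM spend in A$, given per-invoice costs."""
    return invoices_per_month * (ocr_cost + llm_cost)


hours_saved = 33 - 7                             # monthly AP hours, before minus after
run_cost = monthly_run_cost(320, 0.041, 0.012)   # per-invoice costs from the table
print(f"hours saved: {hours_saved}, run cost: A${run_cost:.2f}/month")
```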

WHAT WE GOT WRONG IN WEEK 1

The first version of this workflow posted bills as approved, not draft, on any OCR confidence above 0.92. Finance hated it — they lost the muscle memory of the quick sanity check. We reverted to draft-only by week 2 and never went back. The cost of "approve on confidence" is loss of finance review; the cost of "draft only" is two clicks per bill. Two clicks wins.

KEY TAKEAWAYS

  • OCR is one step, not the whole AP process.
  • Use draft bills in Xero unless approval rules are explicit.
  • Expect 98-99% practical accuracy only with validation and exceptions.
  • Hash PDFs and check duplicates before spending OCR budget.
  • Bank detail changes require manual verification.

Frequently asked questions

How accurate is AI invoice OCR?

Clean supplier PDFs can reach 98-99% practical accuracy. Messy scans, handwritten notes, and supplier changes need exception review.

Can n8n post bills to Xero?

Yes. Use the Xero node or API to create draft bills and attach the source document.

How much does invoice automation save?

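Manual entry usually costs 3-8 minutes per invoice once sorting, lookups, and correction are counted. Automation recovers most of that time on invoices that pass validation, with bigger gains at month-end.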

What OCR works best for invoices in 2026?

Document AI is strong for scale; vision LLMs are strong for messy cases. Hybrid wins when accuracy matters.

Want AP out of the inbox?

Book a map and we will scope the inbox, OCR, Xero, and approval controls before building.

Sources and method

  1. Patterns are based on NexFlow AP automation builds and finance workflow audits.
  2. Accuracy ranges assume supplier PDF invoices with validation rules and human exception review.
  3. Xero posting details should be validated against current Xero API and tax configuration before deployment.