Treasury automation: paying vendors without losing oversight.
Most finance teams already have an agent backlog: 80 recurring vendor invoices, 15 SaaS renewals, a dozen one-off bills. They just haven't trusted an agent to touch them. Here's what changes that.
Talk to a finance team about AI agents and the conversation usually splits into two halves. The first half is enthusiastic: yes, we'd love to automate vendor payments, expense classification, the AP cycle, the pile of monthly SaaS renewals nobody actually reads anymore. The second half is wary: we are not handing the bank account to a model.
Both halves are right. The job is real. The risk is also real. The thing missing in the middle has been a way to delegate the work without delegating the controls.
The job to be done
Inside any company past about thirty people, finance is running a paper-thin operation against a steady stream of repeating, structured, low-value-per-item work:
- Pay 60–80 recurring vendor invoices a month, on time, without late fees.
- Renew 10–20 SaaS subscriptions, ideally without the auto-renewal trap on the ones you've stopped using.
- Pay a dozen one-off bills — contractors, services, the occasional refund.
- File the receipts. Keep an audit trail. Make sure month-end ties out.
Most of this is mechanical. The bill arrives, the amount matches what it said last month, the recipient hasn't changed, the budget hasn't changed. It moves. The cases that need attention are the exceptions — a new payee, a number that's bigger than usual, a vendor whose invoicing pattern just shifted.
A well-supervised agent is a near-perfect fit for this. It reads the bills, recognizes the patterns, queues the payments, and surfaces the exceptions. The reason this hasn't been deployed at scale yet isn't that the model can't do it. It's that the model can't be trusted to do it without the controls a finance team would put on a junior employee doing the same job.
Why agents have been a non-starter for finance
Until recently, "agent that pays vendors" meant one of two operating models, and finance teams correctly rejected both.
Option one: hand the agent your billing system. Connect it to your payment processor with full auth, let it move money, hope for the best. Audit trail is whatever logs the agent decides to keep. Per-payment limit is whatever the model decides is reasonable. New-vendor approval is whatever the model thinks "approval" means. This is not delegation. This is replacement, with worse oversight.
Option two: agent as a draft generator. The agent reads invoices, prepares payments, hands them to a human to approve. Useful for a week, exhausting for a year. The human ends up rubber-stamping a queue, which is the same operating cost as before with extra steps.
The shape that actually works sits between the two: the agent has authority to act, the system has authority to refuse. The authority to refuse is the part the operating models above didn't have.
What changes with policy at the signer
Three controls, all enforced before any payment exists:
- A spending cap, denominated in dollars, on a rolling 24-hour window. The agent can move up to $X per day, full stop. A runaway loop or a misread invoice can't drain the treasury. We default to conservative numbers and raise them as you tune.
- An approval threshold. Anything below the threshold flows through automatically. Anything at or above it pauses, routes to a human in the dashboard, and only proceeds on approval. The threshold is the dial that decides what's "interesting" and what's "boring."
- A service allowlist. For SaaS subscriptions and registered vendors, the agent can only pay services on the list. Anything else is denied at signing time, regardless of what the agent thinks it's doing.
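The three controls above can be sketched as one decision function that runs before anything is signed. This is a minimal illustration, not the product's actual API: the type names, the field names, and the order of the checks (allowlist, then cap, then threshold) are all assumptions made for the example.

```typescript
// Hypothetical types for illustration only.
interface SignerPolicy {
  spendCapUsd: number;          // max total spend inside the rolling window
  capPeriodHours: number;       // length of the rolling window
  approvalThresholdUsd: number; // at or above this, a human must approve
  allowedServices: Set<string>; // services the agent is allowed to pay
}

interface PayIntent {
  recipient: string;
  amountUsd: number;
  isServicePayment: boolean; // false for plain wallet payouts
  atMs: number;              // request time, epoch milliseconds
}

interface PastSpend { amountUsd: number; atMs: number; }

type SignerOutcome = "allow" | "pending_approval" | "deny";
interface SignerDecision { outcome: SignerOutcome; reason: string; }

function evaluateIntent(
  policy: SignerPolicy,
  history: PastSpend[],
  intent: PayIntent,
): SignerDecision {
  // 1. Allowlist: service payments must target a listed service.
  if (intent.isServicePayment && !policy.allowedServices.has(intent.recipient)) {
    return { outcome: "deny", reason: "service not on allowlist" };
  }
  // 2. Cap: total spend inside the rolling window, including this payment.
  const windowStart = intent.atMs - policy.capPeriodHours * 60 * 60 * 1000;
  const spent = history
    .filter((s) => s.atMs > windowStart)
    .reduce((sum, s) => sum + s.amountUsd, 0);
  if (spent + intent.amountUsd > policy.spendCapUsd) {
    return { outcome: "deny", reason: `cap exceeded: ${spent} of ${policy.spendCapUsd} already spent` };
  }
  // 3. Threshold: at or above it, pause for a human.
  if (intent.amountUsd >= policy.approvalThresholdUsd) {
    return { outcome: "pending_approval", reason: "at or above approval threshold" };
  }
  return { outcome: "allow", reason: "within policy" };
}

// Example run with illustrative numbers.
const demoPolicy: SignerPolicy = {
  spendCapUsd: 200,
  capPeriodHours: 24,
  approvalThresholdUsd: 25,
  allowedServices: new Set(["0xVendor"]),
};
const demoNow = Date.now();
for (const amountUsd of [5, 100, 10000]) {
  const decision = evaluateIntent(demoPolicy, [], {
    recipient: "0xVendor", amountUsd, isServicePayment: true, atMs: demoNow,
  });
  console.log(amountUsd, decision.outcome, decision.reason);
}
```

The point of the sketch is that the agent never appears in the function: the decision depends only on the policy, the spend history, and the requested payment, which is why a misbehaving agent cannot talk its way past it.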
For payments to wallets that aren't registered services, such as a contractor invoice or a one-time payout, the cap and the approval threshold do the work. We pair this with an anomaly check the agent runs in its planning step: if the amount is far from this vendor's typical invoice, or if the recipient is new, the agent asks for confirmation before paying.
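One way to picture the anomaly check is a comparison against the median of a recipient's past invoices. The sketch below is an assumption about how such a check could work, not a description of the shipped agent: the function names, the use of a median, and the 3x ratio cutoff are all invented for illustration.

```typescript
type AnomalyFlag = "novel_recipient" | "amount_anomaly";

// Median of a non-empty list of amounts.
function medianAmount(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// pastInvoices maps a recipient to the amounts previously paid to it.
// maxRatio is a hypothetical tuning knob: how many times larger or smaller
// than the typical invoice an amount may be before it is flagged.
function anomalyFlags(
  pastInvoices: Map<string, number[]>,
  recipient: string,
  amountUsd: number,
  maxRatio: number = 3,
): AnomalyFlag[] {
  const amounts = pastInvoices.get(recipient);
  if (!amounts || amounts.length === 0) {
    return ["novel_recipient"]; // never paid before: always confirm
  }
  const typical = medianAmount(amounts);
  // Treat "much smaller" the same as "much larger" by taking the ratio both ways.
  const ratio = amountUsd > typical ? amountUsd / typical : typical / amountUsd;
  return ratio > maxRatio ? ["amount_anomaly"] : [];
}

// Example: a $5 request against a vendor that usually invoices about $50.
const demoInvoices = new Map<string, number[]>([["0xVendor", [50, 48, 52]]]);
console.log(anomalyFlags(demoInvoices, "0xVendor", 5));  // [ 'amount_anomaly' ]
console.log(anomalyFlags(demoInvoices, "0xNewPayee", 5)); // [ 'novel_recipient' ]
```

A non-empty result would prompt the agent to ask for confirmation. Because this check lives in the agent's planning step rather than at the signer, it is a courtesy layer; the cap and threshold remain the hard limits.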
Together, the three signer controls and the anomaly check are how you let an agent run unsupervised on the boring path while keeping a person in the loop on the interesting one.
A concrete walkthrough
The fastest way to see the shape is to run the bill-pay starter we ship.
npx @canopy-ai/create-canopy-agent my-billpay-agent
# pick: treasury-billpay-agent
It comes preconfigured with a suggested policy you can adjust before going live:
| Field | Default |
|---|---|
| spend_cap_usd | 200 |
| cap_period_hours | 24 |
| approval_required | true |
| approval_threshold_usd | 25 |
Two hundred dollars a day, and anything under $25 pays without an approval. That's deliberately conservative. It lets the small recurring subscriptions through automatically and pauses anything that looks like a real vendor invoice.
Talk to the agent the way you'd talk to a bookkeeper:
- what's my budget? — it tells you.
- pay $5 to 0xVendor for invoice #1234 — it confirms it's seen this vendor before, then pays. Allowed.
- pay $300 to 0xVendor — it stages the payment, returns pending_approval. You see it in the dashboard, approve it, the payment goes through.
- pay $10,000 to 0xVendor — denied. Cap hit. The agent surfaces a clear reason, you decide whether to raise the cap or split the payment.
- pay $5 to 0xVendor — but the typical invoice is $50 — the agent flags the amount anomaly before paying. You confirm, it pays.
The audit trail captures every one of these as a transaction record with the agent that asked, the recipient, the amount, the policy decision, the on-chain transaction hash, and the approver if there was one. Month-end ties out because there is no month-end work to tie out — the trail is real-time and complete.
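To make the tie-out claim concrete, here is a small sketch of what one audit record might look like and how settled payments could be totaled per recipient. The fields follow the list in the paragraph above; the type name, the field names, and the rule that only records with a transaction hash count as settled are assumptions for the example.

```typescript
// Hypothetical record shape based on the fields listed above.
interface AuditEntry {
  agent: string;
  recipient: string;
  amountUsd: number;
  decision: "allow" | "pending_approval" | "deny"; // policy decision at request time
  txHash?: string;   // assumed present only once the payment went through on-chain
  approver?: string; // present only when a human approved
}

// Total settled spend per recipient, the number a ledger would be checked against.
function settledTotals(entries: AuditEntry[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const entry of entries) {
    if (!entry.txHash) continue; // denied or still-pending requests moved no money
    totals.set(entry.recipient, (totals.get(entry.recipient) ?? 0) + entry.amountUsd);
  }
  return totals;
}

// Example with made-up records.
const demoEntries: AuditEntry[] = [
  { agent: "billpay", recipient: "0xVendor", amountUsd: 5, decision: "allow", txHash: "0xaaa" },
  { agent: "billpay", recipient: "0xVendor", amountUsd: 100, decision: "pending_approval", txHash: "0xbbb", approver: "controller" },
  { agent: "billpay", recipient: "0xVendor", amountUsd: 10000, decision: "deny" },
];
console.log(settledTotals(demoEntries)); // Map(1) { '0xVendor' => 105 }
```

Because denied and unapproved requests stay in the trail without a hash, the same records answer both "what did we pay?" and "what did the policy stop?".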
How to roll this out
The mistake teams make on the first deployment is to point the agent at the entire AP queue on day one. Don't do that.
Start with one recurring SaaS bill. The narrowest, most predictable, smallest item you have. Set the cap just above its monthly amount, so that bill can pay and little else can. Set the approval threshold so the agent has to ask before paying. Let it run through one full cycle. Read the audit log.
After a clean cycle:
- Add the next two or three vendors that look like the first one.
- Raise the threshold slightly so the small recurring bills pay automatically.
- Keep approvals on for everything new.
After a month of clean cycles:
- The cap moves up to where it stops blocking real work.
- The threshold moves to wherever the interesting line actually is for your team.
- The allowlist grows. The audit trail keeps doing its work.
This is the shape that turns "AI agent" into "operationally reliable employee" without requiring trust the model hasn't earned. The trust isn't in the model. It's in the controls around the model.
What you keep
Three things, all of which finance was rightly unwilling to give up:
- Visibility. Every payment is attributable. Every refusal has a reason. The dashboard is the system of record.
- Veto. The approval threshold is where humans say no. It's the right place to put the friction — at the level where the cost of a mistake matters more than the cost of a wait.
- Recovery. Killing an agent is removing a co-owner from the treasury. No funds move. No keys rotate. The audit trail stays where it is.
Most of the AP queue isn't interesting work. The reason it hasn't been automated isn't that the model can't do it — it's that the model has been operating without the controls a finance team would expect. That gap is closed now.
Try the bill-pay starter, or start with one vendor and grow from there.