Over the past two years we’ve built an agent that learns each customer’s bookkeeping patterns and automatically codes transactions across multiple dimensions, so finance teams spend less time on repetitive accounting and close month-end faster. We retrain a model per organisation using every accountant action — each confirmation, correction, and change in coding practice — so the agent improves over time and adapts as an organisation’s reporting evolves.
TL;DR
- After two years of development, our Coding Agent serves a large share of our customer base and automatically codes nearly 2M fields every month.
- We’ve trained nearly 100,000 customer-specific models to date, and customers using the agent close their books ~4 days faster on average than those who don’t.
The short story
Bookkeeping is highly repetitive but never completely static. For nearly two years we’ve iterated on the Coding Agent: shipping small model improvements, watching how customers use it day-to-day, and expanding the system to handle the messy edge cases you only see in production.
The problem with manual coding and rules
Each organisation attributes spend across different dimensions. Some use two (e.g., GL account + cost centre); others allocate spend to projects, countries, product lines or entities.
Most accounting teams rely on two ways to code transactions:
- Manual coding. An accountant inspects each transaction (card, invoice, reimbursement) and selects the expense account plus whatever other dimensions they track: cost centre, cost carrier, project, country, product line, entity, or some other custom field. This is straightforward but repetitive. Even for relatively small organisations with a few hundred transactions per month, that still means thousands of fields to set and review every month.
- Rules and rule-override. Teams write rules (e.g., if supplier == X → expense account 40001). Rules automate common cases, but over time they proliferate: one rule for an edge case, another to override it, exceptions to exceptions. Rules are brittle and costly to maintain as organisations and reporting needs evolve.
Concrete example. Imagine an organisation that previously coded all food spend to 40000 Other. Later they add 40001 Food and Drinks. With rules, you have to create and maintain exceptions (and exceptions to exceptions).
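To make the maintenance burden concrete, here is a minimal sketch in Python of a first-match rule list; the suppliers, memos, and account codes are hypothetical. Each refinement of the chart of accounts adds another special case that has to be ordered and maintained by hand:

```python
# Hypothetical ordered rule list: first match wins.
# Every refinement of the chart of accounts adds another special case.
RULES = [
    # Exception to the exception: catering for a client event is entertainment, not food spend.
    {"supplier": "Acme Catering", "memo_contains": "client event", "account": "40200 Entertainment"},
    # Exception added when 40001 Food and Drinks was introduced.
    {"supplier": "Acme Catering", "account": "40001 Food and Drinks"},
    # Original catch-all rule.
    {"supplier": "Acme Catering", "account": "40000 Other"},
]

def code_transaction(supplier: str, memo: str) -> str | None:
    """Return the expense account from the first matching rule, if any."""
    for rule in RULES:
        if rule["supplier"] != supplier:
            continue
        if "memo_contains" in rule and rule["memo_contains"] not in memo.lower():
            continue
        return rule["account"]
    return None  # no rule matched: fall back to manual coding

print(code_transaction("Acme Catering", "Team lunch"))           # 40001 Food and Drinks
print(code_transaction("Acme Catering", "Client event dinner"))  # 40200 Entertainment
```

The ordering is what makes this brittle: every new exception has to be slotted in ahead of the rules it overrides, and the list only grows as reporting needs change.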
The better way — flexible, contextual, and self-managing
Rules are important and have their place in accounting (and in our product). But they’re brittle: they don’t adapt when an organisation changes its chart of accounts, introduces a new cost centre or project, or refines how it allocates spend.
The Coding Agent learns patterns from each customer’s historical postings and can reason over multiple signals (supplier, spender, description, amount, and context). When users export or confirm postings in their accounting system, those actions become training events. The agent starts by suggesting the new coding, then applies it automatically once confidence is high — reducing rule churn and manual maintenance.
We support n-dimensional predictions: the model predicts all of an organisation’s dimensions together and adapts when customers add new ones. Because growing organisations often introduce new dimensions over time, this future-proofs the workflow.
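As a rough illustration, here is a toy sketch of per-organisation, multi-dimensional prediction using scikit-learn. The suppliers, accounts, and cost centres are invented, and for simplicity it fits one small classifier per dimension over shared features; it is not our production model, which reasons over richer signals:

```python
# A toy sketch of per-organisation, multi-dimensional coding prediction.
# Suppliers, descriptions, accounts, and cost centres below are hypothetical.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Historical postings for ONE organisation: features plus the coded dimensions.
history = [
    ({"supplier": "Acme Catering", "desc": "team lunch"},
     {"account": "40001 Food and Drinks", "cost_centre": "People Ops"}),
    ({"supplier": "Acme Catering", "desc": "client dinner"},
     {"account": "40200 Entertainment", "cost_centre": "Sales"}),
    ({"supplier": "CloudHost", "desc": "monthly hosting"},
     {"account": "46000 Software", "cost_centre": "Engineering"}),
    ({"supplier": "CloudHost", "desc": "support plan"},
     {"account": "46000 Software", "cost_centre": "Engineering"}),
]

features = [f for f, _ in history]
dimensions = history[0][1].keys()  # adding a new dimension just adds a key

# One simple classifier per dimension, all sharing the same feature encoding.
models = {}
for dim in dimensions:
    labels = [coded[dim] for _, coded in history]
    models[dim] = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    models[dim].fit(features, labels)

new_transaction = {"supplier": "CloudHost", "desc": "monthly hosting"}
prediction = {dim: model.predict([new_transaction])[0] for dim, model in models.items()}
print(prediction)  # e.g. {'account': '46000 Software', 'cost_centre': 'Engineering'}
```

Adding a new dimension (say, project) only means adding another key to the historical postings and fitting one more model, which is what makes the n-dimensional setup easy to extend.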
What we built
- Per-organisation models and data control. We train on each customer’s historical postings so the model learns that organisation’s chart of accounts, coding conventions, and custom dimensions. Models are trained per organisation and do not leave Moss.
- Event-based training and human-in-the-loop. Customer confirmations are the primary training signal: every user input, correction, or exported transaction that includes our predicted values becomes a training event. The human-in-the-loop isn’t an afterthought; it’s the system’s heartbeat, and it’s why the agent improves materially over time.
- UI that makes provenance obvious. We indicate when a value is extracted from the document itself (e.g., VAT rate), when a value is set by a manual rule, and when it’s suggested by the Coding Agent. We use confidence thresholds: at high confidence we apply codings automatically; at lower confidence we surface the single most likely suggestion plus the next-most-likely options in the dropdown so users can correct quickly (a minimal sketch of this decision follows the list).
- Rule override behaviour. Over 75% of our customers allow the agent to overwrite rules they created — a sign that learned behaviour often outperforms brittle rule sets.
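For the confidence-threshold behaviour in the provenance bullet above, a minimal sketch might look like the following; the threshold value, the Suggestion fields, and the number of alternatives shown are illustrative, not our actual configuration:

```python
# A minimal sketch of the auto-apply vs suggest decision.
# The threshold and the Suggestion structure are illustrative.
from dataclasses import dataclass

AUTO_APPLY_THRESHOLD = 0.95  # hypothetical; tuned per customer and dimension in practice

@dataclass
class Suggestion:
    value: str              # e.g. "40001 Food and Drinks"
    confidence: float
    provenance: str         # "document" | "rule" | "coding_agent"
    auto_applied: bool
    alternatives: list[str]  # next-most-likely options for the dropdown

def decide(candidates: list[tuple[str, float]]) -> Suggestion:
    """candidates: dimension values ranked by model confidence, best first."""
    best_value, best_conf = candidates[0]
    runners_up = [value for value, _ in candidates[1:4]]
    auto = best_conf >= AUTO_APPLY_THRESHOLD
    return Suggestion(
        value=best_value,
        confidence=best_conf,
        provenance="coding_agent",
        auto_applied=auto,
        alternatives=[] if auto else runners_up,
    )

print(decide([("40001 Food and Drinks", 0.97), ("40000 Other", 0.02), ("40200 Entertainment", 0.01)]))
```

A real implementation would also tag values extracted from the document or set by manual rules with their own provenance, so the UI can always show where a value came from.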
How it works
- Infrastructure: model artefacts are stored in GCP buckets; we use Vertex AI for training and Airflow for orchestration. The pipeline is repeatable, so it scales from the smallest customers to the largest and keeps pace with the rapid growth in our customer base (a minimal orchestration sketch follows this list).
- Accuracy vs coverage: probabilistic models force a trade-off between precision (how often an auto-applied coding is correct) and coverage (the share of dimension values we can confidently set). We improve both by enriching features and training data (e.g., spender metadata, richer receipt information) and by retraining when behaviour changes. As a rule of thumb, we calibrate thresholds so auto-applied codings target ≥95% precision, and for many customers and dimensions exceed 98% (measured against subsequent user confirmations and corrections; varies with data volume and coding consistency). Coverage is primarily driven by how much consistent historical coding an organisation has; after a few months, fewer than 25% of codings are typically manual, and with longer history that can drop below 10% (varies by customer and dimension set). A threshold-calibration sketch follows this list.
- Perceived vs measured accuracy: We’ve iterated on provenance and confidence cues, plus fast alternatives in the dropdown, because presentation has an outsized impact on whether the agent feels “right” in daily use. Under the hood, we measure accuracy from confirmations and corrections and use calibrated confidence thresholds to decide what we auto-apply versus what we show as a suggestion.
- Why we don’t use a global model: We explored the classic ML approach — one global model trained across all organisations. It didn’t work. Accounting conventions vary materially by company, market, and vertical; the same nominal code can mean different things, and the same supplier can map to different cost centres or projects. Per-organisation models let us learn each customer’s conventions without forcing a one-size-fits-all mapping.
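As a rough sketch of the orchestration described in the infrastructure bullet, an Airflow task can submit a per-organisation Vertex AI training job. The DAG id, project, bucket, training image, script, and organisation id below are hypothetical placeholders, not our actual pipeline:

```python
# A minimal orchestration sketch (Airflow + Vertex AI); all names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import aiplatform

def retrain_org_model(org_id: str) -> None:
    """Submit a Vertex AI custom training job for one organisation."""
    aiplatform.init(
        project="example-project",                        # hypothetical project
        location="europe-west1",
        staging_bucket="gs://example-model-artefacts",    # trained artefacts land in GCS
    )
    job = aiplatform.CustomTrainingJob(
        display_name=f"coding-agent-{org_id}",
        script_path="train.py",                           # hypothetical training script
        container_uri="europe-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-0:latest",
    )
    job.run(args=[f"--org-id={org_id}"], replica_count=1, machine_type="n1-standard-4")

with DAG(
    dag_id="coding_agent_retrain",
    start_date=datetime(2024, 1, 1),
    schedule=None,   # triggered by training events rather than a fixed cron
    catchup=False,
) as dag:
    PythonOperator(
        task_id="retrain_example_org",
        python_callable=retrain_org_model,
        op_kwargs={"org_id": "org_123"},
    )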
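And for the accuracy-versus-coverage bullet, here is a toy sketch of calibrating an auto-apply threshold against past confirmations and corrections; the precision target and the feedback data are illustrative:

```python
# A toy sketch of calibrating an auto-apply threshold against a precision target.
def calibrate_threshold(predictions, target_precision=0.95):
    """predictions: list of (confidence, was_confirmed_correct) pairs from past user feedback.
    Returns the lowest threshold whose auto-applied precision still meets the target,
    plus the coverage (share of codings auto-applied) at that threshold."""
    best = (1.01, 0.0)  # fall back to "never auto-apply" if no threshold qualifies
    for threshold in sorted({conf for conf, _ in predictions}, reverse=True):
        applied = [correct for conf, correct in predictions if conf >= threshold]
        precision = sum(applied) / len(applied)
        coverage = len(applied) / len(predictions)
        if precision >= target_precision:
            best = (threshold, coverage)  # keep lowering while precision holds
        else:
            break  # simplification: stop at the first threshold that misses the target
    return best

feedback = [(0.99, True), (0.97, True), (0.92, True), (0.90, False), (0.85, True), (0.60, False)]
threshold, coverage = calibrate_threshold(feedback)
print(threshold, coverage)  # e.g. 0.92 with 50% of codings auto-applied
```

Higher thresholds protect precision at the cost of coverage; more consistent historical coding pushes confidence up, which lets the same precision target cover more fields.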
Scale and impact
- Models trained: we have nearly 100,000 customer-specific models live.
- Predictions: our models make nearly 2M predictions every month.
- Model retraining frequency: on average, each organisation’s model is retrained ~8× per month.
- Time saved: customers using the Coding Agent close their books ~4 days faster on average.
- Behavioural signal: users are ~2× more likely to overwrite a coding that came from a manual rule they created than a coding suggested by the agent.
Key lessons and principles
- Per-organisation models win. Don’t force global norms on organisations with different reporting needs.
- Make human-in-the-loop the product’s heartbeat. Use confirmations as training data, not just UX.
- Train when behaviour changes. Event-based retraining is both more efficient and more responsive than a cron job (sketched below).
- Prioritise explainability and control. Let customers disable rule overrides and see why the agent suggested a coding.
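To illustrate the "train when behaviour changes" principle, here is a toy sketch of an event-based retraining trigger; the thresholds and weights are invented, and the real signal is richer than a simple counter:

```python
# A toy sketch of event-based retraining: retrain when enough new accountant actions
# have arrived since the last training run, rather than on a fixed schedule.
from collections import defaultdict

MIN_NEW_EVENTS = 20     # hypothetical: retrain after this much new signal
CORRECTION_WEIGHT = 5   # hypothetical: corrections indicate changed behaviour, so they count more

events_since_last_training = defaultdict(int)

def record_event(org_id: str, kind: str) -> bool:
    """Record a confirmation or correction; return True if a retrain should be triggered."""
    weight = CORRECTION_WEIGHT if kind == "correction" else 1
    events_since_last_training[org_id] += weight
    if events_since_last_training[org_id] >= MIN_NEW_EVENTS:
        events_since_last_training[org_id] = 0
        return True  # caller kicks off the per-organisation training pipeline
    return False

# A burst of corrections (e.g. after a chart-of-accounts change) triggers retraining
# much sooner than a stream of plain confirmations would.
for kind in ["confirmation"] * 3 + ["correction"] * 4:
    if record_event("org_123", kind):
        print("retrain org_123")
```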