Where AI actually fits, and where it doesn’t (yet).

A practical lens for distinguishing the AI use cases that compound from the ones that quietly burn cycles. Drawn from work across regulated industries.

Practice notes · January 2026 · 8 min read

Every organization we work with is being asked the same question by its leadership: “What’s our AI strategy?” It’s the wrong question. A strategy is what you do; the better question is which problems in your business deserve AI, and which don’t (not yet, possibly not ever).

We’ve developed a simple lens for sorting the candidates. It isn’t the only lens, but it’s the one that has held up across sectors as different as investment research, public-sector procurement and healthcare operations.

The four conditions

An AI use case is likely to pay off when all four of these conditions are true. The more of them that are missing, the more carefully you should scope.

1. The work is repetitive but not trivial.

AI doesn’t help with work that’s already a one-line SQL query. And it can’t reliably do the kind of work that requires nuanced human judgment as the primary input. The sweet spot is the middle: tasks humans currently do many times a day that involve reading, summarizing, classifying, drafting or extracting. Document triage, response drafting, policy lookup: these are the boring use cases that compound.

2. There is a tolerable error mode.

Every model is wrong some of the time. The question is whether being wrong costs you something you can’t recover from. If a misclassified invoice ends up on a human reviewer’s queue, that’s a tolerable error. If a misclassified medical record ends up in a billing system without review, that isn’t. Design around the error mode first; the accuracy number is downstream.

3. You can measure improvement.

If you can’t state, in advance, the metric the system is supposed to move, and how you’ll measure it, you’re likely chasing a theme, not a use case. The metric doesn’t have to be perfect, but it has to be specific enough that the team can argue about it.

If you can’t name the metric the system is supposed to move, you’re shopping for AI; you’re not yet building with it.

4. There is a human owner for what the system does.

Every production AI system needs an owner: someone whose job changes because of it, who answers when it’s wrong, and who has the authority to evolve it. Without that owner, even the best system drifts into disuse.
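As a rough illustration, the lens can be written down as a checklist that a team fills in per candidate. This is a minimal sketch, not a scoring model: the `UseCase` structure, the field names and the wording of the recommendation are all assumptions made for the example.

```python
from dataclasses import dataclass

# Field names are hypothetical labels for the four conditions described above.
CONDITIONS = (
    "repetitive_not_trivial",
    "tolerable_error_mode",
    "measurable_metric",
    "human_owner",
)


@dataclass
class UseCase:
    """A candidate use case, with one yes/no answer per condition."""
    name: str
    repetitive_not_trivial: bool
    tolerable_error_mode: bool
    measurable_metric: bool
    human_owner: bool


def missing_conditions(case: UseCase) -> list[str]:
    """Return the conditions the candidate does not meet, in a fixed order."""
    return [c for c in CONDITIONS if not getattr(case, c)]


def recommendation(case: UseCase) -> str:
    """Summarize the outcome: a clean candidate, or a list of gaps to scope around."""
    missing = missing_conditions(case)
    if not missing:
        return f"{case.name}: candidate, all four conditions hold"
    return f"{case.name}: scope carefully, missing {', '.join(missing)}"


if __name__ == "__main__":
    print(recommendation(UseCase("invoice triage", True, True, True, True)))
    print(recommendation(UseCase("bolt-on chatbot", False, True, False, False)))
```

The value is less in the code than in forcing each answer to be an explicit yes or no that someone has to defend.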

What this rules in

Workflow assistants embedded in your team’s existing tools. Document-understanding pipelines feeding analytics. Retrieval over your knowledge base. Triage systems with humans in the loop. The unglamorous wins that quietly add up over a year.

What this rules out (for now)

Anything that depends on novel agentic reasoning without supervision in a domain where mistakes are expensive. Anything that requires you to surrender audit trail and decision provenance in a regulated context. Anything sold primarily as a vision rather than as a workflow.

None of this means “don’t use AI.” It means be honest about what the technology is good at right now, choose the use cases that match, and let the rest mature.

A short tour of cases we’ve seen pay off

The four conditions above are abstract; the patterns we keep deploying are not. A small gallery of shapes that consistently earn their keep inside the kinds of organizations we work with.

Inbox triage. A shared inbox (support, claims, constituent services, procurement) receives unstructured messages in volume. A classifier routes each message to the right queue and attaches a brief summary, with a confidence score and a human reviewer on the path. Throughput improves, response time drops, and the disagreements with reviewers become training data for the next iteration.
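One way to capture those reviewer disagreements is sketched below. It assumes the classifier's output and the reviewer's final queue are both available per message; the `Prediction` structure and the shape of the disagreement records are hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """What the classifier attached to one message (hypothetical shape)."""
    message_id: str
    queue: str
    summary: str
    confidence: float


def collect_disagreements(
    predictions: list[Prediction],
    reviewer_queues: dict[str, str],
) -> list[dict]:
    """Keep the messages where the reviewer chose a different queue.

    Messages the reviewer has not handled yet are skipped, not counted
    as agreement.
    """
    disagreements = []
    for p in predictions:
        final_queue = reviewer_queues.get(p.message_id)
        if final_queue is not None and final_queue != p.queue:
            disagreements.append(
                {
                    "message_id": p.message_id,
                    "predicted": p.queue,
                    "label": final_queue,  # the reviewer's choice becomes the training label
                    "confidence": p.confidence,
                }
            )
    return disagreements
```

Keeping the model's confidence alongside each correction makes it possible to see later whether the classifier was confidently wrong or merely unsure.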

Knowledge retrieval with citations. Internal staff need to answer questions whose answers are buried in a policy library, a handbook or a contract collection. A retrieval system surfaces the three most relevant excerpts with citations, and the staff member writes the response. The system does the searching; the human still does the writing.
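The shape of that interface can be shown with a toy example. The scoring below is plain word-overlap cosine similarity, a stand-in chosen so the sketch runs on the standard library alone; a production system would more likely use a search index or embeddings. The function names and the `(citation, text)` input format are assumptions.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_excerpts(question: str, excerpts: list[tuple[str, str]], k: int = 3):
    """Return up to k (citation, text, score) tuples, best match first.

    Excerpts with no word overlap are dropped, so the list may be
    shorter than k.
    """
    q = tokenize(question)
    scored = [(citation, text, cosine(q, tokenize(text))) for citation, text in excerpts]
    scored = [s for s in scored if s[2] > 0]
    scored.sort(key=lambda s: s[2], reverse=True)
    return scored[:k]
```

Each result carries its citation, so the staff member can check the source before writing the response.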

Document extraction with confidence. Invoices, forms, scanned contracts, intake records: structured information trapped in unstructured documents. An extraction pipeline returns structured fields with confidence scores. High-confidence rows flow downstream; low-confidence rows hit a human review queue. Within a quarter, the queue shrinks as the pipeline learns from corrections.
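The routing step is simple enough to sketch. The example assumes each extracted row carries a per-field confidence score; the `ExtractedRow` structure, the rule of judging a row by its least confident field, and the 0.9 threshold are illustrative choices, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class ExtractedRow:
    """One document's extracted fields with per-field confidence (hypothetical shape)."""
    document_id: str
    values: dict[str, str]
    confidence: dict[str, float]


def route(rows: list[ExtractedRow], threshold: float = 0.9):
    """Split rows into (downstream, review) by their least confident field."""
    downstream, review = [], []
    for row in rows:
        # A row with no scores at all is treated as zero confidence and reviewed.
        lowest = min(row.confidence.values(), default=0.0)
        if lowest >= threshold:
            downstream.append(row)
        else:
            review.append(row)
    return downstream, review
```

The threshold is the lever that trades review workload against downstream error, so it belongs with the use case's owner and not hard-coded by the pipeline team.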

Drafting assists embedded in the workflow. A first-draft response, summary or memo generated inside the tool the user already works in: their CRM, ticketing, EHR or content system. The user edits and ships. The point isn’t writing for them; it’s cutting the first-blank-page friction by 80%.

Failure modes we keep seeing

For balance, the patterns that don’t pay off, and the warning signs that suggest a use case is on that trajectory.

The bolt-on chatbot. A chatbot grafted onto a corporate site or internal portal, with no clear job and no integration into actual workflows. Usage spikes in week one and approaches zero by week four. The pattern fails because there’s no decision the chatbot is supposed to inform, which is the same problem as a vanity dashboard.

The unbounded agent. An autonomous “agent” asked to do open-ended work without supervision, in a domain where mistakes are expensive. These programs make great demos and bad production systems. The fix is almost always to constrain the agent to a narrow workflow with checkpoints, not to make it “smarter.”

The accuracy theater. A model launched to great fanfare with a single-number accuracy claim, evaluated against a dataset that doesn’t resemble production traffic. Three months in, real-world performance has slid 20 points and nobody is sure when it happened. Evaluation harnesses that include drift, failure modes and cost are non-negotiable for anything in production.
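The drift part of such a harness can start very small. The sketch below assumes a sample of production predictions is being labeled by reviewers and tagged with a period such as an ISO week; the function names, the tuple format and the five-point tolerance are assumptions. It covers accuracy drift only, not failure-mode breakdowns or cost.

```python
from collections import defaultdict


def accuracy_by_period(samples):
    """Compute accuracy per period.

    samples: iterable of (period, predicted, actual) tuples taken from
    reviewed production traffic. Returns {period: accuracy}, sorted by period.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for period, predicted, actual in samples:
        totals[period] += 1
        hits[period] += int(predicted == actual)
    return {p: hits[p] / totals[p] for p in sorted(totals)}


def drifted_periods(accuracy, baseline, max_drop=0.05):
    """List the periods whose accuracy fell more than max_drop below the baseline."""
    return [p for p, acc in accuracy.items() if baseline - acc > max_drop]
```

Run on a schedule, a check like this answers the "when did it happen" question while the slide is still a few points, not twenty.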

How to scope a first AI project

For organizations early in the journey, the most important decision is what to do first. We tend to argue for the smallest project that has all four conditions satisfied, not the most ambitious one. A boring success teaches the organization more than a glamorous failure, and it buys the political capital to attempt the harder thing next.

Concretely: a six-to-eight-week discovery sprint targeting a single workflow with a clear owner, a measurable metric, and a tolerable error mode. At the end of the sprint, you have either a working prototype on real data or an honest assessment that the conditions weren’t there. Either outcome is more valuable than nine more months of strategy decks.

What changes in eighteen months

The four conditions are stable; the boundary of what they admit moves every quarter. Tasks that don’t pass the “repetitive but not trivial” bar today often will in a year, as models improve and tooling matures. The right cadence isn’t to wait for the technology to settle (it won’t) but to revisit your list of disqualified use cases on a regular schedule and ask whether something has changed.

The organizations that compound value with AI aren’t the ones making the biggest bets. They’re the ones with a working filter for what to attempt now, a working harness for measuring whether it worked, and the discipline to keep both.