SLMs as building blocks for agentic AI in banking

Agentic AI needs specialized models, not one massive LLM. Learn why modular SLM architectures deliver better accuracy, auditability, and cost control for banks.

Michael Forystek
Co-founder, Growth & Partnerships


Every major consultancy is calling 2026 the year of agentic AI in financial services. McKinsey, Accenture, and KPMG all agree that autonomous AI agents executing workflows will reshape banking operations. What none of them are saying clearly enough is this: agentic AI systems built on a single massive language model are the wrong architecture for regulated financial institutions.

The better approach is a modular one. Multiple small language models (SLMs), each fine-tuned for a specific task, orchestrated together into an agentic system. This article explains why, and what it means for banks evaluating their AI architecture.

What is agentic AI and why does banking care?

Agentic AI refers to AI systems that can work autonomously to achieve specific goals. Unlike a chatbot that responds to a single prompt and waits, an agentic system can plan multi-step tasks, use tools, make decisions, and execute workflows with minimal human intervention.

In banking, the use cases are already emerging. A compliance agent that monitors transactions, identifies suspicious patterns, generates investigation reports, and escalates high-risk cases, all without a human initiating each step. An onboarding agent that verifies identity documents, runs KYC checks, assesses risk, and triggers account creation. A customer advisory agent that evaluates a client's financial situation, identifies relevant products, checks eligibility, and prepares a personalized recommendation.

The economic case is significant. According to KPMG, global spending on agentic AI reached an estimated $50 billion in 2025. McKinsey estimates AI could reduce certain banking cost categories by as much as 70%, with a net effect of 15-20% across the industry. Banks that move first are expected to gain a 4% return on tangible equity advantage over those that wait.

Why does agentic AI need multiple specialized models?

The default assumption in most enterprise AI discussions is that agentic systems run on a single large language model, one powerful model that handles everything from document analysis to decision-making to customer communication. For general-purpose applications, this can work. For regulated financial services, it creates problems.

One model cannot be equally good at everything

A general-purpose LLM with hundreds of billions of parameters carries broad knowledge but shallow depth in any specific domain. It knows something about AML regulations, something about credit risk, and something about insurance underwriting, but it doesn't know any of these domains with the precision that a financial institution requires.

A 2-billion parameter model fine-tuned specifically on AML transaction data will outperform a 200-billion parameter general model on AML-related tasks. The same is true for document classification, regulatory interpretation, and domain-specific language processing. Specialization beats scale when the task is narrow and accuracy matters.

Monolithic models create single points of failure

In an agentic system, if one massive model handles every task and that model hallucinates, misinterprets a regulation, or produces an incorrect risk assessment, the error propagates through the entire workflow. There is no second opinion, no cross-check, no compartmentalization.

A modular architecture with specialized SLMs isolates failures. If the document classification model makes an error, the compliance screening model can still function correctly. Each model operates within its domain of competence, and the orchestration layer can implement validation checks between steps.

Regulatory auditability requires task-level traceability

Under both DORA and the EU AI Act, financial institutions must demonstrate traceability for AI-driven decisions. When a single model generates a complex output that combines document analysis, risk scoring, and recommendation generation, it is difficult to audit which part of the model's reasoning drove which part of the decision.

With multiple SLMs, each model's input and output is discrete and auditable. The document extraction model produced this output. The risk scoring model used that output to generate this score. The recommendation model used the score and the client profile to produce this suggestion. Every step is traceable, explainable, and independently testable.
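In code, this kind of step-level traceability can be as simple as logging one record per model call. The sketch below is a minimal illustration, not a production audit system; the model names and versions are hypothetical placeholders for an institution's actual specialist SLMs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    """One traceable step: which model saw what and produced what."""
    model_name: str
    model_version: str
    input_summary: str
    output_summary: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class AuditTrail:
    """Collects one record per model call so each step is independently reviewable."""

    def __init__(self) -> None:
        self.records: list[AuditRecord] = []

    def log(self, model_name: str, model_version: str,
            input_summary: str, output_summary: str) -> None:
        self.records.append(
            AuditRecord(model_name, model_version, input_summary, output_summary)
        )

    def report(self) -> str:
        return "\n".join(
            f"{r.timestamp} {r.model_name}@{r.model_version}: "
            f"{r.input_summary} -> {r.output_summary}"
            for r in self.records
        )


# Hypothetical three-step workflow: each specialist's step is logged discretely.
trail = AuditTrail()
trail.log("doc-extraction-slm", "1.4", "loan_agreement.pdf", "structured fields")
trail.log("risk-scoring-slm", "2.1", "structured fields", "risk score 0.82")
trail.log("recommendation-slm", "1.0", "score + client profile", "product suggestion")
print(trail.report())
```

Because every record names the model, its version, and its input and output, a reviewer can replay or test any single step without re-running the whole workflow.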

What does a modular agentic architecture look like?

A practical agentic AI system for banking uses multiple SLMs orchestrated by a coordination layer. Each model is a specialist, and the orchestrator manages the workflow.

The component models

A typical deployment might include a document processing SLM trained on financial document formats (loan agreements, KYC documentation, regulatory filings) that extracts structured data from unstructured inputs. A compliance screening SLM fine-tuned on transaction patterns and investigation outcomes that evaluates risk signals and scores alerts. A regulatory interpretation SLM trained on DORA, the EU AI Act, AML directives, and internal policies that provides regulatory context for decisions. A communication SLM fine-tuned on the institution's tone, templates, and customer interaction patterns that generates client-facing outputs.

Each model is small (1-2 billion parameters) and runs on modest infrastructure. A single GPU can serve one or more of these models simultaneously, depending on throughput requirements.

The orchestration layer

The orchestration layer is the "agent" in agentic AI. It receives a task, breaks it into steps, routes each step to the appropriate specialized model, collects outputs, and assembles the final result. It also implements validation rules: if the compliance model flags a transaction but the confidence score is below a threshold, escalate to a human reviewer rather than proceeding automatically.

This orchestration can be built using standard frameworks (LangChain, CrewAI, or custom logic) and does not itself require a large language model. It is a workflow engine that coordinates specialists, not a generalist trying to do everything.

The RAG layer

Alongside the specialized models, a retrieval-augmented generation (RAG) pipeline supplies current information that changes frequently, such as regulatory updates, internal policy documents, and customer account data. The fine-tuned models provide domain reasoning; the RAG layer provides current facts. Together, they deliver responses that are both domain-aware and factually grounded.
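The division of labor between retrieval and reasoning can be illustrated with a minimal sketch. Everything here is a simplifying assumption: the in-memory document store, the keyword-overlap retriever (production systems use embedding search), and the policy text itself, which stands in for whatever current documents the institution indexes. The assembled prompt is what would be handed to a fine-tuned SLM.

```python
# Toy document store standing in for an institution's policy index.
POLICY_DOCS = {
    "aml-2025-update": "Enhanced due diligence applies above EUR 10,000.",
    "onboarding-policy": "Two forms of identification are required.",
}


def retrieve(query: str, docs: dict[str, str], top_k: int = 1) -> list[str]:
    """Naive keyword-overlap retrieval; real systems use embedding similarity."""
    scored = sorted(
        docs.values(),
        key=lambda text: len(
            set(query.lower().split()) & set(text.lower().split())
        ),
        reverse=True,
    )
    return scored[:top_k]


def build_prompt(query: str, docs: dict[str, str]) -> str:
    """Ground the SLM's domain reasoning in retrieved, current facts."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"


prompt = build_prompt(
    "What due diligence applies to a EUR 12,000 transfer?", POLICY_DOCS
)
print(prompt)
```

The key point is the separation: when a regulation changes, only the document store is updated; the fine-tuned model that reasons over the retrieved context stays untouched.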

Why does this architecture suit on-premise deployment?

One of the key advantages of building agentic systems from small, specialized models rather than one massive model is infrastructure efficiency.

Lower compute requirements

A single general-purpose LLM with 100 billion or more parameters requires multiple high-end GPUs and significant memory. A collection of four purpose-built 2-billion parameter SLMs can run on a fraction of that infrastructure, potentially a single server with one or two GPUs. For financial institutions deploying on-premise to meet DORA and EU AI Act requirements, this dramatically reduces the hardware investment.

Independent updates and retraining

When regulations change or new transaction patterns emerge, you don't need to retrain the entire system. You retrain the specific model affected — the compliance SLM when AML rules change, the regulatory interpretation SLM when new guidance is published, the communication SLM when the institution updates its customer interaction standards. This targeted retraining is faster, cheaper, and less risky than updating a monolithic model.

Predictable costs

Each SLM operates on fixed infrastructure with predictable throughput. There are no per-token API charges that scale unpredictably with usage. The institution knows exactly what the agentic system costs to operate each month, regardless of transaction volume or query complexity.

What should banks consider before building agentic AI?

Agentic AI is powerful, but it is not a plug-and-play solution. Financial institutions evaluating this approach should consider several factors.

Start with one workflow, not the entire bank

The most successful agentic AI deployments start with a single, well-defined workflow (AML alert triage, customer onboarding, or loan application processing) and expand from there. Attempting to build an enterprise-wide agentic system from day one is the fastest route to what McKinsey calls "pilot purgatory."

Human-in-the-loop is not optional

In regulated financial services, autonomous AI systems must have human oversight at defined decision points. The agentic system can prepare, analyze, and recommend, but a human must authorize high-impact decisions like suspicious activity reports, credit approvals, or regulatory submissions. This is both a regulatory requirement and a practical safeguard.

Governance must be built in from the start

Every agent action needs an audit trail. Every model decision needs an explanation. Every workflow needs defined escalation paths. Building governance into the agentic architecture from the beginning is far easier than retrofitting it after deployment, and it is what regulators expect under the EU AI Act's requirements for high-risk AI systems.

Key takeaways

Agentic AI is coming to banking in 2026, but the right architecture matters as much as the technology itself. A modular system built from multiple specialized small language models delivers higher accuracy, better auditability, lower infrastructure costs, and more resilient operations than a monolithic approach using a single large model.

For financial institutions subject to DORA and the EU AI Act, this modular architecture also provides the task-level traceability, data sovereignty, and independent testability that regulators require.

The question is not whether your bank will use agentic AI. It is whether you will build it on a foundation you own and control — or outsource it to a single vendor and a single model you cannot audit.

If your institution is exploring agentic AI architecture, we can help you design the right approach.


Ready to Own Your AI?

Stop renting generic models. Start building specialized AI that runs on your infrastructure, knows your business, and stays under your control.