Why banks are moving from LLMs to SLMs - and when they shouldn't

Banks and fintechs are moving AI workloads from general LLMs to purpose-built SLMs. Learn why the shift is happening, when it makes sense, and when it doesn't.

Michael Forystek
Co-founder, Growth & Partnerships

For the past two years, most financial institutions that adopted AI started in the same place: a general-purpose large language model accessed through an API. ChatGPT, Claude, Gemini - the biggest, most capable models available, delivered as a service. It made sense. These models were the easiest to try, the fastest to integrate, and they worked reasonably well for a wide range of tasks.

But a quieter shift has been happening underneath the headline news about ever-larger frontier models. Banks, fintechs, and insurers are increasingly moving production workloads away from general-purpose LLMs and toward small language models (SLMs) that run on their own infrastructure and are fine-tuned for specific financial tasks.

This article explains why the shift is happening, when it makes sense, and importantly, when it doesn't. Because for some use cases, a large general-purpose model remains the right tool. The goal is not to pick a side. It is to match the tool to the job.

What is the difference between an LLM and an SLM?

A large language model (LLM) is a general-purpose AI model trained on broad web-scale data. These models typically range from tens of billions to over a trillion parameters and are designed to handle almost any topic - from writing code to summarising Shakespeare to explaining quantum physics. ChatGPT, Claude, and Gemini are LLMs.

A small language model (SLM) is a more compact AI model, typically with up to 10 billion parameters, that trades breadth of knowledge for depth in a specific domain. SLMs are designed to run efficiently on modest infrastructure, often a single GPU, while delivering high accuracy on the narrow range of tasks they were built for.
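The "runs on a single GPU" claim follows from simple arithmetic on weight storage. A back-of-the-envelope sketch, where the 7B/70B sizes and 16-bit precision are illustrative assumptions (real deployments also need memory for the KV cache and activations):

```python
# Rough memory footprint for model weights alone (illustrative numbers).
# A model with P parameters stored at B bytes each needs ~P * B bytes.

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in GB."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# A 7B-parameter SLM at 16-bit (2-byte) precision: ~14 GB, which fits
# on a single 24 GB GPU. A 70B-parameter model at the same precision:
# ~140 GB, which requires multiple high-end GPUs.
print(f"7B model:  ~{weight_memory_gb(7, 2):.0f} GB")   # ~14 GB
print(f"70B model: ~{weight_memory_gb(70, 2):.0f} GB")  # ~140 GB
```

Quantisation to 8-bit or 4-bit weights shrinks these figures further, which is part of why SLMs run comfortably on modest infrastructure.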

The distinction is not just about size. It is about purpose. An LLM tries to know everything; an SLM is fine-tuned to know one thing very well. For a financial institution asking "will this model correctly flag suspicious AML patterns?" the answer depends less on how many parameters the model has and more on what data it was trained on.

Why are banks moving from LLMs to SLMs?

Four forces are driving the shift, and they compound each other.

Accuracy on domain-specific tasks

General-purpose LLMs are built to handle everything, which means they are optimised for nothing in particular. On narrow financial services tasks (AML alert triage, regulatory citation lookup, product eligibility assessment), a purpose-built SLM fine-tuned on institution-specific data typically reaches 90-95% accuracy, compared with 60-75% for a generic LLM applied to the same task.

The difference matters most at the operational edge. A 75% accurate system for AML screening means one in four decisions needs human review - which is not the efficiency gain the institution was looking for. A 95% accurate system changes the workflow from "verify everything" to "verify exceptions," which is what production use actually requires.
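The workload impact of that accuracy gap is easy to quantify. A minimal sketch, where the daily alert volume is a hypothetical figure, not one from the article:

```python
# How accuracy changes the human-review workload (illustrative).
# If a fraction `accuracy` of decisions can be trusted, roughly
# (1 - accuracy) of the daily volume still needs a human look.

def daily_reviews(volume_per_day: int, accuracy: float) -> int:
    """Approximate number of decisions per day requiring human review."""
    return round(volume_per_day * (1 - accuracy))

volume = 100_000  # hypothetical daily alert volume

print(daily_reviews(volume, 0.75))  # 25000 alerts to review per day
print(daily_reviews(volume, 0.95))  # 5000 alerts to review per day
```

Moving from 75% to 95% accuracy cuts the review queue fivefold, which is the difference between "verify everything" and "verify exceptions."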

Data sovereignty and regulatory obligations

Financial institutions handle data that cannot leave the organisation. Transaction records, customer information, internal compliance assessments, investigation files - sending any of this to a third-party API creates regulatory exposure. Under DORA, every API provider becomes a third-party ICT dependency that must be continuously monitored, contractually governed, and audited. For some institutions, this alone makes API-based LLMs unworkable for production use cases.

SLMs running on-premise or in private cloud environments keep data fully within the institution's control. There is no external transmission, no third-party processing, no contractual negotiation with a foreign vendor over audit rights. For compliance-sensitive workloads, this is not a nice-to-have; it is a prerequisite.

Cost at production scale

API-based LLM pricing looks cheap at pilot scale. At production scale, where a mid-size bank might run hundreds of thousands of compliance screenings or customer interactions per day, the monthly bill can reach tens or hundreds of thousands of dollars. And the cost scales linearly with usage, so every transaction spike becomes a budget spike.

A purpose-built SLM running on a single GPU handles similar workloads with different economics. The infrastructure cost is fixed. After the upfront investment, the marginal cost per query approaches zero. For institutions with high-volume, continuous workloads, the cost difference is not marginal.
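The cost argument comes down to a break-even calculation between fixed infrastructure and pay-per-query pricing. A sketch, where all dollar figures are hypothetical assumptions for illustration, not quoted prices:

```python
# Break-even sketch: fixed-cost SLM infrastructure vs pay-per-query API.
# The $0.01/query and $3,000/month figures below are assumed for
# illustration only; real pricing varies by model and provider.

def monthly_api_cost(queries_per_day: int, cost_per_query: float) -> float:
    """Monthly API spend at a given daily query volume (30-day month)."""
    return queries_per_day * 30 * cost_per_query

def breakeven_queries_per_day(monthly_infra_cost: float,
                              cost_per_query: float) -> float:
    """Daily volume above which fixed infrastructure beats the API."""
    return monthly_infra_cost / cost_per_query / 30

# 200,000 queries/day through the API at $0.01 each:
print(f"${monthly_api_cost(200_000, 0.01):,.0f}/month")  # ~$60,000/month

# Against a $3,000/month dedicated GPU server:
print(round(breakeven_queries_per_day(3_000, 0.01)))  # breaks even at 10000/day
```

Above the break-even volume, every additional query on the SLM is effectively free, while the API bill keeps scaling linearly.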

Latency and throughput

API calls add 200-1,000 milliseconds of network round-trip latency per query. For real-time applications (transaction screening, fraud detection, customer-facing advisory), this latency compounds across every interaction. An SLM running on local infrastructure responds in a fraction of the time, which changes what is operationally possible.

For a fraud detection system that needs to decide whether to block or approve a transaction while the customer is still on the page, the difference between a 50-millisecond local response and a 1-second API round trip is the difference between a working system and one that customers abandon.

When should a financial institution stick with an LLM?

Not every AI use case in financial services benefits from moving to an SLM. Some workloads are better served by a large general-purpose model, and recognising which ones matters as much as recognising the opposite.

Broad, exploratory tasks

If the task requires general knowledge spanning many topics (market research, document summarisation across diverse sources, creative copy generation, open-ended brainstorming), an LLM's breadth is the advantage. A narrow SLM trained only on your compliance data will not be the right tool for drafting a press release or analysing a macroeconomic trend.

Low-volume, intermittent workloads

If the AI use case generates a few dozen queries per day rather than thousands, the cost math for a dedicated SLM rarely works. The fixed infrastructure cost, the fine-tuning effort, and the ongoing maintenance overhead are all justified by volume. At low volumes, pay-per-token API pricing is cheaper and simpler.

Rapidly changing requirements

If the use case is still being explored - if the institution is not yet sure what it wants the model to do - an LLM accessed through an API provides flexibility that a fine-tuned SLM does not. SLMs are optimised for a known task. Fine-tuning one before the requirements are stable means fine-tuning it multiple times, which is expensive and slow. LLMs are the right tool for the "we're still figuring this out" phase.

Tasks that genuinely need world knowledge

Some financial services use cases benefit from the general knowledge an LLM carries. An AI-assisted research tool for a wealth management team asking questions about geopolitics, commodity markets, and historical precedent is going to get better answers from a frontier LLM than from any SLM, because the knowledge required simply is not in the narrow training data of a specialised model.

The hybrid approach that actually works

In practice, the most sophisticated financial institutions are not choosing between LLMs and SLMs. They are using both, assigned to the tasks each does best.

A typical architecture might look like this. An internal knowledge assistant for operations and compliance staff runs on an on-premise SLM fine-tuned on institutional policies - handling thousands of queries per day at high accuracy and zero per-query cost. A research tool for the investment team uses an API-based LLM for broad exploratory questions where general knowledge matters. A transaction screening system uses a purpose-built SLM for the high-volume, latency-sensitive compliance checks, and escalates edge cases to a human reviewer or, in some cases, to an LLM for secondary assessment.
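The routing logic at the heart of such an architecture can be sketched in a few lines. This is a minimal illustration of the idea; the task categories, backend names, and routing rules are hypothetical placeholders, not a real institution's design:

```python
# Minimal sketch of hybrid LLM/SLM routing. Backend names and task
# categories are hypothetical; a production router would also handle
# escalation to human review and secondary LLM assessment.

from dataclasses import dataclass

@dataclass
class Task:
    kind: str        # e.g. "compliance_screening", "knowledge_query", "research"
    sensitive: bool  # data that must not leave the institution

def route(task: Task) -> str:
    """Pick a backend: on-prem SLM vs API-based LLM."""
    # Sensitive data never leaves the institution's infrastructure.
    if task.sensitive:
        return "on_prem_slm"
    # High-volume, well-defined internal tasks go to the fine-tuned SLM.
    if task.kind in {"compliance_screening", "knowledge_query"}:
        return "on_prem_slm"
    # Broad, exploratory work goes to the general-purpose LLM API.
    return "api_llm"

print(route(Task("compliance_screening", sensitive=True)))  # on_prem_slm
print(route(Task("research", sensitive=False)))             # api_llm
```

The point of the sketch is that the routing decision is made per task, not per institution, which is exactly the shift the article describes.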

This architecture is not about ideology. It is about matching each workload to the tool that fits it. The institutions getting the best results from AI are the ones that have stopped asking "which model should we use?" and started asking "which model should we use for this specific task?"

How should you evaluate your current setup?

The decision framework is straightforward. For each AI use case currently in production or planning, ask four questions:

How narrow is the task? If it is a well-defined task with clear inputs and outputs, an SLM can be fine-tuned to outperform a general LLM. If it is open-ended or requires broad general knowledge, the LLM is the better fit.

How high is the volume? High-volume continuous workloads favour fixed-cost infrastructure. Low-volume intermittent workloads favour pay-per-use APIs.

How sensitive is the data? If the data cannot leave your infrastructure for regulatory or operational reasons, an on-premise SLM is the only viable option.

How stable is the requirement? Stable, well-understood tasks justify the investment in fine-tuning. Exploratory tasks do not.
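The four questions above can be sketched as a simple checklist. The scoring rule here is a naive illustration of how the answers combine, not a formal methodology:

```python
# The four-question decision framework as a checklist (illustrative).
# The scoring rule is an assumption made for this sketch: sensitivity
# is treated as decisive, and the remaining answers are majority-voted.

def recommend(narrow: bool, high_volume: bool,
              sensitive: bool, stable: bool) -> str:
    """Return 'SLM' or 'LLM' for one use case, per the four questions."""
    if sensitive:
        return "SLM"  # data that cannot leave the infrastructure decides it
    slm_points = sum([narrow, high_volume, stable])
    return "SLM" if slm_points >= 2 else "LLM"

# AML alert triage: narrow, high volume, sensitive, stable requirements
print(recommend(True, True, True, True))      # SLM
# Exploratory market research: broad, low volume, non-sensitive, unstable
print(recommend(False, False, False, False))  # LLM
```

Run across a portfolio of use cases, a checklist like this makes the mixed answer explicit: some workloads score toward the SLM, others toward the LLM.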

The answers will rarely point uniformly in one direction. Most institutions will find that some use cases belong on an SLM, some belong on an LLM, and the right architecture is a mix - chosen deliberately rather than by default.

Key takeaways

The move from LLMs to SLMs in financial services is real and accelerating, but it is not universal. SLMs are winning the high-volume, compliance-sensitive, latency-critical, domain-specific workloads - which happens to cover most of the production use cases that matter in banking and insurance. LLMs remain the right tool for exploratory work, broad knowledge tasks, and early-stage experiments where flexibility matters more than specialisation.

The honest answer to "SLM or LLM?" is "it depends on the task." The institutions getting the most value from AI are the ones that have built architectures flexible enough to use both where each fits best.

If you want to understand which of your AI workloads would benefit from moving to an SLM - and which should stay where they are - we can help you work through the analysis.

