Open-weight vs open-source AI for financial services

The word "open" is doing too much work in AI vendor pitches. A model can be open-weight, open-source, source-available, open-licensed, or freely-downloadable-but-licensed — and these are not the same thing. For a financial institution that has to document how its AI is built, what it can do with that AI, and what it owes its regulator, the precise meaning of "open" determines what is audit-defensible, what is contractually safe, and what the actual vendor-dependency looks like.

The shorthand most teams default to, "we're using open-source AI," is almost always technically wrong. The model in question is usually open-weight, and that distinction matters far more in a procurement-and-compliance review than it does in a research lab. What follows defines both terms precisely, lays out the five tiers of openness that real production-grade models fall into, and explains what each tier means for what a regulated institution can audit, modify, redistribute, reproduce, and depend on.

Why does the open-weight vs open-source distinction matter for regulated AI?

For most software contexts the open-weight versus open-source distinction is a technicality. For regulated AI it is a documentation question — and one that ends up in DORA third-party assessments, EU AI Act conformity packages, and procurement risk reviews.

The dependency profile in particular is more nuanced than "weights versus API." Holding weights gives the institution permanent access to a specific version of the model, but running it without the vendor still requires on-premise infrastructure, serving capability, and the operational expertise to maintain both. And even with weights and infrastructure, explainability remains bounded by what the upstream provider has published about training data, methodology, and architecture — the institution can run an open-weight model autonomously without being able to fully audit how it was built. The openness tier is the input to an assessment that covers both operational dependency and audit-surface dependency, and the two do not move together.

What "open-weight" and "open source" actually mean

Open-weight

A model is open-weight when its trained weights are downloadable and usable. The institution can run inference on its own hardware, fine-tune on its own data, and, depending on the licence, redistribute the result. What is typically not released alongside the weights: the training data itself, the training code, the full training recipe, the data-curation choices, and the evaluation harness. Enough of the architecture is necessarily disclosed to run the weights; the choices that produced them usually are not.

The practical consequence is that an open-weight model can be operated and adapted but cannot be reproduced from scratch. If the upstream provider stops publishing new weights, the institution keeps the version it has but cannot easily generate a comparable successor on its own.
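Because the institution keeps one specific version, it helps to be able to show exactly which version that is. The following is a minimal sketch, assuming the weights sit in a local directory as ordinary files; the function name `fingerprint_weights` and the directory layout are illustrative and not part of any provider's tooling.

```python
import hashlib
from pathlib import Path


def fingerprint_weights(weights_dir: str) -> dict[str, str]:
    """Return {relative_path: sha256_hex} for every file under weights_dir."""
    root = Path(weights_dir)
    digests = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h = hashlib.sha256()
        with path.open("rb") as f:
            # Read in 1 MiB chunks so multi-gigabyte shards never have to fit in memory.
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                h.update(chunk)
        digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests
```

Stored alongside the model's record, the resulting digests let an auditor confirm that the weights running in production are the ones that were assessed.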

Open source (for AI)

The software-style definition of open source (source code freely available under an OSI-approved licence, whether permissive or copyleft) does not map cleanly onto AI models. A model's behaviour is shaped at least as much by training data and training recipe as by its source code. The Open Source Initiative's 2024 Open Source AI Definition (OSAID) attempts to bring the AI case into line with the software case: to be open-source AI, a model needs weights, training code, sufficient training-data documentation, and a licence permitting use, study, modification, and redistribution.

The bar is high, but the more interesting fact is that the constraint is not technical. There are plenty of genuinely open-source AI models — they tend to be the smaller and research-grade ones rather than the frontier names. Two reasons drive the pattern. The pragmatic one is that even with the full recipe and the data, training a state-of-the-art model requires compute resources almost no team outside the largest labs can muster — so "open" does not translate into "anyone can rebuild it." The financial one is more decisive: training a frontier model costs tens to hundreds of millions, and the established mechanisms for capitalising on that investment all involve restricting access. A lab that publishes the full training stack is choosing not to monetise that asset, and there is no equivalent of an enterprise-licence revenue stream that makes the choice pay back. The named models that financial institutions actually deploy in 2026 are, with rare exceptions, open-weight, not open-source — not because open-source AI is technically impossible at scale, but because the economics push toward weight-only release at the frontier.

The five tiers of "open" in AI models

In practice, real models fall into one of five tiers. Each tier has a different licensing posture, a different audit surface, and a different set of things a regulated institution can legitimately do with the model.

Tier 1 — Truly open source

Weights, training code, sufficient training-data documentation, and a licence permitting commercial use, modification, and redistribution. Full reproducibility in principle: a sufficiently resourced team could rebuild the model from the published artefacts.

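Examples include OLMo (Allen Institute for AI) and Pythia (EleutherAI). BLOOM is often listed alongside them, but its RAIL licence carries use restrictions, which places it closer to Tier 3. These are mostly research-grade releases rather than the workhorse models most banks deploy in production, but they exist and they set the upper bound of what "open" can mean in AI.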

For regulated institutions, Tier 1 offers the strongest audit position and the lowest vendor-dependency profile. The trade-off is capability: Tier 1 models tend to lag the frontier on raw performance, and the institution accepts that gap in exchange for transparency.

Tier 2 — Open-weight under permissive licences

Weights released under Apache 2.0, MIT, or equivalent. Commercial use, modification, fine-tuning, and redistribution are all permitted, with no user-count thresholds and no acceptable-use clauses that materially restrict deployment; the main obligation is preserving licence and copyright notices. Training stack and data documentation are typically incomplete or closed.

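Examples include Mistral 7B, Mixtral, recent Mistral Small releases, most Qwen 2.5 sizes, Phi-4, and several DeepSeek releases. Licences can differ between versions and sizes within the same family, so the check belongs at the level of the specific checkpoint. This is the tier most production deployments in European financial services actually use, because it combines competitive capability with licensing terms a procurement team can sign off in one pass.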

The audit position is solid for what the model does and weaker for how it was built. Operational vendor-dependency for the deployed version is low — the weights are permanently in the institution's hands — though running them still requires on-premise infrastructure and operational capability, and the audit narrative around training stays bounded by what the provider has chosen to publish. The implication for governance: this tier is straightforwardly documentable for behaviour and licensing; it is not, however, "open source," and procurement and risk documentation should not describe it as such.

Tier 3 — Open-weight under conditional licences

Weights downloadable and usable, but with custom licences that impose specific conditions. The conditions vary: usage thresholds (e.g., Meta's Llama licence permits commercial use but requires a separate licence from Meta for licensees whose products exceed 700 million monthly active users), acceptable-use clauses that prohibit specified categories of deployment, attribution or naming requirements, or jurisdictional restrictions.

Examples include Llama 3 / 4 (Meta), Gemma (Google), and some Stability AI releases. These models are widely used in production but require the procurement team to read the licence properly rather than assume Apache-like terms.

For a regulated institution, Tier 3 is fully usable for the vast majority of banking and insurance deployments (no bank or insurer is realistically near Llama's MAU threshold, for example), but the licence becomes part of the third-party documentation under DORA, and any change of use case (consumer-facing chat, white-label resale to clients) needs to be re-checked against the original terms.
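Once someone has read the licence, the re-check on a change of use case can be made partly mechanical. The following is a rough sketch under stated assumptions: `LicenceConditions`, `UseCase`, and the trigger labels are hypothetical structures an institution might maintain itself, not wording taken from any actual licence, and a flagged finding means "send to legal review", not "prohibited".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenceConditions:
    """Institution-maintained summary of a conditional licence (hypothetical)."""
    mau_threshold: Optional[int]       # e.g. 700_000_000 for the Llama clause cited above
    review_triggers: frozenset[str]    # use categories flagged when the licence was read


@dataclass(frozen=True)
class UseCase:
    name: str
    monthly_active_users: int
    categories: frozenset[str]


def licence_findings(lic: LicenceConditions, use: UseCase) -> list[str]:
    """List the conditions a use case touches; an empty list means nothing was flagged."""
    findings = []
    if lic.mau_threshold is not None and use.monthly_active_users > lic.mau_threshold:
        findings.append("MAU threshold exceeded")
    for category in sorted(use.categories & lic.review_triggers):
        findings.append(f"re-check licence: {category}")
    return findings
```

Running this whenever a deployment's use-case record changes gives the procurement team a prompt to revisit the terms instead of relying on memory of the original review.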

Tier 4 — Source-available or weights-available with use restrictions

Weights may be inspected or downloaded for research, evaluation, or non-commercial use, but production commercial deployment requires a separate licence or is restricted entirely. Some research releases sit here, as do certain Cohere and AI21 model variants under research-only terms.

For commercial regulated deployment, Tier 4 is generally not viable without a negotiated commercial licence, which converts the model into a vendor relationship with full DORA implications anyway. The "openness" in the name is misleading if the institution is doing anything other than internal R&D evaluation.

Tier 5 — Proprietary / closed weights

API access only. The provider controls the model, the updates, the rate limits, and the audit surface. The institution depends on the provider for availability and for any future versions. Examples include the GPT-4 / GPT-5 family (OpenAI), Claude (Anthropic), and Gemini (Google).

Tier 5 is a vendor relationship in the conventional sense. The decision criteria are not about "openness" at all — they are about whether the capability gap is worth the dependency, and whether the hidden-cost economics work at the institution's scale. The audit surface is whatever the provider chooses to disclose, and the conformity documentation under the EU AI Act has to lean on the provider's published materials rather than the institution's own inspection.
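The five tiers can be read as a short decision sequence over a handful of facts about a release. The following is one possible encoding, offered as a sketch only: the field names and the order of the checks are assumptions made for illustration, and real licences will need a human reading before any of the booleans can be filled in.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    OPEN_SOURCE = 1
    OPEN_WEIGHT_PERMISSIVE = 2
    OPEN_WEIGHT_CONDITIONAL = 3
    RESTRICTED_USE = 4
    PROPRIETARY = 5


@dataclass(frozen=True)
class ModelRelease:
    weights_downloadable: bool
    commercial_use_allowed: bool
    licence_has_conditions: bool     # usage thresholds, acceptable-use clauses, and similar
    training_stack_published: bool   # training code plus sufficient data documentation


def classify(release: ModelRelease) -> Tier:
    """Map a release onto the five tiers, checking the most restrictive facts first."""
    if not release.weights_downloadable:
        return Tier.PROPRIETARY
    if not release.commercial_use_allowed:
        return Tier.RESTRICTED_USE
    if release.licence_has_conditions:
        return Tier.OPEN_WEIGHT_CONDITIONAL
    if release.training_stack_published:
        return Tier.OPEN_SOURCE
    return Tier.OPEN_WEIGHT_PERMISSIVE
```

Checking restrictions before transparency is a deliberate choice in this sketch: a release with a published training stack but a conditional licence lands in Tier 3, since the licence is what limits what the institution may do.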

What can a regulated institution actually do with each tier?

The tier determines what is operationally and contractually available. Five dimensions matter for European financial services.

Audit. Tier 1 is the only tier that genuinely allows full audit: the institution can inspect weights, code, and training-data documentation. Tier 2 and Tier 3 allow weight-level inspection and behavioural testing but not provenance verification. Tier 4 and Tier 5 reduce audit to whatever the provider publishes and the institution's own black-box testing.

Fine-tuning. Tier 1, 2, and 3 all permit fine-tuning on the institution's own data. Tier 4 may permit it under restricted terms. Tier 5 typically does not permit it at all, or permits it through a vendor-controlled service that keeps the fine-tuned weights on the vendor's infrastructure.

Redistribution. Tier 1 and Tier 2 permit redistribution freely (subject to the permissive licence's terms). Tier 3 permits redistribution under conditions. Tier 4 generally does not, and Tier 5 does not.

Reproducibility. Only Tier 1 supports reproducing the model from scratch. Tier 2 and Tier 3 do not, because the training stack and data are not fully published. This matters for regulatory situations where a supervisor asks how a specific output came about — the institution can describe the model's behaviour but not the upstream construction.

Vendor dependency. Tier 1, 2, and 3 give the institution permanent access to the weights it has downloaded — the deployed version keeps working without the provider, provided the institution maintains the GPU infrastructure and operational capability to run it. Future versions depend on the provider continuing to release. And the audit-surface dependency — what the institution can say about how the model was trained — remains bounded by what the provider published, regardless of how long the institution keeps running its local copy. Tier 4 has limited operational dependency for non-commercial use and full dependency for commercial. Tier 5 is a total dependency: if the provider changes terms or withdraws the service, the institution's deployment is affected immediately. The base-model selection process treats vendor dependency as one of the decision dimensions for exactly this reason.
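The five dimensions above amount to a lookup table, and writing them down as one makes it easy to query across tiers. The sketch below paraphrases the text; the dictionary keys and value labels are illustrative shorthand chosen here, not an established taxonomy.

```python
# One row per tier; values summarise the five dimensions described above.
TIER_CAPABILITIES = {
    1: {"audit": "full", "fine_tuning": "permitted", "redistribution": "free",
        "reproducible": True, "vendor_dependency": "future versions only"},
    2: {"audit": "weights and behaviour", "fine_tuning": "permitted", "redistribution": "free",
        "reproducible": False, "vendor_dependency": "future versions only"},
    3: {"audit": "weights and behaviour", "fine_tuning": "permitted", "redistribution": "conditional",
        "reproducible": False, "vendor_dependency": "future versions only"},
    4: {"audit": "provider disclosures and black-box testing", "fine_tuning": "restricted",
        "redistribution": "generally none", "reproducible": False,
        "vendor_dependency": "full for commercial use"},
    5: {"audit": "provider disclosures and black-box testing", "fine_tuning": "none or vendor-hosted",
        "redistribution": "none", "reproducible": False, "vendor_dependency": "total"},
}


def tiers_where(dimension: str, value) -> list[int]:
    """Return the tiers, in ascending order, whose entry for `dimension` equals `value`."""
    return [tier for tier, caps in sorted(TIER_CAPABILITIES.items()) if caps[dimension] == value]
```

For example, `tiers_where("fine_tuning", "permitted")` returns the tiers where the institution can fine-tune on its own data without negotiating further terms.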

What does this mean for governance and procurement?

The practical translation is that "open" is not a sufficient term in a regulated institution's documentation. The model's tier should be named precisely in the AI system inventory, the procurement record, and the EU AI Act technical documentation.
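Naming the tier precisely is easier to enforce when the inventory record has a dedicated field for it and a basic check on the free-text description. The following is a sketch of what such a record might look like; `InventoryEntry`, its fields, and the crude substring check are all assumptions for illustration, and the check will also flag phrasing such as "not open-source", so a human should read any finding.

```python
from dataclasses import dataclass

TIER_LABELS = {
    1: "open source",
    2: "open-weight, permissive licence",
    3: "open-weight, conditional licence",
    4: "weights available, use-restricted",
    5: "proprietary, closed weights",
}


@dataclass(frozen=True)
class InventoryEntry:
    system_name: str
    base_model: str
    tier: int
    licence: str
    description: str


def validate_entry(entry: InventoryEntry) -> list[str]:
    """Return a list of problems; an empty list means the entry passed these checks."""
    problems = []
    if entry.tier not in TIER_LABELS:
        problems.append(f"unknown tier: {entry.tier}")
    text = entry.description.lower()
    # Only Tier 1 should be described as open source in the record.
    if entry.tier != 1 and ("open source" in text or "open-source" in text):
        problems.append("description says 'open source' but the tier is not 1")
    return problems
```

An entry for a Tier 2 model described as an "open-source LLM" would come back with one finding, which is exactly the mislabelling the inventory is meant to catch.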

For DORA third-party assessment, the tier directly affects the dependency profile. A Tier 5 model creates a full ICT third-party dependency with all the contractual governance, audit-rights, and exit-strategy obligations the regulation expects. A Tier 2 model the institution runs on its own infrastructure shifts the dependency rather than eliminating it: the operational dependency on the upstream provider for the deployed version is removed, but the audit-surface dependency — what the institution can describe about how the model was trained — remains. The institution also depends on its own infrastructure capability to run the model, and on the upstream community for future versions. The regulation treats these softer forms of dependency differently from a live API relationship, but they exist and need to be named.

For EU AI Act conformity, the technical documentation for high-risk systems needs to describe the training data, intended purpose, and architecture. The completeness of this description is bounded by the tier: Tier 1 supports a full description; Tier 2 and Tier 3 require referencing the upstream provider's published documentation, which may be incomplete; Tier 5 means the institution is fundamentally relying on the provider's representations.

The honest trade-off is that no tier is universally best. Tier 1 gives transparency and weakens vendor lock-in at the cost of capability. Tier 5 gives capability and stability at the cost of opacity and dependency. The middle tiers, open-weight permissive and open-weight conditional, are where most European deployments actually live, because they trade some transparency for capability that the bank can run on its own infrastructure. What matters is that the institution names the tier correctly and accepts the trade-offs deliberately, rather than describing the deployment as "open source" when it is not.

Key takeaways

The word "open" covers five materially different tiers of AI model openness. Conflating them in procurement and regulatory documentation creates exposure that is straightforward to avoid by naming the tier precisely. The model families European banks actually deploy — Llama, Mistral, Qwen, Gemma, Phi — are open-weight, not open-source. The distinction is not pedantic; it is the basis of the audit posture and the third-party-dependency assessment.

For most regulated production deployments, Tier 2 open-weight models under permissive licences are the operational sweet spot — capability close to the frontier, full institutional control of the weights, predictable cost, and no MAU-style triggers in the licence. Tier 3 models under conditional licences are equally viable but require the procurement team to actually read the terms. Tier 5 proprietary models are vendor relationships, not deployments, and should be evaluated as such.

Documentation precision is the cheapest piece of governance available. Naming the tier correctly in the system inventory now is a one-time cost that pays off every time a supervisor, internal auditor, or risk committee asks how the institution is using AI.

If your institution is sorting through its model registry and wants to scope the openness tier for each deployment cleanly before the regulator asks, we can help you do that.
