May 13, 2026 · 8 min read

The One-Third Rule: When Fine-Tuning an LLM Makes You a Regulator's Problem

This is exactly the kind of compliance question that separates teams who ship from teams who get stuck in legal review for six months. If the intersection of compute thresholds, regulatory classification, and practical tooling sounds like something worth discussing in person, Human x AI Europe in Vienna on May 19 is where Europe's AI practitioners are working through these problems together.

The Compliance Trap Nobody Saw Coming

Most teams fine-tuning LLMs for domain-specific applications assume they're downstream users. They grab a foundation model, run some supervised fine-tuning on proprietary data, deploy it, and move on. That assumption worked fine until August 2, 2025.

AWS's new guidance on EU AI Act compliance makes the stakes explicit: fine-tuning an LLM can reclassify an organization from a downstream user to a GPAI (General-Purpose AI) model provider. The difference matters. A downstream user integrates existing models. A GPAI provider is legally responsible for the model's compliance, including technical documentation, training data summaries, copyright policies, and information disclosure to regulators.

The classification hinges on compute. Specifically, floating-point operations (FLOPs) consumed during fine-tuning.

The One-Third Rule, Explained

The European Commission's guidelines establish what practitioners now call the "one-third rule." If fine-tuning uses more than one-third of the original model's pretraining compute, the modification is considered substantial enough to create a "new" model. The downstream modifier becomes a provider.

The regulatory logic: using more than one-third of the original training compute typically results in significant behavioral changes. The model's risk profile shifts. The Commission determined that such modifications warrant full provider obligations.

Three scenarios apply:

  • Pretraining compute is known and ≥ 10²³ FLOPs: one-third of actual pretraining compute
  • Pretraining compute is unknown or < 10²³ FLOPs: Default threshold of 3.3×10²² FLOPs
  • Original model has systemic risk: Modified model presumed to have systemic risk

Most organizations land in scenario two. Model providers rarely publish exact training FLOPs. Unless documented pretraining compute is available from the model provider, the default threshold of 3.3×10²² FLOPs applies.
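
A back-of-envelope estimate tells most teams where they stand before any tooling gets involved. The sketch below uses the common ~6 FLOPs per parameter per trained token approximation for a forward-plus-backward pass; the threshold constants come from the guidelines above, while the function names and the example model size are illustrative assumptions.

```python
# Back-of-envelope check against the one-third rule.
# Assumes the common ~6 FLOPs per parameter per trained token
# approximation (forward + backward pass); illustrative only.

DEFAULT_THRESHOLD_FLOPS = 3.3e22  # applies when pretraining compute is unknown


def finetune_flops(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate: ~6 * parameters * tokens."""
    return 6.0 * n_params * n_tokens


def applicable_threshold(pretrain_flops: float | None) -> float:
    """One-third of known pretraining compute (if >= 1e23), else the default."""
    if pretrain_flops is not None and pretrain_flops >= 1e23:
        return pretrain_flops / 3.0
    return DEFAULT_THRESHOLD_FLOPS


# Example: fine-tuning an 8B-parameter model on 2B tokens.
job = finetune_flops(8e9, 2e9)      # ~9.6e19 FLOPs
limit = applicable_threshold(None)  # unknown pretraining compute -> 3.3e22
print(f"{job:.2e} vs {limit:.2e} -> {'over' if job > limit else 'under'} the line")
```

On that estimate the job consumes well under one percent of the default threshold, which is the typical outcome for domain adaptation; the rule mostly bites on large continued-pretraining runs.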

What AWS Built

Amazon SageMaker AI's Fine-Tuning FLOPs Meter is an open-source toolkit that integrates compliance tracking into existing training pipelines. The implementation approach:

Single configuration flag. Enable FLOPs tracking without restructuring training code. The toolkit hooks into SageMaker Training jobs and monitors compute consumption throughout the fine-tuning process.

Audit-ready documentation. The toolkit generates compliance reports that map directly to EU AI Act requirements. When regulators ask "how much compute did this fine-tuning job consume?", the answer exists in a format they expect.

Threshold alerts. Teams can set warning thresholds below the regulatory limit. Getting notified when a job reaches 20% of the original pretraining compute is far more useful than discovering after the fact that it hit 35% and crossed the one-third line.

The toolkit leverages SageMaker's existing infrastructure: AWS CloudTrail for governance logging, Amazon CloudWatch for monitoring, and automatic resource decommissioning after training completes.
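
None of the alerting logic is exotic. The sketch below is a hypothetical illustration of the warn-then-stop pattern such a toolkit implements, not the FLOPs Meter's actual API; the function name, parameters, and the 60% soft limit are invented for the example.

```python
# Hypothetical warn-then-stop budget check (not the FLOPs Meter's API).

def check_flops_budget(cumulative_flops: float,
                       limit_flops: float,
                       warn_fraction: float = 0.6) -> str:
    """Return 'ok', 'warn', or 'stop' for a running job's compute."""
    if cumulative_flops >= limit_flops:
        return "stop"  # crossing here means provider obligations
    if cumulative_flops >= warn_fraction * limit_flops:
        return "warn"  # time to involve legal, not just MLOps
    return "ok"


print(check_flops_budget(2.1e22, 3.3e22))  # 'warn': ~64% of the default threshold
```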

The Practical Problem with FLOPs Tracking

Counting FLOPs sounds straightforward. It isn't.

Distributed training across multiple GPUs introduces measurement complexity. Gradient accumulation affects how compute maps to actual training steps. Mixed-precision training (FP16, BF16, FP32) requires careful accounting. Parameter-efficient fine-tuning methods like LoRA and QLoRA modify only a subset of model weights, but the compute still counts.

SageMaker's integration with Hugging Face Transformers handles much of this complexity. The Fine-Tuning FLOPs Meter builds on that foundation, tracking compute across the specific fine-tuning techniques enterprises actually use: supervised fine-tuning, LoRA, QLoRA, and RLHF (Reinforcement Learning from Human Feedback).
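
For teams outside SageMaker, Hugging Face's Trainer already keeps a running FLOPs estimate in trainer.state.total_flos, which makes a reasonable starting point. The callback below is one illustrative way to surface it; note that the built-in figure is a parameter-count heuristic that refreshes on the Trainer's logging cadence, so treat it as an estimate, not an audit record.

```python
# Surfacing Hugging Face's built-in FLOPs estimate during training.
# state.total_flos is the Trainer's cumulative estimate; the soft
# limit and callback wiring here are illustrative assumptions.

from transformers import TrainerCallback

DEFAULT_THRESHOLD_FLOPS = 3.3e22


class FlopsBudgetCallback(TrainerCallback):
    """Warn once the cumulative FLOPs estimate passes a soft limit."""

    def __init__(self, soft_limit: float = 0.6 * DEFAULT_THRESHOLD_FLOPS):
        self.soft_limit = soft_limit
        self.warned = False

    def on_log(self, args, state, control, **kwargs):
        # total_flos is refreshed when the Trainer logs, not every step
        if not self.warned and state.total_flos >= self.soft_limit:
            print(f"FLOPs warning: {state.total_flos:.2e} >= {self.soft_limit:.2e}")
            self.warned = True


# Usage: Trainer(model=..., args=..., callbacks=[FlopsBudgetCallback()])
# After training, trainer.state.total_flos holds the cumulative estimate.
```

Because the default estimate is derived from parameter counts, parameter-efficient methods like LoRA deserve a closer look before the number is trusted for compliance purposes.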

The harder problem is cumulative tracking. The Commission's guidelines don't explicitly address whether incremental modifications that cumulatively exceed the threshold trigger provider status. A team running multiple fine-tuning iterations over months needs to track total compute, not just individual job compute.
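
Until that's clarified, the conservative posture is to keep a per-base-model ledger of every job's compute and watch the running total. A minimal sketch, assuming a flat JSON file and invented field names:

```python
# Hypothetical cumulative FLOPs ledger per base model (illustrative).
# Appends each job's estimate and returns the running total so it can
# be compared against the threshold. File format is an assumption.

import json
from pathlib import Path

LEDGER = Path("flops_ledger.json")
DEFAULT_THRESHOLD_FLOPS = 3.3e22


def record_job(base_model: str, job_name: str, flops: float) -> float:
    """Append a job's FLOPs and return the cumulative total for the base model."""
    ledger = json.loads(LEDGER.read_text()) if LEDGER.exists() else {}
    ledger.setdefault(base_model, []).append({"job": job_name, "flops": flops})
    LEDGER.write_text(json.dumps(ledger, indent=2))
    return sum(entry["flops"] for entry in ledger[base_model])


total = record_job("meta-llama/Llama-3.1-8B", "sft-2026-05", 9.6e19)
print(f"cumulative: {total:.2e} ({total / DEFAULT_THRESHOLD_FLOPS:.1%} of default threshold)")
```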

What Provider Status Actually Requires

Crossing the threshold triggers Article 53 obligations under the EU AI Act:

Technical documentation. Detailed records of training processes, evaluation results, and performance metrics. The documentation must be maintained and updated throughout the model's lifecycle.

Training data summary. A public summary using the AI Office's mandatory template. This isn't optional, and the template format is prescribed.

Copyright compliance policy. A documented approach to respecting text-and-data-mining opt-outs under EU copyright law.

Downstream provider information. Sufficient documentation for anyone integrating the model into their own AI systems.

If the original model was classified as having systemic risk (trained with ≥10²⁵ FLOPs), the modified model inherits that classification. Additional obligations apply: model evaluations, adversarial testing, incident tracking and reporting, and cybersecurity measures.

Enforcement powers activate on August 2, 2026, with penalties of up to EUR 35 million or 7% of global turnover for violations.

The Rollback Question

Before launching any fine-tuning job that might approach the threshold, answer three questions:

What's the compute budget? Set a hard limit below the regulatory threshold. If the fine-tuning job needs more compute than the limit allows, that's a product decision, not a training decision.

Who owns compliance if the threshold is crossed? This isn't a technical question. Legal, compliance, and product leadership need to agree on the answer before training starts.

What's the rollback plan? If a fine-tuning job unexpectedly crosses the threshold, can the organization revert to a previous checkpoint? Is there a version of the model that stays below the limit while still meeting business requirements?
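
Answering the third question ahead of time mostly means annotating checkpoints with the cumulative compute consumed up to that point, so reverting is a lookup rather than a scramble. A hypothetical sketch, with invented checkpoint metadata:

```python
# Hypothetical rollback helper: pick the most recent checkpoint whose
# cumulative FLOPs stay under a compute budget. Metadata is invented.

checkpoints = [
    {"path": "ckpt-1000", "cumulative_flops": 8.0e21},
    {"path": "ckpt-2000", "cumulative_flops": 1.9e22},
    {"path": "ckpt-3000", "cumulative_flops": 3.5e22},  # over the line
]

BUDGET = 3.3e22  # default threshold when pretraining compute is unknown


def latest_under_budget(ckpts: list[dict], budget: float) -> dict | None:
    """Return the newest checkpoint below the compute budget, if any."""
    eligible = [c for c in ckpts if c["cumulative_flops"] < budget]
    return eligible[-1] if eligible else None


print(latest_under_budget(checkpoints, BUDGET))  # -> ckpt-2000
```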

What This Means for European AI Teams

The EU AI Act's GPAI provisions create a new category of compliance work that didn't exist eighteen months ago. Teams fine-tuning LLMs for domain-specific applications now need:

Compute tracking infrastructure. Not optional. The AWS toolkit is one implementation; others will follow.

Legal clarity on provider status. Before fine-tuning starts, not after.

Documentation workflows. Technical documentation requirements are specific. Building these workflows into the development process is cheaper than retrofitting them after deployment.

Threshold monitoring. Continuous, not periodic. Cumulative compute tracking across multiple fine-tuning iterations.

The teams that treat this as an infrastructure problem, building compliance tracking into their MLOps pipelines from the start, will ship faster than teams that treat it as a legal problem to solve later.

The Commission's guidelines acknowledge that enforcement will focus initially on providers who adhere to the voluntary Code of Practice. But "initially" has a shelf life. The infrastructure for compliance tracking needs to exist before enforcement priorities shift.

Frequently Asked Questions

Q: What is the EU AI Act's one-third rule for LLM fine-tuning?

A: The one-third rule states that if fine-tuning compute exceeds one-third of the original model's pretraining compute, the downstream modifier becomes a GPAI model provider under EU law. When pretraining compute is unknown, the default threshold is 3.3×10²² FLOPs.

Q: When did EU AI Act GPAI provider obligations take effect?

A: GPAI provider obligations under the EU AI Act became applicable on August 2, 2025. Full enforcement powers and penalties activate on August 2, 2026, with fines up to EUR 35 million or 7% of global turnover.

Q: How does Amazon SageMaker AI track FLOPs for EU AI Act compliance?

A: AWS released the Fine-Tuning FLOPs Meter, an open-source toolkit that integrates with SageMaker Training jobs. It monitors compute consumption during fine-tuning, generates audit-ready documentation, and provides threshold alerts before regulatory limits are crossed.

Q: What happens if my fine-tuned model crosses the GPAI provider threshold?

A: The organization becomes legally responsible for Article 53 obligations: maintaining technical documentation, publishing training data summaries, implementing copyright compliance policies, and providing information to downstream providers and regulators.

Q: Does the one-third rule apply to parameter-efficient fine-tuning methods like LoRA?

A: Yes. Compute consumed during LoRA, QLoRA, and other parameter-efficient fine-tuning methods counts toward the threshold. Modifying only a subset of model weights does not exempt the compute from regulatory tracking.

Q: What if the original model I'm fine-tuning has systemic risk classification?

A: If the original GPAI model was classified as having systemic risk (trained with ≥10²⁵ FLOPs), any substantially modified version is presumed to inherit that classification. Additional obligations apply, including model evaluations, adversarial testing, and incident reporting.
