
Why Hugging Face Chose Pre-Padding Over Dynamic LoRA Loading

TRIGGER

Compiled diffusion models with LoRA adapters required recompilation whenever adapters were swapped, negating the speed benefits of torch.compile. Each LoRA can have a different rank and target different layers, so a swap changes tensor shapes and triggers recompilation every time.

APPROACH

Hugging Face's Diffusers/PEFT team implemented hotswap-ready LoRA loading. Before compilation, they call enable_lora_hotswap(target_rank=max_rank) to pre-pad all LoRA weight tensors to the maximum rank needed across all adapters; scaling factors are converted from Python floats to tensors, so changing a scale updates a tensor value rather than invalidating the compiled graph. Input: the first LoRA is loaded normally, the model is compiled, then subsequent LoRAs are loaded with hotswap=True. Output: adapter swaps without recompilation. Combined with FA3, FP8 quantization, and torch.compile on an H100, they achieved a 2.23x speedup over baseline (3.55s vs 7.89s). Without hotswap, compilation still helped, but intermittent recompilation stalls cut the speedup to 1.55x.
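The core trick, pre-padding to a common rank, can be shown in a few lines. This is a minimal NumPy sketch of the math, not the actual Diffusers/PEFT implementation: zero-padding the LoRA factors up to max_rank leaves the layer's output bit-for-bit equivalent, while giving every adapter the same static shape.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 64, 64      # layer dimensions
rank, max_rank = 4, 16    # this adapter's rank vs. the pre-padded target rank

x = rng.standard_normal((2, d_in))
W = rng.standard_normal((d_in, d_out))   # frozen base weight
B = rng.standard_normal((d_in, rank))    # LoRA down-projection
A = rng.standard_normal((rank, d_out))   # LoRA up-projection
scale = 0.5                              # LoRA scaling (alpha / rank)

# Unpadded LoRA forward pass: y = x @ (W + scale * B @ A)
y_ref = x @ (W + scale * (B @ A))

# Pre-pad B and A with zeros up to max_rank. The extra columns/rows
# contribute nothing to B @ A, so the output is unchanged, but every
# adapter now presents the same static shape to the compiler.
B_pad = np.zeros((d_in, max_rank)); B_pad[:, :rank] = B
A_pad = np.zeros((max_rank, d_out)); A_pad[:rank, :] = A
y_pad = x @ (W + scale * (B_pad @ A_pad))

assert np.allclose(y_ref, y_pad)
```

The cost of the trick is visible here too: the padded matmul always runs at max_rank, even for a rank-4 adapter.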

PATTERN

2.23x speedup drops to 1.55x when LoRA swaps trigger recompilation. Pre-padding all adapters to max rank makes dynamic components static-shaped. Computing on padded zeros costs far less than intermittent recompilation stalls.
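A swap then reduces to an in-place copy into the padded buffers. Again a hedged NumPy sketch, not the Diffusers/PEFT code; the hotswap helper below is illustrative. The point is that buffer shapes never change across swaps, which is exactly what keeps a compiled graph valid.

```python
import numpy as np

max_rank, d_in, d_out = 16, 64, 64
rng = np.random.default_rng(1)

# Static, compiler-visible buffers sized for the maximum rank.
B_buf = np.zeros((d_in, max_rank))
A_buf = np.zeros((max_rank, d_out))

def hotswap(B_new, A_new):
    """Copy a new adapter into the padded buffers in place.

    Shapes stay fixed at max_rank, so nothing the compiler
    guarded on changes and no recompilation is triggered.
    """
    r = B_new.shape[1]
    B_buf[:] = 0.0
    A_buf[:] = 0.0
    B_buf[:, :r] = B_new
    A_buf[:r, :] = A_new

# Swap in a rank-4 adapter, then a rank-8 one; shapes are constant.
hotswap(rng.standard_normal((d_in, 4)), rng.standard_normal((4, d_out)))
shape_after_first = (B_buf.shape, A_buf.shape)
hotswap(rng.standard_normal((d_in, 8)), rng.standard_normal((8, d_out)))
assert (B_buf.shape, A_buf.shape) == shape_after_first
```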

WORKS WHEN

  • Maximum LoRA rank across all adapters is known ahead of time
  • Subsequent LoRAs target the same layers or a subset of layers that the first LoRA targets
  • Adapter swapping happens frequently enough that recompilation costs dominate inference time
  • Using JIT compilation (torch.compile) where recompilation is expensive
  • LoRAs only modify the main model (e.g., transformer), not auxiliary encoders

FAILS WHEN

  • LoRAs target completely disjoint layers with no overlap
  • Ranks vary wildly across adapters (padding a rank-4 adapter to rank 256 wastes significant compute)
  • Single-use inference where compilation overhead isn't amortized
  • Text encoder LoRAs are needed (not yet supported in this implementation)
  • Using eager execution without compilation where recompilation isn't an issue

Stage

build

From

July 2025
