Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs

arXiv CS, Tuesday 09 June 2026, 04:00 UTC
By Vincent-Daniel Yun, Junhyuk Jo, Sai Praneeth Karimireddy, Sunwoo Lee

arXiv:2605.15491v2

Abstract: Layer pruning removes entire Transformer decoder blocks from large language models, but introduces a mismatch between the hidden state received by the next surviving layer and the distribution it was trained to process, leading to significant performance degradation. We propose Ghosted Layers, a training-free recovery module that addresses this issue by solving a boundary activation alignment problem. Our method derives a closed-form optimal linear operator from a small calibration set to reconstruct the activation discrepancy introduced by the pruned layers. We show that this solution corresponds to the unconstrained optimum of the alignment objective, whereas existing methods are restricted to constrained solutions over limited operator subspaces. Experiments across multiple LLM backbones and pruning strategies demonstrate that our method consistently improves accuracy and perplexity over prior training-free baselines, while preserving the efficiency gains of layer pruning. Official code repository: https://github.com/daniel-eai/ghosted_layers_official_repository/.
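The abstract describes the method only at a high level. As a rough illustration of what a closed-form, unconstrained linear alignment can look like, here is a minimal NumPy sketch that fits an operator on calibration activations by least squares and compares it with a constrained (diagonal, per-channel) operator fit to the same objective. The residual parameterization `h_in + h_in @ W`, the function names `fit_unconstrained`, `fit_diagonal`, `apply_operator` and `alignment_mse`, the optional `ridge` term, and the synthetic data are all hypothetical choices made for illustration. They are not taken from the paper or its official repository, whose exact objective and operator may differ.

```python
import numpy as np

# Hypothetical sketch of boundary activation alignment as linear least squares.
# Assumption: the surviving layer receives h_in + h_in @ W, and W is fit so that
# this approximates h_out, the hidden state it received before pruning.


def fit_unconstrained(h_in: np.ndarray, h_out: np.ndarray, ridge: float = 0.0) -> np.ndarray:
    """Closed-form W minimizing ||h_in @ W - (h_out - h_in)||_F^2 + ridge * ||W||_F^2.

    h_in:  (n_tokens, d) hidden states entering the pruned span (calibration set)
    h_out: (n_tokens, d) hidden states the next surviving layer received before pruning
    """
    delta = h_out - h_in                      # discrepancy left by the removed blocks
    d = h_in.shape[1]
    gram = h_in.T @ h_in + ridge * np.eye(d)  # (d, d) normal-equation matrix
    return np.linalg.solve(gram, h_in.T @ delta)


def fit_diagonal(h_in: np.ndarray, h_out: np.ndarray) -> np.ndarray:
    """Constrained baseline: best per-channel scaling (diagonal W) for the same objective."""
    delta = h_out - h_in
    scale = (h_in * delta).sum(axis=0) / (h_in * h_in).sum(axis=0)
    return np.diag(scale)


def apply_operator(h_in: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Residual form: pass h_in + h_in @ W to the next surviving layer instead of h_in."""
    return h_in + h_in @ w


def alignment_mse(h_in: np.ndarray, h_out: np.ndarray, w: np.ndarray) -> float:
    """Mean squared error between the corrected and the original boundary activations."""
    return float(np.mean((apply_operator(h_in, w) - h_out) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, d = 512, 16
    h_in = rng.standard_normal((n_tokens, d))
    # Synthetic stand-in for the removed blocks: a dense linear drift plus small noise.
    drift = 0.1 * rng.standard_normal((d, d))
    h_out = h_in + h_in @ drift + 0.01 * rng.standard_normal((n_tokens, d))

    candidates = [
        ("no recovery", np.zeros((d, d))),   # plain layer pruning
        ("diagonal", fit_diagonal(h_in, h_out)),
        ("unconstrained", fit_unconstrained(h_in, h_out)),
    ]
    for name, w in candidates:
        print(f"{name:>13}: MSE = {alignment_mse(h_in, h_out, w):.6f}")
```

Because diagonal operators (including the all-zero "no recovery" case) form a subset of all d×d matrices, the unconstrained least-squares fit can never have a larger calibration error than the diagonal one. This mirrors, in a toy setting, the constrained-versus-unconstrained argument in the abstract. At inference the fitted operator adds roughly one d×d matrix multiply per token at the pruning boundary, which is much less work than the attention and MLP computation of a removed decoder block.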

Originally published by arXiv CS.