AxBench

Related Articles from SNS

Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines

arXiv:2605.31183v1. Announce Type: new. Abstract: Sparse Autoencoders (SAEs) have been seen as a promising avenue for exploring the internals of Large Language Models (LLMs) and for steering model output generation. When AxBench, a model steering benchmark, was introduced in Wu et al. (2025), SAEs did not seem to live up to their original hype because of poor steering performance relative to a set of simple baselines.

arXiv CS 9d ago