Concept-SAE
Related Articles from SNS
Concept-SAE: A Controllable and Invertible Concept Interface for Sparse Autoencoders
arXiv:2509.22015v2
Abstract: Standard Sparse Autoencoders (SAEs) excel at discovering a dictionary of a model's learned features, providing a powerful lens for passive feature discovery. However, this passive nature makes it difficult to systematically evaluate or analyze concepts that users explicitly care about. We introduce Concept-SAE, a framework that augments SAEs with a structured and controllable interface for probing user-defined concepts.