Home Knowledge Base an \textbf{i}dentifiable SAE

an \textbf{i}dentifiable SAE

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Toward Identifiable Sparse Autoencoders

Announce Type: new Abstract: Recently, sparse autoencoders (SAEs) have emerged as an attractive tool for interpreting and interacting with representations in practical neural networks. While it is common empirical folklore, we also show theoretically that SAEs are highly unstable: different training runs are likely to produce different concept dictionaries and sparse codes. We characterize the model properties that hinder the stability of real-world SAEs, and address each of these problems...

arXiv CS 9d ago