the Linear Representation Hypothesis
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
The Cylindrical Representation Hypothesis for Language Model Steering
Announce Type: replace Abstract: Steering is a widely used technique for controlling large language models, yet its effects are often unstable and hard to predict. Existing theoretical accounts are largely based on the Linear Representation Hypothesis (LRH). While LRH assumes that concepts can be orthogonalized for lossless control, this idealized mapping fails in real representations and cannot account for the observed unpredictability of steering.
When LLMs Learn to Be Consistently Wrong: A Multi-Model Study of Linear Representations of Synthetic Deception
arXiv:2605.30381v1 Announce Type: new Abstract: Deceptive alignment, in which models maintain accurate internal representations while deliberately producing false outputs, remains a central challenge in AI safety. While strategic deception is the primary long-term concern, synthetic dishonesty - induced via direct optimization on incorrect answers - provides a controlled testbed for studying the representational basis of learned deception.
Hallucinations as Orthogonal Noise: Inference-Time Manifold Alignment via Dynamic Contextual Orthogonalization
Announce Type: new Abstract: Hallucination in Large Language Models (LLMs), characterized by the generation of content inconsistent with contextual facts or logical constraints -- remains a persistent challenge for reliable deployment. In this work, we address this issue through a geometric framework rooted in the linear representation hypothesis. We propose that hallucinations manifest as orthogonal noise relative to the semantic manifold of the residual stream.
The Information Geometry of Softmax: Probing and Steering
arXiv:2602.15293v2 Announce Type: replace Abstract: This paper concerns the question of how AI systems encode semantic structure into the geometric structure of their representation spaces. The motivating observation is that the natural geometry of these representation spaces should reflect the way models use representations to produce behavior. We focus on the important special case of representations that define softmax distributions.
Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation
new Abstract: Transformer-based architectures have significantly advanced the generation of complex symbolic sequences, yet a significant gap remains in achieving fine-grained, interpretable control over discrete signal attributes. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework for deterministic attribute modulation without retraining to bridge this gap via inference-time activation steering. Utilizing the...
A prognostic human brain network for diffuse midline glioma
Abstract Diffuse midline gliomas (DMGs) are near-universally lethal tumours of the childhood central nervous system1,2. In animal models, DMGs form brain-wide integrated networks through neuron-to-glioma synapses3,4,5,6 and glioma-to-glioma gap junctional coupling3. This extensive connectivity robustly promotes the growth and invasion of DMG3,4,5,6,7,8,9 and other glial malignancies10,11,12 through paracrine mechanisms and direct neuron-to-glioma synapses.
The role of falx cerebri in the selective vulnerability of splenium within the corpus callosum
The corpus callosum is the largest white matter structure connecting the two cerebral hemispheres and is anatomically divided into three major subregions along the anteroposterior axis: the genu, midbody, and splenium. The splenium is frequently affected in traumatic head impacts, yet the biomechanical basis for this selective vulnerability remains poorly understood. Clinical studies have long hypothesized that the falx cerebri contributes to the splenial susceptibility because of its close...
Gene ancestries reveal diverse microbial associations during eukaryogenesis
Abstract The origin of eukaryotes remains a central enigma in biology1. Continuing debates agree on the pivotal role of a symbiosis between an alphaproteobacterium and an Asgard archaeon2,3. However, the nature, timing and contributions of other potential bacterial partners4,5,6 and the role of interactions with viruses7,8,9 remain contentious.
Whole-genome duplication shaped cell-type evolution in the vertebrate brain
Abstract The complex brains of vertebrates have more cell types than those of their closest relatives. Whole-genome duplications (WGDs) occurred during early vertebrate evolution1, but it is unclear whether the duplicated genes (ohnologues) facilitated cell-type evolution. Here using brain single-cell transcriptomes from five chordates—human2, mouse3, lizard4, lamprey5 and amphioxus—we report that many cell-type families with conserved core transcription factors in vertebrates do not show...