
SAH

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Operationalising the Superficial Alignment Hypothesis via Task Complexity

Announce Type: replace
Abstract: The superficial alignment hypothesis (SAH) posits that large language models learn most of their knowledge during pre-training, and that post-training merely surfaces this knowledge. The SAH, however, lacks a precise definition, which has led to (i) different and seemingly orthogonal arguments supporting it, and (ii) important critiques to it. We propose a new metric called task complexity: the length of the shortest program that achieves a target performance...

arXiv CS 1d ago

Exact Unlearning in Reinforcement Learning

arXiv:2606.04182v1 · Announce Type: new
Abstract: We formulate the problem of exact unlearning in reinforcement learning, where the goal is to design an efficient framework that enables the removal of any user's data upon deletion request, i.e., the online learner's output after unlearning is indistinguishable from what would have been produced had the deleted user never interacted with the learner. For any ρ > 0, we show that there exists a reinforcement learning (RL)...

arXiv CS 6d ago

SAHG: Sector-Anisotropic Hyperbolic Graph Model for Social Bot Detection

arXiv:2605.30166v2 · Announce Type: replace
Abstract: LLM-driven social bots can generate fluent, human-like text, reducing the discriminative advantage of content-based detection alone. However, coordinated campaigns still leave relational patterns (interactions, behavioral similarity, shared neighborhoods, community positions, and coordinated activity) that graph-based methods can exploit. Existing graph detectors face two challenges when exploiting such evidence.

arXiv CS 7d ago