the Activation Oracle
Related Articles from SNS
Building Better Activation Oracles
arXiv:2606.02609v2 Announce Type: replace Abstract: Activation Oracles (AOs) are promising methods for interpreting residual stream activations. However, current AOs face important issues, such as hallucinations and vagueness. Additionally, text-inversion confounds make them hard to evaluate.
AI Job Grief: The Unnamed Psychological Crisis Hitting Tech Workers
In the summer of 2025, an Epic Games layoff cut a worker who was a terminally ill father. According to the most-discussed account of the episode, his family lost his life insurance along with the job.
The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust
Announce Type: new Abstract: As language models improve and become increasingly deployed to solve a variety of tasks, trustworthiness becomes essential. Calibration is a good proxy for trust: well-calibrated confidence estimates help inform the risk versus reward tradeoff when trusting a specific model output. Unfortunately, even as models improve, they remain poorly calibrated, often biasing towards overconfidence.
Thresholded Local Hyper-Flow Diffusion
arXiv:2606.09340v1 Announce Type: new Abstract: Local Hyper-Flow Diffusion (HFD) gives an edge-size-independent Cheeger-type guarantee for seeded clustering in general submodular hypergraphs, but existing HFD solvers do not keep intermediate computation local at every iteration. We introduce Thresholded Local HFD (TL-HFD), a first-order method that maintains an active region around the seeds, performs projected subgradient updates on that region and its immediate boundary, and expands via...
Graph Neural Networks for Fast Operator Selection in Adaptive VQE
arXiv:2606.08794v1 Announce Type: cross Abstract: Adaptive variational quantum algorithms like ADAPT-VQE construct tailored ansätze by iteratively selecting operators from a pool using gradient-based criteria. While this avoids oversized parameter spaces, repeatedly scanning the full pool incurs a classical cost that scales linearly with pool size, a major bottleneck for systems with long-range interactions or large operator sets. Here, we reformulate adaptive operator selection as a...
Remember with Confidence: Uncertainty Quantification for Spatio-temporal Memory with Probabilistic Guarantees
Announce Type: new Abstract: Long-horizon robot operation requires spatio-temporal memory to record the environment state and recall it for downstream reasoning. Scene graphs and retrieval-augmented systems ground VLM descriptions to persistent 3D entities with rich semantic descriptions. However, VLM captions are noisy and viewpoint-inconsistent, and existing systems treat them as an oracle with no mechanism to detect unreliable stored descriptions.
Your Model Already Knows: Attention-Guided Safety Filter for Vision-Language-Action Models
arXiv:2606.09749v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models have demonstrated impressive end-to-end performance across a variety of robotic manipulation tasks. However, these policies offer no guarantees against collisions with task-irrelevant objects in the scene. Existing safety filters sidestep this problem by querying a vision-language model (VLM) to identify obstacles and their locations.
To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection
arXiv:2606.05931v1 Announce Type: new Abstract: When retrieving a person from a video archive by voice and face, should the system be multimodal or not? In real-world broadcast archives, unlike curated benchmarks, a target may be heard but unseen, seen but unheard, or both. Fusing scores from an absent modality injects noise, degrading precision below the best unimodal system.
Iterative Thresholding Pursuit with Continuation for $\ell_{1-2}$-Regularized Sparse Recovery
Announce Type: new Abstract: Sparse recovery aims to reconstruct sparse signals from underdetermined and possibly noisy linear measurements. Existing $\ell_{1-2}$ iterative thresholding schemes are first-order methods. We propose an iterative thresholding pursuit method with continuation (ITP-C) for $\ell_{1-2}$-regularized sparse recovery.