Measurement Specifications
Related Articles from SNS
PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges
arXiv:2605.30803v1. LLM judges are increasingly used to evaluate open-ended responses, but their scores depend strongly on the rubrics that condition them. A vague rubric asking for a response to be "helpful and factual" can reward polished answers that invent facts or violate user intent. We treat reusable rubrics as measurement specifications: changing the rubric changes the response quality measurement induced by a fixed judge.
Generating Rectifiable Measures through Neural Networks
We derive universal approximation results for the class of (countably) $m$-rectifiable measures. Specifically, we prove that $m$-rectifiable measures can be approximated as push-forwards of the one-dimensional Lebesgue measure on $[0,1]$ using ReLU neural networks with arbitrarily small approximation error in terms of Wasserstein distance. What is more, the weights in the networks under consideration are quantized and bounded and the number of ReLU neural...
Language Models Compare Quantities Using Number-specific and Unit-specific Heuristics
Quantities with measurement units, such as 110 cm and 1.2 m, require language models (LMs) to combine a numeral with a symbolic unit scale. Here, we study how LMs compare such quantities in controlled settings spanning several unit systems. We find that accuracy degrades near the comparison boundary, where small changes in value determine the correct answer.
A red-emitting, genetically encoded indicator for two-photon voltage recording in vivo
Genetically encoded voltage indicators (GEVIs) enable minimally invasive, cell-type-specific optical measurements of neuronal membrane potential with millisecond temporal resolution. Red-shifted GEVIs are especially advantageous because they permit spectral multiplexing with complementary sensors and enable all-optical circuit interrogation in combination with blue-light-activated opsins. Despite these advantages, existing red GEVIs remain poorly suited for in vivo use due to limited...
Ultrasensitive voltage imaging reveals distinct electrical microdomains in neurons
For the brain to compute, electrical signals must propagate over the membranes of individual neurons, connecting synaptic inputs to synaptic outputs. Complex neuronal morphologies coupled with the spatial organization of synaptic inputs and outputs enable diverse voltage transformations that underlie cell-type specific computations. However, measuring these transformations in vivo has remained challenging, leaving a crucial gap in our mechanistic understanding of single neuron computation.
Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation
arXiv:2603.05881v2. Reliable deployment of large language models (LLMs) requires accurate uncertainty estimation. Existing methods are predominantly answer-first, producing confidence only after generating an answer, which measures the correctness of a specific response and limits practical usability. We study a confidence-first paradigm, where the model outputs its confidence before answering, interpreting this score as the model's probability of answering the...
QASM-Eval: A Dataset to Train and Evaluate LLMs on OpenQASM-3 Beyond Quantum Circuits
Quantum computing remains in the Noisy Intermediate-Scale Quantum (NISQ) era, where performance is highly constrained by noise. Addressing this limitation often requires hardware-facing capabilities beyond gate-sequence circuit specification, including mid-circuit measurement and classical feedback for quantum error correction (QEC), precise timing control for dynamical decoupling (DD), and pulse-level waveform access for calibration. OpenQASM-3 was introduced...
Conformal Disentanglement and Latent-Space Curation: A Neural Framework for Perspective Synthesis, Differentiation and Targeted Generation
Many scientific and engineering problems involve observing a common phenomenon through multiple heterogeneous sensors or measurement modalities. Such observations typically contain both information shared across sensors, reflecting the underlying system, and sensor-specific or extraneous components arising from measurement processes or environmental effects. Disentangling these contributions is essential when sensor-independent observations are unavailable.
Clinically Grounded Privacy Evaluation of Medical LMs
Medical language models (LMs) can memorize and reproduce protected health information, but privacy evaluations often focus on recovery of training text rather than disclosure under realistic threat models. We introduce a clinically grounded framework that evaluates leakage along a graded axis of adversarial access, ranging from publicly inferable demographics to leaked note fragments. At each tier, we measure verbatim memorization of patient-specific text and...
Can LLMs Write Correct TLA+ Specifications? Evaluating Natural-Language-to-TLA+ Generation
TLA+ has supported industrial verification at companies such as Amazon and Microsoft, yet writing correct TLA+ specifications from natural language still requires time and expertise, which limits adoption. LLMs show promise, but no prior study measures whether they produce semantically correct TLA+ specifications from natural language. This paper presents the first systematic evaluation of LLM-based TLA+ specification synthesis from natural language.