Measurement Specifications

Related Articles from SNS

PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges

arXiv:2605.30803v1. LLM judges are increasingly used to evaluate open-ended responses, but their scores depend strongly on the rubrics that condition them. A vague rubric asking for a response to be "helpful and factual" can reward polished answers that invent facts or violate user intent. We treat reusable rubrics as measurement specifications: changing the rubric changes the response quality measurement induced by a fixed judge.

arXiv CS 9d ago

Generating Rectifiable Measures through Neural Networks

We derive universal approximation results for the class of (countably) $m$-rectifiable measures. Specifically, we prove that $m$-rectifiable measures can be approximated as push-forwards of the one-dimensional Lebesgue measure on $[0,1]$ using ReLU neural networks with arbitrarily small approximation error in terms of Wasserstein distance. What is more, the weights in the networks under consideration are quantized and bounded and the number of ReLU neural...

arXiv CS 7d ago

Language Models Compare Quantities Using Number-specific and Unit-specific Heuristics

Quantities with measurement units, such as 110 cm and 1.2 m, require language models (LMs) to combine a numeral with a symbolic unit scale. Here, we study how LMs compare such quantities in controlled settings spanning several unit systems. We find that accuracy degrades near the comparison boundary, where small changes in value determine the correct answer.

arXiv CS 7d ago

A red-emitting, genetically encoded indicator for two-photon voltage recording in vivo

Genetically encoded voltage indicators (GEVIs) enable minimally invasive, cell-type-specific optical measurements of neuronal membrane potential with millisecond temporal resolution. Red-shifted GEVIs are especially advantageous because they permit spectral multiplexing with complementary sensors and enable all-optical circuit interrogation in combination with blue-light-activated opsins. Despite these advantages, existing red GEVIs remain poorly suited for in vivo use due to limited...

bioRxiv 7d ago

Ultrasensitive voltage imaging reveals distinct electrical microdomains in neurons

For the brain to compute, electrical signals must propagate over the membranes of individual neurons, connecting synaptic inputs to synaptic outputs. Complex neuronal morphologies coupled with the spatial organization of synaptic inputs and outputs enable diverse voltage transformations that underlie cell-type specific computations. However, measuring these transformations in vivo has remained challenging, leaving a crucial gap in our mechanistic understanding of single neuron computation.

bioRxiv 10d ago

Confidence Before Answering: A Paradigm Shift for Efficient LLM Uncertainty Estimation

arXiv:2603.05881v2. Reliable deployment of large language models (LLMs) requires accurate uncertainty estimation. Existing methods are predominantly answer-first, producing confidence only after generating an answer, which measures the correctness of a specific response and limits practical usability. We study a confidence-first paradigm, where the model outputs its confidence before answering, interpreting this score as the model's probability of answering the...

arXiv CS 6d ago

QASM-Eval: A Dataset to Train and Evaluate LLMs on OpenQASM-3 Beyond Quantum Circuits

Quantum computing remains in the Noisy Intermediate-Scale Quantum (NISQ) era, where performance is highly constrained by noise. Addressing this limitation often requires hardware-facing capabilities beyond gate-sequence circuit specification, including mid-circuit measurement and classical feedback for quantum error correction (QEC), precise timing control for dynamical decoupling (DD), and pulse-level waveform access for calibration. OpenQASM-3 was introduced...

arXiv CS 9d ago

Conformal Disentanglement and Latent-Space Curation: A Neural Framework for Perspective Synthesis, Differentiation and Targeted Generation

Many scientific and engineering problems involve observing a common phenomenon through multiple heterogeneous sensors or measurement modalities. Such observations typically contain both information shared across sensors, reflecting the underlying system, and sensor-specific or extraneous components arising from measurement processes or environmental effects. Disentangling these contributions is essential when sensor-independent observations are unavailable.

arXiv CS 2d ago

Clinically Grounded Privacy Evaluation of Medical LMs

Medical language models (LMs) can memorize and reproduce protected health information, but privacy evaluations often focus on recovery of training text rather than disclosure under realistic threat models. We introduce a clinically grounded framework that evaluates leakage along a graded axis of adversarial access, ranging from publicly inferable demographics to leaked note fragments. At each tier, we measure verbatim memorization of patient-specific text and...

arXiv CS 1d ago

Can LLMs Write Correct TLA+ Specifications? Evaluating Natural-Language-to-TLA+ Generation

TLA+ has supported industrial verification at companies such as Amazon and Microsoft, yet writing correct TLA+ specifications from natural language still requires time and expertise, which limits adoption. LLMs show promise, but no prior study measures whether they produce semantically correct TLA+ specifications from natural language. This paper presents the first systematic evaluation of LLM-based TLA+ specification synthesis from natural language.

arXiv CS 5d ago