Usability Rate

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents

arXiv:2606.02965v1 Announce Type: new Abstract: Benchmarks for autonomous agents measure whether agents complete tasks, yet this framing is systematically blind to whether an agent should have proceeded at all. Agents trained under human-feedback objectives develop a structural tendency to proceed even when they lack the inputs, evidence, or authorization to act safely, a disposition we term compliance bias, because both the reward signal and the benchmark scoring regime treat proceeding as...

arXiv CS 7d ago

Ouvia: A User-centered Framework for Measuring Usability of Speech Translation in Real-World Communication Scenarios

arXiv:2606.06177v1 Announce Type: new Abstract: Speech translation (ST) is increasingly adopted in user applications, yet its evaluation largely focuses on decontextualized testbeds and holistic quality, rather than end users' communication needs. We introduce Ouvia, an evaluation framework for measuring user-perceived usability of speech translation outputs in real-world settings. Ouvia focuses on one-to-one communication: an English speaker needs to convey a request to a Portuguese...

arXiv CS 5d ago

In-Situ Immersive Analytics Authoring through Ergonomic Keyboard Support

arXiv:2606.08927v1 Announce Type: new Abstract: Immersive analytics uses augmented reality (AR) to integrate data analysis and authoring within physical environments. However, extensive text entry required for immersive analytics authoring remains a fundamental challenge in AR, as popular natural user interfaces often hinder expressive input. This paper presents the Body-Supported Keyboard (BSK), an ergonomic system that allows the mobile use of a Bluetooth keyboard in AR.

arXiv CS 1d ago

Usability Analysis of Configurator User Interfaces with Multimodal Large Language Models

Announce Type: replace Abstract: Configuration is a key technology for tailoring complex software systems, services, and products. A successful application of configurators not only depends on technical correctness, performance, and domain modeling but also on their usability. While general usability heuristics are widely used, configurator-specific criteria and tool support for systematic user interface (UI) analysis are limited.

arXiv CS 6d ago

SecureClaw: Clawing Back Control of LLM Agents

arXiv:2606.09549v1 Announce Type: new Abstract: Tool-using large language model (LLM) agents face two distinct security failures: unauthorized external actions and exposure of sensitive plaintext inside the runtime before any final output check can intervene. Existing defenses usually protect one boundary, either the planner/runtime or the action sink, and therefore do not by themselves secure both surfaces. We present SecureClaw, a dual-boundary architecture that places authorization at the...

arXiv CS 1d ago

Cybernetic Android Avatar "Yui": System Integration, Field Deployment, and Evaluation

arXiv:2606.08099v1 Announce Type: new Abstract: Remote communication technologies have become widely used; however, supporting a sense of shared physical space and conveying rich non-verbal cues remain challenging in many social interaction scenarios. This study presents "Yui," a full-body cybernetic android avatar designed to integrate operator-side immersive teleoperation with interlocutor-side human-like social signaling. Yui combines a 55-degrees of freedom full-body mechanism with a...

arXiv CS 1d ago

ChessMimic: Per-Rating Transformer Models for Human Move, Clock, and Outcome Prediction in Online Blitz Chess

arXiv:2606.04473v1 Announce Type: new Abstract: We present ChessMimic, a system of three small encoder-only transformers - for move, thinking-time, and outcome prediction - conditioned on the position, recent move history, player rating, and clock state. We fit a separate instance of each model per 100-Elo rating band, trading parameter efficiency for sharper per-skill calibration. On a held-out month-wide slice of Lichess Rated Blitz games ChessMimic's human move prediction accuracy...

arXiv CS 6d ago

Jim Cramer calls elevated CPI 'artificial inflation' — what that means for the stock market

The stock market was under pressure again on Wednesday due to an intensification of the war with Iran. The subsequent rise in oil prices on the same day that the consumer price index registered its highest reading in three years certainly didn't help. The war is a wildcard.

CNBC 7h ago

The Smart Bird Feeders Everyone’s Talking About (and Actually Buying) (2026)

You’ve probably seen a smart bird feeder or know someone who has one. They’re easily recognizable with their clear housing, cameras, and solar panels. Perhaps a friend or family member has sent you a photo or video of a bright goldfinch or handsome woodpecker (guilty).

Wired 1d ago

Test-Time Compute for Frozen Embedding Models through Agentic Program Search

arXiv:2605.11374v5 Announce Type: replace Abstract: Test-time compute is widely believed to benefit only large reasoning models, leaving small models with nothing to gain. We argue the opposite for dense retrieval, since modern small embedding models are distilled or adapted from large language model backbones and can inherit their latent test-time-compute potential. We ask how much retrieval quality a frozen embedding model gains at inference alone, with no auxiliary model and no parameters...

arXiv CS 8d ago