Perceiver
Related Articles from SNS
Traditional, patriarchal Japanese terms for husband and wife may now be perceived as neutral
Sadie Harley, Scientific Editor; Robert Egan, Associate Editor. A new study suggests that, for modern Japanese speakers, two traditional, patriarchal words for "husband" ("shujin," literally meaning "master") and "wife" ("kanai," "inside-the-house") may be losing their original meanings, though men in the study evaluated both traditional and neutral words for "husband" more positively than words for...
Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles
Abstract: We ask whether topic sentiment has a causal effect on perceived political ideology, and whether the answer depends on who assigns the ideology label. Using articles from AllSides, paired with shared sentiment annotations from Llama-3.3-70b-versatile, we compare ideology labels from expert human annotators, GPT-4o-mini (baseline and finetuned), and Llama-3.3-70B. We apply Double Machine Learning (DML) and community-level mediation analysis across all four...
X-GS: An Extensible Framework for Perceiving and Thinking via 3D Gaussian Splatting
arXiv:2603.09632v4 Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful technique for novel view synthesis, subsequently extending into numerous spatial AI applications. However, most existing 3DGS methods operate in isolation, focusing on specific domains. In this paper, we introduce X-GS, an extensible framework consisting of two major components.
Do speech foundation models perceive speaker similarity as humans do?
Abstract: This study presents a comparative analysis between the speaker embeddings of speech foundation models and human subjective perception of speaker similarity. Human listeners have the ability to judge speaker similarity on a continuous scale discerning how similar two voices are. In contrast, speech foundation models embed speaker characteristics into numerical representation.
Perceive Before Reasoning: A Pre-Reasoning Perception Framework for Efficient and Reliable Proactive Mobile Agents
arXiv:2606.03236v1 Abstract: Multimodal large language models (MLLMs) have substantially advanced mobile agents, yet proactive mobile assistance remains challenging because agents must decide when to intervene before determining how to assist. Existing systems often implement these two decisions within a unified MLLM-based pipeline, leading to goal misalignment between conservative intervention filtering and comprehensive assistance generation, as well as...
Learning to Perceive the World Through Control: Empowerment-Based Representation Learning
arXiv:2605.30656v1 Abstract: In many practical reinforcement learning environments, observations are far higher-dimensional than the variables that matter for control. In this work, we ask: can we learn representations that capture only control-relevant features of the environment? We study this question through the empowerment objective, which maximizes an agent's influence over the environment and is widely used for unsupervised skill learning.
The Lipreading Gap: Do VSR Models Perceive Visual Speech Like Human Lipreaders?
arXiv:2606.07435v1 Abstract: Visual speech recognition (VSR) models now surpass human lipreaders on benchmarks, but do such gains establish human-like visual speech perception? To explore this, we compare three VSR systems with human baselines on the MaFI word-level lipreading dataset using word, character, phoneme, and viseme-level metrics. Although models achieve higher overall accuracy, they succeed and fail on different words than humans.
Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?
Abstract: Reasoning Vision-Language Models (VLMs) achieve strong performance on complex multimodal tasks, but reliable real-world application requires handling visual inputs that are messier than clean, curated benchmarks. Existing works mainly evaluate such reliability of VLMs through input corruptions, such as noise, blur and weather effects, which make visual evidence harder to perceive. This leaves a critical reliability failure mode underexplored: a model may perceive...