SOTAs

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

From Segments to Scenes: Temporal Understanding for Agentic Autonomous Driving via Vision-Language Models

Announce Type: replace Abstract: Vision-Language Models (VLMs) are increasingly deployed as the perception and reasoning backbone of autonomous agents acting in the wild, with autonomous driving (AD) being one of the most safety-critical instances. Reliable temporal understanding is essential for such agents to anticipate events, attribute causes, and act safely in dynamic environments, yet this remains a significant challenge even for state-of-the-art (SoTA) VLMs. Prior video benchmarks...

arXiv CS 6d ago

From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model

arXiv CS 8d ago

Position: State-of-the-Art Claims Require State-of-the-Art Evidence

arXiv:2605.17273v3 Announce Type: replace Abstract: State-of-the-Art (SOTA) claims pervade Artificial Intelligence (AI) and Machine Learning (ML) research. These claims rest on benchmark evaluations, where models are ranked by aggregate scores across tasks. Public benchmarks or leaderboards are the most visible instance, but the same structure appears in paper tables throughout the literature.

arXiv CS 6d ago

Internalizing Temporal Consistency in Video Object-Centric Learning without Explicit Regularization

arXiv:2605.31508v1 Announce Type: new Abstract: Video Object-Centric Learning (OCL) aims to represent objects as slot vectors and maintain their consistency across frames. Slot-Slot Contrastive (SSC) loss has become the cornerstone for state-of-the-art (SOTA) video OCL methods. While highly effective, SSC relies on one-to-one object correspondence across frames and introduces an extra loss.

arXiv CS 9d ago

On the Expressive Power of Permutation-Equivariant Weight-Space Networks

arXiv:2602.01083v2 Announce Type: replace Abstract: Weight-space learning studies neural architectures that operate directly on the parameters of other neural networks. Motivated by the growing availability of pretrained models, recent work has demonstrated the effectiveness of weight-space networks across a wide range of tasks. SOTA weight-space networks rely on permutation-equivariant designs to improve generalization.

arXiv CS 6d ago

End-to-End Training for Discrete Token LLM based TTS System

Announce Type: new Abstract: Recent state-of-the-art (SOTA) text-to-speech (TTS) systems typically adopt a cascaded pipeline consisting of a speech tokenizer, an autoregressive large language model (LLM), and a diffusion based flow-matching (FM) model, with these components trained independently. In this paper, we propose a fully end-to-end (E2E) optimization framework that unifies the training of the speech tokenizer, LLM, FM model, and an additional reward model (RM). Specifically, we first jointly...

arXiv CS 1d ago

Is attention truly all we need? An empirical study of asset pricing in pretrained RNN sparse and global attention models

Announce Type: replace-cross Abstract: This study investigates the pre-trained RNN attention models with the mainstream attention mechanisms, such as additive attention, Luong's three attentions, global self-attention and sliding window sparse attention, for the empirical asset pricing research on the top 420 large-cap US stocks. This is the first paper on the large-scale state-of-the-art (SOTA) attention mechanisms applied in the asset pricing context. They overcome the limitations of the...

arXiv CS 5d ago

Diagnosis of Human Object Interaction Detectors for Real World Educational Applications

arXiv:2606.02789v1 Announce Type: new Abstract: Human-object interaction (HOI) recognition is critical for automatically analyzing student behavior in complex educational environments. Although state-of-the-art (SOTA) HOI detectors perform well on benchmark datasets, their performance often degrades when deployed in real-world training environments due to domain-specific objects, occlusions, and complex visual conditions. In this paper, we introduce a diagnosis-driven framework that...

arXiv CS 7d ago

ChannelTok: Efficient Flexible-Length Vision Tokenization

Announce Type: new Abstract: Leading flexible vision tokenizers achieve SOTA quality at an extreme cost, relying on parameter-heavy backbones and slow, multi-step generative decoders. We depart from this complex, spatial-token paradigm and introduce a simple, lightweight, and fast channel-wise flexible-length tokenizer. Our method treats each latent channel as a visual token, enabling a parameter-efficient CNN-Transformer hybrid backbone.

arXiv CS 6d ago

U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster

Announce Type: replace Abstract: AI-based weather forecasting now rivals traditional physics-based ensembles, but state-of-the-art (SOTA) models rely on specialized architectures and massive computational budgets, creating a high barrier to entry. We demonstrate that such complexity is unnecessary for frontier performance. We introduce U-Cast, a probabilistic forecaster built on a standard U-Net backbone trained with a simple recipe: deterministic pre-training on Mean Absolute Error followed...

arXiv CS 8d ago