O3

Related Articles from SNS

ACTIVE-o3: Empowering MLLMs with Active Perception via Pure Reinforcement Learning

arXiv:2505.21457v2 Abstract: Active vision, also known as active perception, refers to actively selecting where and how to look in order to gather task-relevant information. It is a critical component of efficient perception and decision-making in humans and advanced embodied agents. With the rise of Multimodal Large Language Models (MLLMs) as central planners in robotic systems, the lack of methods for equipping MLLMs with active perception has become a key gap.

arXiv CS 1d ago

VideoSEG-O3: A Multi-turn Reinforcement Learning Framework for Reasoning Video Object Segmentation

arXiv:2606.06819v1 Abstract: Reasoning Video Object Segmentation (RVOS) demands a sophisticated integration of temporal dynamics, spatial details, and linguistic reasoning to achieve precise pixel-level localization. Existing methods are limited to reasoning over fixed initial inputs and lack the capacity to actively acquire further visual evidence, which is often essential for resolving complex references in long or intricate videos. To address this, we propose...

arXiv CS 2d ago

Deep learning reveals a stronger fossil fuel influence than biomass burning in shaping remote tropospheric ozone

arXiv:2606.09793v1 Abstract: Tropospheric ozone (O3) is a key greenhouse gas and atmospheric oxidant, yet its sources in the remote troposphere remain strongly debated. Observation-based tracer analyses suggest that O3 attributed to biomass burning is much greater than that from fossil fuel sources (by a factor of ~2–10), contradicting state-of-the-art global models. Here we show that this discrepancy primarily arises from the strong sensitivity of tracer methods to...

arXiv Physics 1d ago

Evaluating Deep Research Agents on Expert Consulting Work: A Benchmark with Verifiers, Rubrics, and Cognitive Traps

arXiv:2605.17554v2 Abstract: Frontier deep research agents (DRAs) plan a research task, synthesize across documents, and return a structured deliverable on demand. They are being deployed in enterprise workflows faster than they are being evaluated. Existing benchmarks measure factual recall, single-hop QA, or generic agentic skill, missing the multi-document, decision-grade work DRAs are deployed to produce.

arXiv CS 8d ago

More Capable, Less Cooperative? When LLMs Fail At Zero-Cost Collaboration

Abstract: Large language model (LLM) agents increasingly coordinate in multi-agent systems, yet we lack an understanding of where and why cooperation fails. Many real-world coordination problems are not social dilemmas: helping others (sharing documentation, unblocking a teammate) costs the helper almost nothing while producing substantial collective benefit. Whether LLM agents cooperate in this regime, where helping is free and they are explicitly instructed to do...

arXiv CS 2d ago

A new open-shell CCSDTQ implementation and its application to the basis set convergence of post-CCSDT(Q) corrections in computational thermochemistry

arXiv:2605.19860v2 Abstract: We extend the CCSDTQ implementation in CFOUR to UHF and ROHF references and demonstrate its efficiency. We apply it to the basis set convergence of post-CCSDT(Q) corrections for the W4-08 thermochemical dataset. Convergence of the (Q)_Λ − (Q) difference is relatively rapid.

arXiv Physics 7d ago

Beyond English benchmarks: clinical LLM evaluation in Brazilian Portuguese

arXiv:2606.07853v1 Abstract: Large language models are transforming clinical decision support and are increasingly applied in real-world scenarios. Yet most benchmarks are conducted in English, and cross-lingual evaluation is needed to close language gaps in global access. We introduce ClinicalBr, the first bilingual benchmark for clinical decision-making built from real Brazilian case reports.

arXiv CS 1d ago

Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation

arXiv:2601.06600v4 Abstract: Short-video platforms have become major channels for misinformation, where deceptive claims frequently leverage visual experiments and social cues. While Multimodal Large Language Models (MLLMs) have demonstrated impressive reasoning capabilities, their robustness against misinformation entangled with cognitive biases remains under-explored. In this paper, we introduce a comprehensive evaluation framework using a high-quality, manually...

arXiv CS 2d ago

Wildfire smoke has reversed US progress toward ozone air quality, study finds

Since 2015, fires have undone years of effort to reduce ozone levels, underscoring a growing public health crisis. The highly destructive wildfires that have battered the US and North America in recent years have significantly increased emissions and been linked to tens of thousands of premature deaths, but their impact on air quality is greater than previously known, according to new research. A study published in Science on Thursday found that, since 2015, wildfires have reversed US progress...

The Guardian UK 6d ago

Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

arXiv:2606.07157v1 Abstract: Many efforts to ensure frontier AI models are safe rely on monitoring their chain-of-thought (CoT) reasoning. If models become able to perform sufficiently complex reasoning internally, without explicit thinking tokens, this would undermine such oversight. We measure how well frontier models reason without CoT across a suite of over 30,000 questions spanning 43 benchmarks in domains including math, coding, puzzles, causality, theory-of-mind,...

arXiv CS 2d ago