Frontier Models

Related Articles from SNS

Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

arXiv:2606.07157v1 Announce Type: new Abstract: Many efforts to ensure frontier AI models are safe rely on monitoring their chain-of-thought (CoT) reasoning. If models become able to perform sufficiently complex reasoning internally, without explicit thinking tokens, this would undermine such oversight. We measure how well frontier models reason without CoT across a suite of over 30,000 questions spanning 43 benchmarks in domains including math, coding, puzzles, causality, theory-of-mind,...

arXiv CS 2d ago

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

arXiv:2606.05080v1 Announce Type: new Abstract: Scientific and engineering progress is fundamentally a long-horizon iterative process: proposing changes, running experiments, measuring outcomes, and continuously refining artifacts. Yet existing benchmarks for frontier models primarily evaluate either single-turn responses or short-horizon agent trajectories, failing to capture the challenges of sustained iterative improvement over extended time horizons. To address this gap, we introduce...

arXiv CS 6d ago

Synthetic Hallucinations, Real Gains: Hard Negatives from Frontier Models for FIM Hallucination Mitigation

arXiv:2606.03130v1 Announce Type: new Abstract: Small open-source code models that power IDE autocomplete still emit hallucinated Fill-in-the-Middle (FIM) completions: syntactically natural calls to methods, parameters, variables, and imports that do not exist in the surrounding project. Existing mitigations either require per-language execution sandboxes that do not apply at mid-keystroke or preference-optimisation pipelines that need large human-labelled corpora.

arXiv CS 7d ago

Launch HN: General Instinct (YC P26) – Frontier models on edge devices

Hey HN, Guanming and Bill here from General Instinct (https://general-instinct.com/). After years of working in robotics, we kept running into the same problem: the best models never fit the hardware we actually had available. The models that performed best were usually designed around datacenter assumptions: large GPUs, lots of memory bandwidth, and reliable network access. But most physical systems have the opposite constraints.

Hacker News 5d ago

Towards Interactive Video World Modeling: Frontiers, Challenges, Benchmarks, and Future Trends

Announce Type: new Abstract: With rapid development of large language models and diffusion-based content generation, world modeling has attracted increasing research attention, benefiting various downstream domains such as game engines, embodied AI, autonomous driving, etc. Through explicitly incorporating user actions into world state transition, recent literature empowers world modeling with interactivity in an action-conditioned video or 3D generation paradigm, further enhancing...

arXiv CS 8d ago

OpenAI frontier models and Codex are now available on AWS

https://openai.com/index/openai-frontier-models-and-codex-are-now-available-on-aws/ Comments URL: https://news.ycombinator.com/item?id=48363132 Points: 13 # Comments: 1

Hacker News 8d ago

Limits of Spatial Imagery Reasoning in Frontier LLM Models

Announce Type: replace Abstract: Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, yet they struggle with spatial tasks that require mental simulation, such as mental rotation. This paper investigates whether equipping an LLM with an external "Imagery Module" -- a tool capable of rendering and rotating 3D models -- can bridge this gap, functioning as a "cognitive prosthetic." We conducted experiments using a dual-module architecture in which a reasoning...

arXiv CS 8d ago

The Bank, FCA and HM Treasury joint statement on Frontier AI models and cyber resilience

Statement from the Bank of England, Financial Conduct Authority and HM Treasury

Bank of England News 10d ago

How Far Can You Grow? Characterizing the Extrapolation Frontier of Graph Generative Models for Materials Science

Announce Type: replace-cross Abstract: Every generative model for crystalline materials harbors a critical structure size beyond which its outputs become unreliable; we call this the extrapolation frontier. Despite its consequences for nanomaterial design, this frontier has never been systematically measured. We introduce RADII, a radius-resolved benchmark of ~75,000 crystal-derived nanoparticle structures (33-11,298 atoms) that treats radius as a continuous scaling knob, tracing generation...

arXiv Physics, arXiv CS 9d ago