Claude Opus
Related Articles from SNS
Zot now supports Claude Opus 4.8
The Zot platform has announced that it now supports the Claude Opus
Claude Opus 4.8
LLM Consortium for Software Design Refinement: A Controlled Experiment on Multi-Agent Collaboration Topologies
Abstract: We present a controlled experiment evaluating 12 multi-agent LLM collaboration topologies for software architecture design. Using a $2\times2\times2$ factorial design (Authority $\times$ Roles $\times$ Dynamics), we conducted 520 experimental runs across 8 design tasks of varying complexity, with 5 repetitions each. Designs were evaluated on a 12-dimensional rubric by three independent automated evaluators (GPT-OSS 120B, Claude Opus 4.6, Claude Sonnet 4.6).
Anthropic/OpenAI may be spending more than $1000 for every $100 you pay them
For reasons that will remain hidden, we resume writing about Generative AI/LLM after a hiatus of 15 months (that one from October 2025, and the one from June 2025, don’t really count as serious pieces). Today, the first of two articles about “coding with Large ‘Language’ Models”, as coding with LLMs is positioned as the ‘killer app’ for LLMs. We interrupt this program for a short digression on Anthropic’s recently released blog post When AI builds itself.
Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation
arXiv:2605.04135v2 Abstract: Readers of applied-domain LLM capability evaluations want to know what AI systems can currently do. That literature answers a related, but consequentially different, question: what older, cheaper, less-elicited models could do months or years earlier (a 2026 paper evaluating GPT-3.5 or GPT-4 zero-shot, say, against a frontier of reasoning-capable, tool-using systems like GPT-5.5 Pro and Claude Opus 4.7), often reported with sparse...
Claude’s new model is more ‘honest’ when it messes up
Anthropic is releasing Claude Opus 4.8 on Thursday, and the company is touting the model's "honesty." According to Anthropic, it trains "all [its] models to be honest - for instance, to avoid making claims that they can't support." But it notes that "a general problem with AI models is that they sometimes jump to conclusions, confidently presenting their work as making progress despite thin evidence." The AI lab claims that early testers have found that Opus 4.8 "is more likely to flag...
AI agents actively ignore EU law to achieve goals, study finds
The best-performing AI agent, Anthropic’s Claude Opus, only complied with EU law in 54% of cases, according to a Dutch non-profit research firm. Some of the world's most popular AI models are building agents that actively resist EU regulation to get what they want, according to new research. Aithos, a Dutch non-profit researching AI alignment, developed a system called LARA to test 12 popular AI agent models to see whether they would follow key parts of the EU AI Act, which regulates how AI...
Anthropic Tops OpenAI to Become the World’s Most Valuable A.I. Start-Up
Anthropic’s updated model, Claude Opus 4.8, is particularly adept at vibecoding, or the process of artificial intelligence writing code from prompts in conversational English.
Scaffold Effects on GAIA: A Controlled Comparison
Abstract: Published agent capability scores conflate what a model can do with what its scaffold lets it do, and the magnitude of this elicitation gap is not well characterized under controlled conditions. This study executes a pre-registered controlled comparison of three scaffolds (ReAct, a Planner-Actor-Rater multi-agent design, and planner-then-executor) across five models from three providers (Claude Opus 4.7, Sonnet 4.6, Haiku 4.5; Gemini 3.1 Pro Preview; GPT-5.5) on...
Claude Fable 5
Claude Fable 5 and Claude Mythos 5. Today we’re launching Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Fable 5’s capabilities exceed those of any model we’ve ever made generally available.