Claude Opus 3

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Anthropic/OpenAI may be spending more than $1000 for every $100 you pay them

For reasons that will remain hidden, we resume writing about Generative AI/LLM after a hiatus of 15 months (that one from October 2025, and the one from June 2025, don’t really count as serious pieces). Today, the first of two articles about “coding with Large ‘Language’ Models”, as coding with LLMs is positioned as the ‘killer app’ for LLMs. We interrupt this program for a short digression on Anthropic’s recently released blog post “When AI builds itself”.

Hacker News 3d ago

Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation

arXiv:2605.04135v2. Abstract: Readers of applied-domain LLM capability evaluations want to know what AI systems can currently do. That literature answers a related, but consequentially different, question: what older, cheaper, less-elicited models could do months or years earlier (a 2026 paper evaluating GPT-3.5 or GPT-4 zero-shot, say, against a frontier of reasoning-capable, tool-using systems like GPT-5.5 Pro and Claude Opus 4.7), often reported with sparse...

arXiv CS 5d ago

When AI Builds Itself: Our progress toward recursive self-improvement

For most of AI’s history, humans drove every step in its development cycle. But at Anthropic, we are delegating a growing share of AI development to AI systems themselves, which is speeding up our work. Taken far enough, and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor.

Hacker News 6d ago

Claude Fable 5

Claude Fable 5 and Claude Mythos 5. Today we’re launching Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Fable 5’s capabilities exceed those of any model we’ve ever made generally available.

Hacker News 1d ago

LLM Bias Evaluation: Gender, Racial, and Age Disparities in Occupational and Crime Scenarios

Abstract: LLM bias evaluation is critical as large language models (LLMs) increasingly influence high-stakes decisions. This paper provides a comprehensive assessment of gender, racial, and age disparities in leading LLMs, revealing that debiasing efforts often create new fairness trade-offs. Recent advancements in LLMs have been notable, yet widespread enterprise adoption remains limited due to various constraints.

arXiv CS 9d ago

'It would be good for the world' to slow down AI sprints, Anthropic says

It would be “good for the world” to slow down the pace of AI development, according to a blog post from Anthropic, which this week began the process of going public with a confidential IPO filing. “We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology,” stated a blog post written by Anthropic co-founder (and former Reg scribe) Jack...

The Register 5d ago

Alibaba/Open-Code-Review

The open source AI code review agent. Open Code Review is an AI-powered code review CLI tool. It originated as Alibaba Group's internal official AI code review assistant; over the past two years, it has served tens of thousands of developers and identified millions of code defects.

Hacker News 5d ago

FrontierCode

Introducing FrontierCode: raising the bar from correctness to quality. Today’s coding benchmarks have established that models can write correct code. But as AI-generated code becomes the dominant path to production, correctness is now table stakes. The question that we should be asking is: can models actually write good code?

Hacker News 1d ago

Launch HN: Expanse (YC P26) – Unlock Wasted GPU Capacity

Hey HN, we’re Ismaeel, Eren, Yafet and Nikodem. We built Expanse (https://expanse.sh/) to increase the effective capacity of your HPC/GPU clusters running schedulers/orchestrators like Kubernetes and SLURM. We read the source code, job submission script, and the hardware a workload is about to run on to predict what the job actually needs before the cluster sees it.

Hacker News 9d ago

"AI Psychosis" in Context: How Conversation History Shapes LLM Responses to Delusional Beliefs

arXiv:2604.13860v4. Abstract: Extended interaction with large language models (LLMs) has been linked to the reinforcement of delusional beliefs, attracting clinical and public concern. Yet most empirical work evaluates model safety in brief interactions, which may not reflect how harms develop through sustained dialogue. Five LLMs were tested across three levels of accumulated context, using the same escalating delusional conversation history to isolate its effect on...

arXiv CS 8d ago