Home Knowledge Base Claude Opus 4.1

Claude Opus 4.1

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Anthropic/OpenAI may be spending more than $1000 for every $100 you pay them

For reasons that will remain hidden, we resume writing about Generative AI/LLM after a hiatus of 15 months (that one from October 2025, and the one from June 2025, don’t really count as serious pieces). Today, the first of two articles about “coding with Large ‘Language’ Models”, as coding with LLMs is positioned as the ‘killer app‘ for LLMs. We interrupt this program for a short digression on Anthropic’s recently released blog post When AI builds itself.

Hacker News 3d ago

Alibaba/Open-Code-Review

The open source AI code review agent. English | 简体中文 Open Code Review is an AI-powered code review CLI tool. It originated as Alibaba Group's internal official AI code review assistant — over the past two years, it has served tens of thousands of developers and identified millions of code defects.

Hacker News 5d ago

When AI Builds Itself: Our progress toward recursive self-improvement

For most of AI’s history, humans drove every step in its development cycle. But at Anthropic, we are delegating a growing share of AI development to AI systems themselves, which is speeding up our work. Taken far enough, and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor.

Hacker News 6d ago

I built a vulnerable app and spent $1,500 seeing if LLMs could hack it

I built a vulnerable app and spent $1,500 seeing if LLMs could hack it As a part of my work I do security research for various apps and websites. I wanted to see if LLMs could reproduce a common class of exploits I’ve found in multiple apps. I made a fake React Native app in Expo and a backend in Python.

Hacker News 6d ago

Enginuity: A Dataset and Benchmark for Vision-Language Understanding of Engineering Diagrams

Announce Type: new Abstract: Engineering diagrams pose a distinct challenge for vision-language models: unlike natural images or general documents, they encode information through dense spatial layouts, domain-specific symbols, and cross-references between visual callouts and structured parts tables. Despite their centrality to service, repair, and design workflows, there is no public benchmark for measuring VLM capabilities in this domain; existing datasets primarily focus on flowcharts,...

arXiv CS 7d ago

Microsoft and Google are late to AI coding, but 'absolutely critical' they compete for growth

In the booming generative AI market, Anthropic has zoomed ahead of the field, largely thanks to Claude Code, its AI coding assistant. Seeing where the money is, OpenAI shifted much of its focus from the consumer market to enterprise, where its Codex offering is going up against Claude Code. Now, Google and Microsoft are making a concerted effort to get into the game, using their massive balance sheets and expansive cloud businesses to try and lure developers.

CNBC 8d ago

The Injection Paradox: Brand-Level Suppression in Safety-Trained LLM Recommendations via RAG Context Injection

Announce Type: new Abstract: We present a reproducible failure mode of safety training in RAG-based LLM recommendation -- the Injection Paradox -- in which prompt injections embedded in retrieved documents backfire against the attacker, suppressing the target brand below the injection-free baseline. In safety-trained Claude models, documents containing prompt injections suffer a sharp drop in recommendation rate, and this suppression propagates beyond the injected document to unmodified...

arXiv CS 1d ago

FrontierCode

Introducing FrontierCode Raising the bar from correctness to quality Today’s coding benchmarks have established that models can write correct code. But as AI-generated code becomes the dominant path to production, correctness is now table stakes. The question that we should be asking is: can models actually write good code?

Hacker News 1d ago

Do LLMs Hold Their Values? MANTA: A Multi-Turn Adversarial Benchmark for Animal Welfare Reasoning

arXiv:2605.16301v2 Announce Type: replace Abstract: Evaluating animal welfare reasoning in LLMs remains an open challenge despite rapid deployment in consumer and professional contexts where welfare considerations appear implicitly in everyday queries. Existing benchmarks such as AnimalHarmBench evaluate this through single-turn, explicitly framed questions, measuring whether models avoid harmful content when directly asked. This approach overlooks two failure modes: alignment degradation...

arXiv CS 6d ago

Can LLMs Beat Classical Hyperparameter Optimization Algorithms?

Computer Science > Machine Learning [Submitted on 25 Mar 2026 (v1), last revised 17 Apr 2026 (this version, v5)] Title:Can LLMs Beat Classical Hyperparameter Optimization Algorithms?

Hacker News 1d ago