Claude Pro
Related Articles from SNS
From Outliers to Errors: Auditing Pali-to-English LLM Translations with Multi-Reference Adjudication
arXiv:2606.01136v1 Announce Type: new Abstract: Single-score translation metrics can conflate legitimate variation with error, a problem especially acute for classical languages where multiple defensible English renderings of the same passage coexist. We audit Pali-to-English output from four flagship large language models (LLMs): GPT-5.5, Claude Sonnet 4.6, Gemini 3.1 Pro, and Grok 4.3, on 1,700 passages from the Pali Canon, using three established human translations by Bhikkhu Sujato,...
Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation
arXiv:2605.04135v2 Announce Type: replace Abstract: Readers of applied-domain LLM capability evaluations want to know what AI systems can currently do. That literature answers a related, but consequentially different, question: what older, cheaper, less-elicited models could do months or years earlier (a 2026 paper evaluating GPT-3.5 or GPT-4 zero-shot, say, against a frontier of reasoning-capable, tool-using systems like GPT-5.5 Pro and Claude Opus 4.7), often reported with sparse...
Scaffold Effects on GAIA: A Controlled Comparison
Announce Type: new Abstract: Published agent capability scores conflate what a model can do with what its scaffold lets it do, and the magnitude of this elicitation gap is not well characterized under controlled conditions. This study executes a pre-registered controlled comparison of three scaffolds (ReAct, a Planner-Actor-Rater multi-agent design, and planner-then-executor) across five models from three providers (Claude Opus 4.7, Sonnet 4.6, Haiku 4.5; Gemini 3.1 Pro Preview; GPT-5.5) on...
Evaluation of LLMs for Mathematical Formalization in Lean
arXiv:2606.05632v1 Announce Type: new Abstract: Within the past few years, the ability of Large Language Models (LLMs) to generate formal mathematical proofs has improved drastically. We provide a comparison of various LLMs' effectiveness in producing formal proofs in Lean 4 with the goal of assisting those seeking to use LLMs to support their own projects. We utilize both pass@$k$ and refine@$k$ metrics as the benchmark for our comparison and evaluate on subsets of both miniF2F and miniCTX...
Memory and personalization make AI more likely to tell you what you want to hear
AI companies have touted context retention (memory) and the availability of personal details (personalization) as mechanisms for improving AI model interaction. Both have value to help keep models from losing the thread of a conversation. But they raise the potential for sycophancy, where models will say what they predict you want to hear, which may not be the most accurate response.
OpenAI mulls slashing prices as it competes with Anthropic for users
OpenAI is mulling sharp price cuts to its artificial intelligence offerings, as it looks to woo consumers away from rival Anthropic, the Wall Street Journal reported Wednesday evening stateside, citing sources familiar with the matter. "The company is weighing significant cuts to what it charges for tokens, the unit of measurement artificial-intelligence firms use to bill for their products," the report said, adding that it was "in anticipation of similar cuts the company expects at...
Claude Desktop spins up a VM with no way of stopping it
[BUG] Claude Desktop spawns 1.8 GB Hyper-V VM on every launch, even for chat-only use #29045...
Anthropic spins a Fable of a tamer, safer Mythos
Anthropic's Mythos model, supposedly too dangerous for public release in April, is now available to wreak havoc or tackle other tasks for a hefty price and with some new guardrails in place. Just make sure you don't mind having Anthropic keep some of your data for a while. The AI biz on Tuesday announced public availability of Claude Fable 5, a Mythos-class model, and private availability of Claude Mythos 5 for Glasswing partners.
Anchoring LLM Gender Bias to Human Baselines: A Cross-Lingual Audit
arXiv:2605.30804v1 Announce Type: new Abstract: We audit six large language models (LLMs) for gender stereotyping across English, Korean, Chinese, and Japanese. Three were developed primarily for English-language use (Claude, GPT, Gemini) and three for East Asian use (DeepSeek, Syn-Pro, HyperCLOVA X). We adopt the HEXACO-100 personality inventory and anchor each model against a cross-cultural human dataset spanning 48 countries to ask not whether LLMs are biased, but how far their gender...