Elo
Related Articles from SNS
ViKing of Norway: Pragg wins crown that eluded even Vishy
Vishy Anand has been there and done that. The first Indian chessman to build an incredible résumé with his global achievements, he saw D Gukesh (Candidates and World title), Arjun Erigaisi (breaking into the Elo 2800 club) and K Humpy (World Rapid crown) follow in his footsteps. But late on Friday in Oslo, R Praggnanandhaa scaled a peak that even Anand could not conquer in his several forays at Norway Chess.
ChessMimic: Per-Rating Transformer Models for Human Move, Clock, and Outcome Prediction in Online Blitz Chess
arXiv:2606.04473v1. We present ChessMimic, a system of three small encoder-only transformers (for move, thinking-time, and outcome prediction) conditioned on the position, recent move history, player rating, and clock state. We fit a separate instance of each model per 100-Elo rating band, trading parameter efficiency for sharper per-skill calibration. On a held-out month-wide slice of Lichess Rated Blitz games, ChessMimic's human move prediction accuracy...
Correct Looks Better: Pairwise Comparisons Reveal Accuracy Rankings
arXiv:2606.09409v1. Pairwise comparisons combined with aggregation methods like Elo have become central to evaluating generative models, yet concerns remain that they reward superficial stylistic cues or display judge biases. In a more positive turn, we show that model rankings from pairwise comparisons strongly agree with ground-truth-based accuracy rankings when such ground truth is available for comparison. By converting five well-known benchmarks into...
Predicting every game of the entire World Cup: All...
Everyone is using artificial intelligence to do, well, everything. With the World Cup starting on June 11, you can't scroll for more than a couple of minutes without hitting another post or video or reel of someone telling you how they used AI to predict the World Cup. So, I decided to use my own supercomputer to predict every game of the 2026 World Cup -- the supercomputer is called "my brain."
Ranked: The final 48 World Cup rosters are in! Whi...
Finally, we have arrived. The World Cup starts in a little over a week, and every team has finalized its 26-man squad. We know every country that will be participating in the World Cup, and we also know -- barring last-minute injuries -- every player who will be participating in the World Cup.
Why isn't the U.S. better at soccer?
Why isn't the U.S. better at soccer? Well, better at men's soccer. Can a World Cup at home finally be the breakthrough for the USMNT?
From Player to Master: Enhancing Test-Time Learning of LLM Agents via Reinforcement Learning over Memory
Large language model (LLM) agents are increasingly deployed in long-running settings where improving through experience at test time becomes important. A common approach is to update an explicit memory after each interaction to guide future decisions. However, most existing methods rely on hand-designed prompting rules, making it difficult to consistently align memory updates with downstream objectives over multi-step horizons.
Variational Proximal Policy Optimization
Reinforcement Learning from Human Feedback via Proximal Policy Optimization often suffers from policy mode collapse, brittle exploration loops, and distribution drift. This paper introduces Variational Proximal Policy Optimization (\(\textsc{VP}_2\textsc{O}\)), a particle-based variational inference framework that maps policy optimization to Stein Variational Gradient Descent within a Mixture-of-Experts architecture. By leveraging functional kernels over...
Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation
arXiv:2605.04135v2. Readers of applied-domain LLM capability evaluations want to know what AI systems can currently do. That literature answers a related, but consequentially different, question: what older, cheaper, less-elicited models could do months or years earlier (a 2026 paper evaluating GPT-3.5 or GPT-4 zero-shot, say, against a frontier of reasoning-capable, tool-using systems like GPT-5.5 Pro and Claude Opus 4.7), often reported with sparse...
Future Power Rankings: How all 68 Power 4 college football teams stack up
Projecting a college football program's future is harder than ever. Rosters and fortunes change dramatically and championship pathways are more open than ever. The assets that make a program great in 2026 might not be there in 2027.