The Advantage Collapse Rate
Related Articles from SNS
Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation
arXiv:2605.21125v2. Abstract: Group Relative Policy Optimization (GRPO), a prominent algorithm within the Reinforcement Learning from Verifiable Rewards (RLVR) framework, has achieved strong results in improving the reasoning capabilities of large language models (LLMs). However, GRPO is prone to advantage collapse, a failure mode where homogeneous rewards within a group (e.g., all correct or all incorrect answers) yield near-zero advantages and vanishing gradients. To...
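The failure mode described in this abstract can be illustrated with a small sketch. It assumes the commonly used GRPO normalization, where each reward in a group is standardized as (reward - group mean) / (group standard deviation + eps). The function names, the `eps` and `tol` parameters, and the definition of a "collapse rate" as the fraction of groups with identical rewards are illustrative assumptions, not taken from the paper.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group (assumed GRPO-style normalization)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def advantage_collapse_rate(groups, tol=1e-8):
    """Fraction of groups whose rewards are all (nearly) identical.

    This metric definition is an assumption made for illustration.
    """
    if not groups:
        return 0.0
    collapsed = sum(1 for g in groups if max(g) - min(g) <= tol)
    return collapsed / len(groups)


# Hypothetical binary rewards for four prompts, four sampled answers each.
groups = [
    [1.0, 1.0, 1.0, 1.0],  # all correct: every advantage is zero
    [0.0, 0.0, 0.0, 0.0],  # all incorrect: every advantage is zero
    [1.0, 0.0, 1.0, 0.0],  # mixed: non-zero advantages
    [0.0, 0.0, 1.0, 0.0],  # mixed: non-zero advantages
]

for g in groups:
    print(g, "->", [round(a, 3) for a in group_advantages(g)])
print("collapse rate:", advantage_collapse_rate(groups))
```

In this toy batch, the two homogeneous groups contribute no learning signal, since all of their advantages are zero, which is why the fraction of such groups may be a useful quantity to monitor during training.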
Float8@2bits: Entropy Coding Enables Data-Free Model Compression
arXiv:2601.22787v2. Abstract: Post-training compression is currently divided into two contrasting regimes. On the one hand, fast, data-free, and model-agnostic methods (e.g., NF4 or HQQ) offer maximum accessibility but suffer from functional collapse at extreme bit-rates below 4 bits. On the other hand, techniques leveraging calibration data or extensive recovery training achieve superior fidelity but impose high computational constraints and face uncertain robustness...
Can You Stop a Hypersonic Missile?
The headlines say yes. Patriot crews shot down a Kinzhal over Kyiv on the night of May 4, 2023.
How Spain Avoided the Global Populist Backlash
As recently as five years ago, Spain was no one’s idea of an economic success story. Southern European countries have long been notorious for lagging behind their neighbors to the north. Portugal, Italy, Greece, and Spain were referred to by the intentionally unflattering nickname “PIGS” after they had to be bailed out following the 2008 financial crisis.
China unveils first-of-its-kind 'dual-core' quantum computer — its makers say it improves stability and efficiency
A new Chinese quantum computing system pairs two independent neutral-atom arrays in one processor, aiming to boost stability, efficiency and scalability. A Chinese company has unveiled what its researchers are calling the world’s first "dual-core" quantum computer. It's a neutral-atom system designed to improve stability, efficiency and error correction by pairing two...
Domain expertise has always been the real moat
The hard part of writing software has never been the writing. It was building a working model of the domain in your head first. Before you could ship a payroll system you had to understand garnishments and pre-tax deductions and what happens when someone’s pay period straddles a rate change.
How back is the U? Can Clemson rebound? Previewing...
Life in the ACC certainly isn't boring. In the past year alone, the conference has produced a long and awkward CFP rankings battle, an irate affiliate member, a thrilling national title game run, the strangest tiebreaker result imaginable, an out-of-nowhere 11-win season, the most disappointing team in the country, an epic pro-to-college face-plant, 18 of the 38 best games of the 2025 season, the No. 1 pick in the NFL draft (indirectly) and the most awkward possible move to nine-game...
Crystal Nights by Greg Egan
Publication history
- Interzone #215, April 2008.
- Free podcast at Transmissions From Beyond. [Site no longer active]
- Oceanic (collection, Orion)
The Tech Download: Anthropic’s IPO sets up first big test of AI boom valuations
This report is from this week's The Tech Download newsletter. This week has been dominated by the hype around the highly anticipated IPOs of SpaceX, Anthropic and OpenAI.
Ev-Trust: An Evolutionarily Stable Trust Mechanism for Decentralized LLM-Based Multi-Agent Service Economies
arXiv:2512.16167v3. Abstract: Decentralized LLM-based multi-agent service economies face three vulnerabilities that undermine traditional trust mechanisms: reduced cost of fraud, difficulty in evaluating service quality, and instability of service content. These compounding vulnerabilities can trigger population-level trust collapse and the proliferation of short-sighted strategies. We propose Ev-Trust, an evolutionarily stable trust mechanism that addresses these...