Item Response Theory
Related Articles from SNS
Auditing LLM Benchmarks with Item Response Theory
Abstract: LLM benchmark labels are frozen at release and silently propagated into downstream benchmarks, errors and all. We introduce an Item Response Theory-based indicator that surfaces likely mislabels at 95% precision in the top 200 examples across seven preference and multiple-choice benchmarks using responses from 114 models, outperforming a supervised classifier. We trace these errors to mechanical labeling heuristics, upstream annotation mistakes inherited...
Measuring a hate speech spectrum with faceted Rasch item response theory and perspective-aware, explainable-by-design deep learning
arXiv:2009.10277v2 Abstract: We propose a system for measuring hate speech on a continuous, interval-valued spectrum ranging from genocidal to supportive speech by combining supervised deep learning with faceted Rasch item response theory (IRT). We decompose the theoretical construct of hate speech into constituent concepts operationalized as 10 ordinal labels. Those labels are reconstituted via IRT probabilistic latent modeling into an interval outcome measure while...
Diagnosing the Reliability of LLM-as-a-Judge via Item Response Theory
arXiv:2602.00521v2 Abstract: While LLM-as-a-Judge is widely used in automated evaluation, existing validation practices primarily operate at the level of observed outputs, offering limited insight into whether LLM judges themselves function as stable and reliable measurement instruments. To address this limitation, we introduce a two-phase diagnostic framework for assessing reliability of LLM-as-a-Judge, grounded in Item Response Theory (IRT). The framework adopts...
The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs
Abstract: Large language models are increasingly used to answer culturally grounded questions across languages, yet it remains unclear whether local cultural knowledge is better accessed through English or the local language. Existing evaluations face two key limitations: many rely on parallel template-based questions that may not reflect how cultural knowledge naturally appears, and raw accuracy conflates general language proficiency with language-conditioned knowledge...
Discovering Misconceptions and Misunderstandings From Administrations of Research-Designed Multiple Choice Instruments
arXiv:2606.08986v1 Abstract: Misconceptions are "alternate hypotheses" that are incorrect according to established theories of how the world works. Often held with confidence by students, they are relatively context-insensitive, can seem like common-sense views, and are noted for being resistant to remediation using traditional instruction. To find misconceptions in Newtonian mechanics, we analyze ~34,000 administrations of the pioneering Force Concept Inventory using a...
How I Get Free Traffic from ChatGPT in 2025 (AIO vs SEO)
Three weeks ago, I tested something that completely changed how I think about organic traffic. I opened ChatGPT and asked a simple question: "What's the best course on building SaaS with WordPress?" The answer that appeared stopped me cold.
22 World Cup items, 22 stories
FIFA won't reveal how, but after every game at the 2026 World Cup this summer, it will be collecting items that will one day document the tournament. It already has the net from the 2018 World Cup final, for example, as well as the tracksuit that Pelé wore at his first World Cup in 1958. The items live in FIFA's various museums, ranging from Vancouver and Miami to Zurich and Hong Kong.
The need for a socialist planned economy (2021)
This article is a transcript of the presentation given by Vincent R. Beaudoin at Fightback’s Marxist Winter School 2021. When the Soviet Union collapsed in 1991, Francis Fukuyama told us that this was evidence of the failure of the planned economy and the success of the capitalist market economy, and that it represented the end of history. In October 2018, however, he changed his mind.
How to Save the Supreme Court From Itself
In this episode of The David Frum Show, The Atlantic’s David Frum opens with his thoughts on growing extremism in the Democratic Party. Frum compares this to the paranoia and conspiratorial thinking that cost the Republican Party dearly in the 2010s and cautions the Democrats against making the same mistakes. Then David is joined by Kate Shaw, a co-host of the podcast Strict Scrutiny and a professor of law at University of Pennsylvania Carey...
Ask HN: What are tools you have made for yourself since the advent of AI?
I've made a number of ceramic molds for slumping fused glass into bowls. As well as wooden templates for ceramic mugs. I've devised a few carrying tools to move glass frit paintings from my studio down to my barn where the kilns sit without spilling the glass.