Multiple-Choice

Related Articles from SNS

Bridging the Knowledge-Prediction Gap in LLMs on Multiple-Choice Questions

arXiv:2509.23782v4 Announce Type: replace Abstract: While large language models (LLMs) perform strongly on diverse tasks, their trustworthiness is limited by erratic behavior that is unfaithful to their internal knowledge. In particular, LLMs often fail on multiple-choice questions (MCQs) even if they encode correct answers in their hidden representations, revealing a misalignment between internal knowledge and output behavior. We investigate and mitigate this knowledge-prediction gap on...

arXiv CS 8d ago

Structure-Aware Modeling of Multiple-Choice Questions Improves Automatic Difficulty Estimation

arXiv:2606.08988v1 Announce Type: new Abstract: Automatic Question Difficulty Estimation (AQDE) holds growing promise for educational assessment because it has the potential to yield difficulty estimates that are competitive with expert judgment, while helping reduce the time and financial burden associated with pilot administrations and scaling to digital testing contexts. Prior AQDE studies report mixed evidence on whether adding distractors as additional text to the question stem and the...

arXiv CS 1d ago

Discovering Misconceptions and Misunderstandings From Administrations of Research-Designed Multiple Choice Instruments

arXiv:2606.08986v1 Announce Type: new Abstract: Misconceptions are "alternate hypotheses" that are incorrect according to established theories of how the world works. Often held with confidence by students, they are relatively context-insensitive, can seem like common-sense views, and are noted for being resistant to remediation using traditional instruction. To find misconceptions in Newtonian mechanics, we analyze ~34,000 administrations of the pioneering Force Concept Inventory using a...

arXiv Physics 1d ago

Inclusion-of-Thoughts: Mitigating Preference Instability via Purifying the Decision Space

arXiv:2604.04944v2 Announce Type: replace Abstract: Multiple-choice questions (MCQs) are widely used to evaluate large language models (LLMs). However, LLMs remain vulnerable to the presence of plausible distractors. This often diverts attention toward irrelevant choices, resulting in unstable oscillation between correct and incorrect answers.

arXiv CS 6d ago

EuraGovExam: A Multilingual Multimodal Benchmark from Real-World Civil Service Exams

arXiv:2603.27223v2 Announce Type: replace Abstract: We present EuraGovExam, a multilingual and multimodal benchmark sourced from real-world civil service examinations across five representative Eurasian regions: South Korea, Japan, Taiwan, India, and the European Union. Designed to reflect the authentic complexity of public-sector assessments, the dataset contains over 8,000 high-resolution scanned multiple-choice questions covering 17 diverse academic and administrative domains. Unlike...

arXiv CS 8d ago

DeliChess: A Multi-party Dialogue Dataset for Deliberation in Chess Puzzle Solving

Announce Type: new Abstract: Multi-party dialogue is a critical setting for studying collaborative reasoning and decision-making, yet existing datasets rarely focus on structured, in-depth complex reasoning tasks. We introduce DeliChess, a novel dataset of group deliberation dialogues in which participants collaboratively solve multiple-choice chess puzzles. Each group first completes the puzzle individually, then engages in a multi-party discussion before submitting a revised collective answer.

arXiv CS 6d ago

Question Type, Cognitive Load, and CEFR Alignment: Evaluating LLM-Generated EFL Grammar Drill Exercises

Announce Type: new Abstract: This study evaluates the pedagogical viability of LLM-generated English as a Foreign Language (EFL) learning content. Utilising log data from Japanese junior high school students practicing on a grammar drilling application, we analysed how different question modalities impact student performance and whether theoretical localised CEFR difficulty tiers accurately predict empirical task difficulty. Results reveal a clear performance hierarchy: multiple-choice...

arXiv CS 8d ago

EgoAdapt: A Multi-Scene Egocentric Adaptation Method for CVPR 2026 HD-EPIC VQA Challenge

Announce Type: replace Abstract: This technical report presents our solution, EgoAdapt (Egocentric Adaptation via Category, Calibration, and Consistency), to the CVPR 2026 HD-EPIC VQA challenge. HD-EPIC evaluates whether a vision-language model can reason over realistic first-person kitchen videos, where the evidence for an answer may be a short hand-object interaction, a long recipe trajectory, a spatial relation to a fixture, or a subtle gaze cue. The benchmark contains 26K multiple-choice...

arXiv CS 5d ago

Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook

arXiv:2604.06210v3 Announce Type: replace Abstract: As LLMs are globally deployed, aligning their cultural value orientations is critical for safety and user engagement. However, existing benchmarks face the Construct-Composition-Context ($C^3$) challenge: relying on discriminative, multiple-choice formats that probe value knowledge rather than true orientations, overlook subcultural heterogeneity, and mismatch with real-world open-ended generation. We introduce DOVE, a distributional...

arXiv CS 8d ago

Mechanistic Diagnostics of Spatial Lexical Bias in Multimodal Large Language Model Spatial Reasoning

Announce Type: new Abstract: Multimodal large language models (MLLMs) remain unreliable on spatial multiple-choice questions, and their failures are often attributed to poorly attended visual information. In this work, we identify a complementary failure mode, spatial lexical bias: adding a spatial relation word to the answer options can attract the model's decision and make the newly added option likely to be selected.

arXiv CS 8d ago