MultiArith

Related Articles from SNS

Testing LLM Arithmetic Reasoning Generalization with Automatic Numeric-Remapping Attacks

arXiv:2606.03606v1 (announce type: new)

Abstract: Large language models achieve strong performance on arithmetic reasoning benchmarks, and one common response to arithmetic brittleness is to delegate computation to code. Yet models are still often used in settings where they must reason directly from natural language, and trustworthy models should solve small-number arithmetic word problems without external tools. Prior work shows that LLMs are sensitive to numerical variation: a model may...

arXiv CS 7d ago

Testing LLM Arithmetic Reasoning Generalization with Automatic Numeric-Remapping Attacks

arXiv:2606.03606v2 (announce type: replace)

Abstract: Large language models achieve strong performance on arithmetic reasoning benchmarks, and one common response to arithmetic brittleness is to delegate computation to code. Yet models are still often used in settings where they must reason directly from natural language, and trustworthy models should solve small-number arithmetic word problems without external tools. Prior work shows that LLMs are sensitive to numerical variation: a model may...

arXiv CS 6d ago