Dynamic Counterfactual Sensitivity

Related Articles from SNS

DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity

arXiv:2606.09043v1 Announce Type: new Abstract: Reward models trained from pairwise preferences often exploit superficial shortcut cues rather than learning true response quality. We propose DynaCF, a dynamic reweighting framework for mitigating shortcut learning in reward model training. Unlike static shortcut heuristics, DynaCF measures shortcut sensitivity online during optimization by applying semantics-preserving counterfactual perturbations and tracking the resulting margin shifts and...

arXiv CS 1d ago
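
The DynaCF abstract above describes measuring shortcut sensitivity by applying a semantics-preserving counterfactual perturbation and tracking the resulting shift in the preference margin. The following is a minimal sketch of that one idea, not the paper's implementation. Every name here (`length_biased_reward`, `content_only_reward`, `normalize_whitespace`, `margin_shift`, `illustrative_weight`) is hypothetical, the reward functions are toy stand-ins for a trained reward model, and the `exp(-shift)` weighting is purely an assumption, since the truncated abstract does not state the reweighting rule.

```python
import math

def length_biased_reward(prompt: str, response: str) -> float:
    # Toy stand-in for a reward model that has learned a length shortcut.
    quality = 1.0 if "correct" in response else 0.0
    return quality + 0.01 * len(response)

def content_only_reward(prompt: str, response: str) -> float:
    # Toy stand-in for a reward model that ignores length entirely.
    return 1.0 if "correct" in response else 0.0

def normalize_whitespace(text: str) -> str:
    # Assumed semantics-preserving perturbation: collapse runs of
    # whitespace and trim the ends. The wording is left unchanged.
    return " ".join(text.split())

def margin(reward_fn, prompt, chosen, rejected) -> float:
    # Pairwise preference margin: r(chosen) - r(rejected).
    return reward_fn(prompt, chosen) - reward_fn(prompt, rejected)

def margin_shift(reward_fn, prompt, chosen, rejected, perturb) -> float:
    # Absolute change in margin after perturbing both responses.
    base = margin(reward_fn, prompt, chosen, rejected)
    counterfactual = margin(reward_fn, prompt, perturb(chosen), perturb(rejected))
    return abs(base - counterfactual)

def illustrative_weight(shift: float) -> float:
    # Assumption, not from the abstract: map a larger shift to a
    # smaller per-example weight in (0, 1].
    return math.exp(-shift)

prompt = "Is the result right?"
chosen = "The result is correct."
rejected = "The result is wrong." + " " * 30  # padded with trailing spaces

biased_shift = margin_shift(length_biased_reward, prompt, chosen, rejected,
                            normalize_whitespace)
clean_shift = margin_shift(content_only_reward, prompt, chosen, rejected,
                           normalize_whitespace)
print(biased_shift, clean_shift, illustrative_weight(biased_shift))
```

In this toy setup the length-biased scorer changes its margin when the padding is removed, while the content-only scorer does not, which is the kind of signal the abstract calls a margin shift. How DynaCF turns that signal into training weights is not recoverable from the truncated abstract.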

Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions

arXiv:2606.06443v1 Announce Type: new Abstract: Large language models are increasingly used to simulate social media users and infer how individuals may respond to online discussions. However, it remains unclear whether these simulations reflect precise user-specific beliefs or whether they are highly sensitive to semantically independent changes in conversational contexts. In this work, we study counterfactual context revision as a framework for auditing LLM-based stance simulation.

arXiv CS 5d ago

Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions

arXiv:2606.06443v2 Announce Type: replace Abstract: Large language models are increasingly used to simulate social media users and infer how individuals may respond to online discussions. However, it remains unclear whether these simulations reflect precise user-specific beliefs or whether they are highly sensitive to semantically independent changes in conversational contexts. In this work, we study counterfactual context revision as a framework for auditing LLM-based stance simulation.

arXiv CS 1d ago