
Dynamic Counterfactual Sensitivity

Related Articles from SNS

DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity

arXiv:2606.09043v1 Announce Type: new Abstract: Reward models trained from pairwise preferences often exploit superficial shortcut cues rather than learning true response quality. We propose DynaCF, a dynamic reweighting framework for mitigating shortcut learning in reward model training. Unlike static shortcut heuristics, DynaCF measures shortcut sensitivity online during optimization by applying semantics-preserving counterfactual perturbations and tracking the resulting margin shifts and...

arXiv CS 1d ago
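The abstract above is truncated and only outlines the mechanism, so what follows is a minimal Python sketch of the general idea under stated assumptions, not DynaCF's implementation. It measures a pair's shortcut sensitivity as the absolute shift in the pairwise reward margin after a semantics-preserving perturbation, then reweights pairs in a Bradley-Terry loss. The perturbation itself, the choice to perturb only the chosen response, and the exponential down-weighting rule in the hypothetical `pair_weights` helper are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical types: a preference pair is (prompt, chosen, rejected).
Pair = Tuple[str, str, str]
RewardFn = Callable[[str, str], float]  # (prompt, response) -> scalar reward
PerturbFn = Callable[[str], str]        # assumed semantics-preserving rewrite


def margin(reward: RewardFn, prompt: str, chosen: str, rejected: str) -> float:
    """Pairwise margin r(x, y_chosen) - r(x, y_rejected)."""
    return reward(prompt, chosen) - reward(prompt, rejected)


def shortcut_sensitivity(reward: RewardFn, perturb: PerturbFn, pair: Pair) -> float:
    """Absolute margin shift when only the chosen response is perturbed (assumption)."""
    prompt, chosen, rejected = pair
    original = margin(reward, prompt, chosen, rejected)
    counterfactual = margin(reward, prompt, perturb(chosen), rejected)
    return abs(original - counterfactual)


def pair_weights(sensitivities: List[float], beta: float = 1.0) -> List[float]:
    """Hypothetical rule: exp(-beta * s) down-weights sensitive pairs; weights average to 1."""
    raw = [math.exp(-beta * s) for s in sensitivities]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_bt_loss(margins: List[float], weights: List[float]) -> float:
    """Weighted Bradley-Terry loss: mean of w * -log(sigmoid(m)) = w * softplus(-m)."""
    return sum(w * softplus(-m) for m, w in zip(margins, weights)) / len(margins)


if __name__ == "__main__":
    # Toy reward with a capped length bonus (a shortcut cue) plus a content check.
    def length_biased_reward(prompt: str, response: str) -> float:
        return 0.1 * min(len(response.split()), 8) + (1.0 if "Paris" in response else 0.0)

    # Hypothetical perturbation: pad with filler that adds no information.
    def add_filler(response: str) -> str:
        return response + " In other words, as stated above."

    batch = [
        ("Capital of France?", "Paris.", "It is probably Lyon, I would guess."),
        ("Capital of France?", "The capital is Paris.", "Lyon."),
    ]
    sens = [shortcut_sensitivity(length_biased_reward, add_filler, p) for p in batch]
    weights = pair_weights(sens, beta=2.0)
    margins = [margin(length_biased_reward, *p) for p in batch]
    print("sensitivities:", sens)
    print("weights:", weights)
    print("weighted loss:", weighted_bt_loss(margins, weights))
```

In actual training the margins would come from the reward model being optimized and the sensitivities and weights would be recomputed at each step, which is what "online during optimization" suggests; the sketch uses plain floats to keep the bookkeeping visible. Whether sensitive pairs should be down-weighted, up-weighted, or handled some other way is exactly the part the truncated abstract does not say.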

Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions

arXiv:2606.06443v1 Announce Type: new Abstract: Large language models are increasingly used to simulate social media users and infer how individuals may respond to online discussions. However, it remains unclear whether these simulations reflect precise user-specific beliefs or whether they are highly sensitive to semantically independent changes in conversational contexts. In this work, we study counterfactual context revision as a framework for auditing LLM-based stance simulation.

arXiv CS 5d ago
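The abstract frames counterfactual context revision as an audit: edit a discussion's context in ways that should be semantically independent of a user's beliefs, then check whether the simulated stance moves. A minimal Python sketch of that idea follows. It assumes a stance simulator exposed as a function of (user profile, context) and uses a simple flip rate as the instability measure; the simulator interface, the revision functions, and the flip-rate metric are illustrative assumptions, not the paper's protocol.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (not from the paper):
SimulateFn = Callable[[str, str], str]  # (user_profile, context) -> stance label
ReviseFn = Callable[[str], str]         # semantically independent context edit


def stance_flip_rate(simulate: SimulateFn, revisions: Sequence[ReviseFn],
                     profile: str, context: str) -> float:
    """Fraction of revised contexts whose simulated stance differs from the original."""
    if not revisions:
        raise ValueError("need at least one revision")
    base = simulate(profile, context)
    flips = sum(simulate(profile, revise(context)) != base for revise in revisions)
    return flips / len(revisions)


def audit(simulate: SimulateFn, revisions: Sequence[ReviseFn],
          cases: List[Tuple[str, str]]) -> float:
    """Mean flip rate over (profile, context) cases; 0.0 means fully stable."""
    rates = [stance_flip_rate(simulate, revisions, p, c) for p, c in cases]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # A brittle toy simulator that keys on an irrelevant detail (the weather).
    def brittle(profile: str, context: str) -> str:
        return "favor" if "sunny" in context else "against"

    # A toy simulator that depends only on the user's profile.
    def profile_driven(profile: str, context: str) -> str:
        return "favor" if "supports" in profile else "against"

    revisions = [
        lambda c: c.replace("sunny", "rainy"),    # change an irrelevant detail
        lambda c: c + " (Posted on a Tuesday.)",  # append neutral metadata
    ]
    cases = [("User supports bike lanes.", "On a sunny day, the council debated bike lanes.")]
    print("brittle:", audit(brittle, revisions, cases))                # brittle: 0.5
    print("profile-driven:", audit(profile_driven, revisions, cases))  # profile-driven: 0.0
```

A flip rate near zero is consistent with the simulator tracking user-specific beliefs, while a high rate under edits that should not matter points to the context sensitivity the abstract raises. A real audit would use an LLM as `simulate` and carefully validated revisions rather than string substitutions.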