Dual Mechanisms of Value Expression: Intrinsic

Related Articles from SNS

Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in Large Language Models

Abstract: Large language models can express values in two main ways: (1) intrinsic expression, reflecting the model's inherent values learned during training, and (2) prompted expression, elicited by explicit prompts. Given their widespread use in value alignment, it is paramount to clearly understand their underlying mechanisms, particularly whether they mostly overlap (as one might expect) or rely on distinct mechanisms. We analyze this largely understudied problem...

arXiv CS 9d ago