Home Knowledge Base PsychoSafe

PsychoSafe

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models

new Abstract: Large language models (LLMs) routinely face requests that should be refused, creating a trade-off between helpfulness and harm prevention. However, refusals themselves can be helpful. In high-risk interactions involving crisis, coercion, or escalating intent, blunt non-compliance may prevent direct harm while still failing to support the needs of the person behind the request.

arXiv CS 1d ago