Patcher
Related Articles from SNS
Patcher: Post-Hoc Patching of Backdoored Large Language Models
Large language models remain vulnerable to jailbreak backdoor attacks, where adversaries poison safety alignment data to embed hidden triggers that bypass safety mechanisms. Existing defenses often require comprehensive attack information or multiple triggered examples, making them impractical when defenders only observe a single reported failure case without knowing whether it stems from a backdoor attack or a natural alignment bug. This paper presents Patcher,...
Defending Against Malicious Finetuning by Scaling Train-time Adversarial Attacks
Current open-weight large language models (LLMs) are prone to malicious finetuning attacks, which can compromise the safety alignment of LLMs with only a few steps of supervised finetuning (SFT) on poisoned datasets. Existing alignment-stage defenses are primarily designed to defend against attacks that use parameter-efficient finetuning methods. However, they fail to defend against stronger attacks that use full-parameter finetuning.
macOS 27 requires Apple Silicon, as Apple draws down the Intel Mac era
As Apple announced last year, this year's macOS release will end support for Intel Macs. The macOS 27 Golden Gate release will require a Mac with an Apple Silicon chip inside, including the original M1 that launched in the MacBook Air, MacBook Pro, and Mac mini back in late 2020. Intel Macs running macOS 26 Tahoe can expect security and Safari patches for about two more years after the release of macOS 27 Golden Gate.