Home Knowledge Base the Greedy Coordinate Gradient

the Greedy Coordinate Gradient

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Cross-Entropy Games and Frost Training

arXiv:2605.27701v2 Announce Type: replace Abstract: We present Frost Training, a method for improving Monte Carlo-based policy optimization for a large family of LLM-as-a-judge tasks called Cross-Entropy Games. The key idea is to exploit the gradient of the reward function in embedding space. This signal is used in the Greedy Coordinate Gradient (GCG) jailbreaking technique; we demonstrate for the first time that it can also be used to boost model training.

arXiv CS 8d ago

SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks

arXiv:2606.05609v1 Announce Type: new Abstract: As large language models (LLMs) are widely deployed, identifying their vulnerability through jailbreak attacks becomes increasingly critical. Optimization-based attacks like Greedy Coordinate Gradient (GCG) have focused on inserting adversarial tokens to the end of prompts. However, GCG restricts adversarial tokens to a fixed insertion point (typically the prompt suffix), leaving the effect of inserting tokens at other positions unexplored.

arXiv CS 5d ago