Home Knowledge Base Black-Box Adversarial Attacks

Black-Box Adversarial Attacks

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Adversarial Agents: Black-Box Evasion Attacks with Reinforcement Learning

arXiv:2503.01734v3 Announce Type: replace Abstract: Attacks on machine learning models have been extensively studied through stateless optimization. In this paper, we demonstrate how a reinforcement learning (RL) agent can learn a new class of attack algorithms that generate adversarial samples. Unlike traditional adversarial machine learning (AML) methods that craft adversarial samples independently, our RL-based approach retains and exploits past attack experience to improve the...

arXiv CS 5d ago

Black-box, Adaptive, Efficient, Transferable, Harmful, Applicable... Attacks Are All You Need to Break LLMs

arXiv:2606.03647v1 Announce Type: new Abstract: Accurately evaluating adversarial robustness is a longstanding challenge. A flawed attack design can inflate robustness estimates, making deployment risk assessment and defense comparison unreliable.

arXiv CS 7d ago

Escaping the Linearity Trap: Manifold Detours for Black-Box Adversarial Attacks on Singing Audio Deepfake Detection

arXiv:2605.30366v1 Announce Type: new Abstract: Recent Singing Voice Synthesis (SVS) advances enable highly realistic but potentially malicious AI covers, making singing voice deepfake detection (SVDD) crucial. Self-Supervised Learning (SSL)-based detectors achieve state-of-the-art performance by fine-tuning speech SSL backbones to capture singing-specific spoof artifacts.

arXiv CS 9d ago

Audio Pirates: Black-box Audio Watermark Removal via Diffusion Priors

Announce Type: new Abstract: With the rise of AI-generated audio, watermarking has become widely used for detecting misuse and protecting intellectual property. However, adversaries may try to remove these watermarks, making it critical to evaluate how well watermarking schemes withstand removal attacks. Existing attacks are often impractical: they either noticeably degrade perceptual quality or require access to the watermarking scheme.

arXiv CS 9d ago

What You Approve Is What Executes: Consent Integrity for Black-Box LLM Agents

arXiv:2606.02668v1 Announce Type: new Abstract: Coding agents gate consequential actions behind a human-in-the-loop approval dialog, but the dialog is narrated by the agent itself: the human approves a summary the agent writes. The Lies-in-the-Loop (LITL) attack shows that summary is forgeable, so a compromised agent can show a benign description while a different action runs.

arXiv CS 7d ago

Latent Geometric Chords for Query-Efficient Decision-Based Adversarial Attacks

arXiv:2605.31219v1 Announce Type: new Abstract: While decision-based black-box adversarial attacks present a severe security threat, current methodologies suffer from fundamental limitations. Pixel-wise attacks frequently introduce unnatural, high-frequency visual artifacts, while latent-space frameworks are confined by the limited search space of low-dimensional manifolds and inherent reconstruction flaws. To resolve these limitations, we propose Latent Geometric Chords (LGC) for...

arXiv CS 9d ago

Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition

arXiv:2606.05678v1 Announce Type: new Abstract: Automatic speech recognition (ASR) systems have become widely used for multilingual speech-to-text transcription. Their robustness to adversarial attacks has become an important topic for the community. Existing adversarial attacks directly add adversarial noise to the speech audio.

arXiv CS 5d ago

Fair Finetuning Mitigates Distribution Inference Attacks

arXiv:2606.01719v1 Announce Type: new Abstract: Machine learning models trained on sensitive data can inadvertently leak population-level information about their training distributions -- a threat known as distribution inference attack (DIA). An adversary with black-box access can infer sensitive demographic properties, such as subgroup proportions, without observing any training data directly. While defenses such as differential privacy and property unlearning have been proposed, the link...

arXiv CS 8d ago

What the Eyes See, the LLMs Miss: Exploiting Human Perception for Adversarial Text Attacks

arXiv:2606.09700v1 Announce Type: new Abstract: Large language model (LLM)-powered content moderation systems have become a critical defense against harmful online content. However, these systems primarily operate on tokenized text and largely ignore the visual cues that humans naturally rely on when interpreting content.

arXiv CS 1d ago

CHASE: Adversarial Red-Blue Teaming for Improving LLM Safety using Reinforcement Learning

arXiv:2606.05523v1 Announce Type: new Abstract: Despite advances in safety alignment, prompt-rewriting attacks such as persona modulation, fictional framing and persuasion-based reformulation, can bypass safety filters even on frontier models. Existing defenses either rely on non-scalable human curation or white-box optimisation that overfits to specific model internals, leaving aligned models brittle against the very class of adaptive black-box adversaries they will face in deployment. To...

arXiv CS 5d ago