Home › Knowledge Base › Adversarial ML

Adversarial ML

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Position: Adversarial ML for LLMs Is Not Making Any Progress

Announce Type: replace Abstract: In the past decade, considerable research effort has been devoted to securing machine learning (ML) models that operate in adversarial settings. Yet, progress has been slow even for simple "toy" problems (e.g., robustness to small adversarial perturbations) and is often hindered by non-rigorous evaluations. Today, adversarial ML research has shifted towards studying larger, general-purpose language models.

arXiv CS 7d ago

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

arXiv:2603.24511v2 Announce Type: replace Abstract: We show that AI agents are capable of discovering novel algorithms for adversarial attacks against LLMs, advancing the state of the art on white-box jailbreaking and prompt injection evaluations. We deploy frontier agents, such as Claude Code and Codex, in an autoresearch loop with access to a library of 30+ prior methods and an evaluation script with a fixed compute budget. We show this pipeline to be effective in jailbreaking OpenAI's...

arXiv CS 8d ago

SECUREVENT: Hybrid AI/ML Security Monitoring for Distributed Event-Based Systems

Announce Type: new Abstract: Distributed event-based systems have become a common substrate for Internet-scale publish/subscribe services, IoT telemetry, cloud-native microservices, and security operations pipelines. Their loose coupling and asynchronous delivery improve scalability, but they also expand the attack surface: publishers, brokers, subscribers, topics, schemas, and temporal ordering can each be abused without a single component observing the whole behavior. This paper proposes...

arXiv CS 8d ago

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

arXiv:2605.10807v4 Announce Type: replace Abstract: The integration of Large Language Models (LLMs) into Electronic Design Automation (EDA) and hardware security is rapidly reshaping the semiconductor industry. While LLMs offer unprecedented capabilities in generating Register Transfer Level (RTL) code, automating testbenches, and bridging the semantic gap between high-level specifications and silicon, they simultaneously introduce severe vulnerabilities. This comprehensive review provides...

arXiv CS 5d ago

A Lecture Note on Offline RL and IRL, Part II: Foundations of Inverse Reinforcement Learning and Dynamic Discrete Choice Models

arXiv:2605.30843v1 Announce Type: new Abstract: In the forward reinforcement-learning problem, the reward is fixed and known; the learner is asked to find a good policy or value function. Here we turn the question around. Given offline data generated by an expert, can we recover the reward the expert was optimizing?

arXiv CS 9d ago

TAO: Tolerance-Aware Optimistic Verification for Floating-Point Neural Networks

arXiv:2510.16028v4 Announce Type: replace Abstract: Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings).

arXiv CS 1d ago

Adversarial Agents: Black-Box Evasion Attacks with Reinforcement Learning

arXiv:2503.01734v3 Announce Type: replace Abstract: Attacks on machine learning models have been extensively studied through stateless optimization. In this paper, we demonstrate how a reinforcement learning (RL) agent can learn a new class of attack algorithms that generate adversarial samples. Unlike traditional adversarial machine learning (AML) methods that craft adversarial samples independently, our RL-based approach retains and exploits past attack experience to improve the...

arXiv CS 5d ago

Multi-Agent Teams Hold Experts Back

arXiv:2602.01011v4 Announce Type: replace Abstract: Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams...

arXiv CS 9d ago

Diffuse AI Control on Fuzzy Tasks

Announce Type: new Abstract: AI models deployed in critical domains, such as AI safety research, may subtly sabotage our efforts due to misalignment. Diffuse AI Control is a subfield of AI safety concerned with mitigating risks from AI sabotage distributed over long deployment horizons (diffuse threats). These risks are particularly pernicious on fuzzy tasks, i.e. tasks which are hard to grade or require intuition.

arXiv CS 1d ago

Cryptographic Backdoor for Neural Networks: Boon and Bane

arXiv:2509.20714v2 Announce Type: replace Abstract: In this paper we show that cryptographic backdoors in a neural network (NN) can be highly effective in two directions, namely mounting the attacks as well as in presenting the defenses as well. On the attack side, a carefully planted cryptographic backdoor enables powerful and invisible attack on the NN. Considering the defense, we present applications:

arXiv CS 1d ago