Home › Knowledge Base › AI Model Extraction Attacks:

AI Model Extraction Attacks:

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

AI Model Extraction Attacks: Bypassing Single-Client Assumptions in Defenses

Announce Type: new Abstract: Ensuring the protection of Artificial Intelligence (AI) models deployed in military Command and Control (C2) systems and critical infrastructure is essential for maintaining information superiority. Model Extraction Attacks (MEAs) pose a significant threat, as they enable adversaries to replicate proprietary models, compromise protected information, and prepare offline adversarial attacks. However, current defense strategies predominantly rely on the Single...

arXiv CS 7d ago

FlowGuard: Flow Matching for Identity-Independent Detection of Data-Free Model Stealing Attacks on Energy System Intrusion Detection Systems

arXiv:2606.03430v1 Announce Type: new Abstract: Artificial Intelligence (AI)-based Intrusion Detection Systems (IDS) deployed in energy infrastructure are vulnerable to model theft attacks, which allow adversaries to create evasive traffic offline. Current defences against model extraction rely either on identity-bound query monitoring, which is ineffective against distributed attackers (Sybil), or on prediction poisoning through soft-label perturbation, which is inapplicable to hard-label...

arXiv CS 7d ago

Can Subgraph Explanations Be Weaponized to Steal Graph Neural Networks?

arXiv:2605.30470v1 Announce Type: new Abstract: Graph Machine Learning as a Service (GMLaaS) platforms increasingly implement explainability interfaces to meet regulatory transparency requirements. However, this transparency creates exploitable vulnerabilities for model extraction attacks.

arXiv CS 9d ago

Beyond Pass/Fail: Using Process Mining to Understand How LLMs Resist (and Fail) Red Team Attacks

Announce Type: new Abstract: Standard AI red teaming evaluations reduce adversarial campaigns to a single binary outcome, attack success rate (ASR), not taking into account the sequential structure of how models resist or yield to attacks. We propose applying process mining, a discipline for discovering and analyzing process models from event logs, to red teaming traces. We conduct a controlled experiment pitting 60 HarmBench prompts against two LLMs, GPT-OSS 120B and Llama 3.3 70B, using 10...

arXiv CS 1d ago

EnclaveScale: Hardware-Assisted Edge-DP for Secure Data Centre Power Telemetry

arXiv:2606.09163v1 Announce Type: new Abstract: EnclaveScale is a distributed, hardware-assisted telemetry architecture providing post-extraction attestation, enabling operators to collaboratively model high-resolution generative AI power transients. Existing cryptographic techniques scale poorly for 10-Hz streaming or fail to authenticate origins, permitting malicious hosts to spoof sensor inputs. We implement and evaluate a post-extraction pipeline utilizing DCAP attestation, differential...

arXiv CS 1d ago

SV-Detect: AI-generated Text Detection with Steering Vectors

new Abstract: Detecting machine-generated text is especially difficult under distribution shift, such as transfer across domains, source models, and editing attacks. We propose a fake-text detector based on steering vectors extracted from the hidden representations of a frozen language model. At each layer, we construct a direction that separates human-written from machine-generated text, and represent each input by its layer-wise alignment with these directions.

arXiv CS 2d ago

Agentic Relationship Harm: Benchmarking and Gating Relational Manipulation in AI Agents

Announce Type: new Abstract: AI agents built on large language models can assist not only legitimate tasks but also relational manipulation. AI agents can be used to help a user maintain a deceptive identity, intensify emotional dependency, isolate a target, or prepare for later extraction. We conceptualise this risk as agentic relationship harm: workflow-level assistance that can exploit recipient vulnerability, persuasive influence, and relational power asymmetry.

arXiv CS 7d ago

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

arXiv:2604.01039v2 Announce Type: replace Abstract: System Instructions in Large Language Models (LLMs) are commonly used to enforce safety policies, define agent behavior, and protect sensitive operational context in agentic AI applications. These instructions may contain sensitive information such as API credentials, internal policies, and privileged workflow definitions, making system instruction leakage a critical security risk highlighted in the OWASP Top 10 for LLM Applications....

arXiv CS 1d ago

Claude Fable 5

Claude Fable 5 and Claude Mythos 5 Today we’re launching Claude Fable 5: a Mythos-class1 model that we’ve made safe for general use. Fable 5’s capabilities exceed those of any model we’ve ever made generally available.

Hacker News 1d ago

Human-Like Neural Nets by Catapulting

Human-like Neural Nets by Catapulting Speculative proposal to create artificial neural nets with human-like performance by high-learning-rate/regularization training of overparameterized NNs to trigger catapulting/grokking. Over-parameterization as a route to true generalization would resolve many outstanding mysteries of artificial versus natural intelligence. There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are...

Hacker News 3d ago