Home › Knowledge Base › the Threat Model

the Threat Model

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

RAG Security and Privacy: Formalizing the Threat Model and Attack Surface

arXiv:2509.20324v2 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) is an emerging approach in natural language processing that combines large language models (LLMs) with external document retrieval to produce more accurate and grounded responses. While RAG has shown strong potential in reducing hallucinations and improving factual consistency, it also introduces new privacy and security challenges that differ from those faced by traditional LLMs. Existing research has...

arXiv CS 5d ago

Validity Threats for Foundation Model Research

arXiv:2606.05029v1 Announce Type: new Abstract: Controlled experiments are the backbone of machine learning research, but at the scale of modern foundation models, they have become prohibitively expensive. Instead, the community increasingly relies on research strategies that approximate the ideal experiment at a fraction of the cost: proxy experiments and scaling laws, observational studies with publicly available models, and single-run designs that leverage variation within individual...

arXiv CS 6d ago

Is Cornwall’s newest high street an inspiring model, or a threat to nearby Newquay?

The Guardian UK 18d ago

Position: the Stochastic Parrot in the Coal Mine. Model Collapse is a Threat to Low-Resource Communities

arXiv:2605.04127v2 Announce Type: replace Abstract: Model collapse, the degradation in performance that arises when generative models are trained on the outputs of prior models, is an increasing concern as artificially generated content proliferates. Related critiques of large language models have highlighted their tendency to reproduce frequent patterns in training data, their reliance on vast datasets, and their substantial environmental cost. Together, these factors contribute to data...

arXiv CS 8d ago

An Embarrassingly Simple Detector for Model Extraction Attacks in Large Language Model API Traffic

Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed through hosted APIs, making model extraction a practical threat to model ownership and service security. However, individual extraction queries often resemble benign requests, and existing evaluations often focus on single-query anomaly scoring or pure benign-versus-attacker user settings. We formulate model extraction monitoring as benign-calibrated traffic-window distribution testing and show that an...

arXiv CS 5d ago

AI Model Extraction Attacks: Bypassing Single-Client Assumptions in Defenses

Announce Type: new Abstract: Ensuring the protection of Artificial Intelligence (AI) models deployed in military Command and Control (C2) systems and critical infrastructure is essential for maintaining information superiority. Model Extraction Attacks (MEAs) pose a significant threat, as they enable adversaries to replicate proprietary models, compromise protected information, and prepare offline adversarial attacks. However, current defense strategies predominantly rely on the Single...

arXiv CS 7d ago

Anthropic's open-source framework for AI-powered vulnerability discovery

A reference implementation for autonomous vulnerability discovery and remediation with Claude, based on our learnings from partnering with security teams at several organizations since launching Claude Mythos Preview. For a write up of these learnings along with best practices, see the accompanying blog post (also available in blog-post.md ). For a lightweight SDK-only walkthrough of the same recon → find → triage → report → patch loop, see the companion cookbook.

Hacker News 5d ago

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

arXiv:2605.15118v2 Announce Type: replace Abstract: We introduce a reusable framework for auditing whether LLM attack benchmarks collectively cover the threat surface: a 4$\times$6 Target $\times$ Technique matrix grounded in STRIDE, constructed from a 507-leaf taxonomy -- 401 data-populated and 106 threat-model-derived leaves -- of inference-time attacks extracted from 932 arXiv security studies (2023--2026). The matrix enables benchmark-external validation -- auditing collective coverage...

arXiv CS 6d ago

Random Erasing vs. Model Inversion: A Promising Defense or a False Hope?

arXiv:2409.01062v4 Announce Type: replace Abstract: Model Inversion (MI) attacks pose a significant privacy threat by reconstructing private training data from machine learning models. While existing defenses primarily concentrate on model-centric approaches, the impact of data on MI robustness remains largely unexplored. In this work, we explore Random Erasing (RE), a technique traditionally used for improving model generalization under occlusion, and uncover its surprising effectiveness as...

arXiv CS 8d ago

D-Judge: Disrupting Multi-Turn Jailbreaks using Semantics-Preserving Output Rewriting

Announce Type: new Abstract: Multi-turn jailbreak attacks pose a growing threat to large language model (LLM) safety because they exploit feedback from auxiliary judge models to iteratively refine prompts toward harmful goals. Existing defenses largely detect or block unsafe content at individual turns or at the final response, leaving the judge-driven refinement loop intact and allowing attackers to extract informative feedback from intermediate interactions.

arXiv CS 7d ago