Home › Knowledge Base › Modeling and Defense

Modeling and Defense

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Random Erasing vs. Model Inversion: A Promising Defense or a False Hope?

arXiv:2409.01062v4 Announce Type: replace Abstract: Model Inversion (MI) attacks pose a significant privacy threat by reconstructing private training data from machine learning models. While existing defenses primarily concentrate on model-centric approaches, the impact of data on MI robustness remains largely unexplored. In this work, we explore Random Erasing (RE), a technique traditionally used for improving model generalization under occlusion, and uncover its surprising effectiveness as...

arXiv CS 8d ago

Collective Hallucination in Multi-Agent LLMs:Modeling and Defense

arXiv:2606.07941v1 Announce Type: new Abstract: Hallucinations in large language models (LLMs) create heightened risks in multi-agent settings, where recursive agent interactions can propagate, reinforce, and amplify unsupported claims. This paper models hallucination as a system-level, time-evolving process across a network of interacting LLM agents, where nodes represent agents and edges encode information exchange. The proposed formulation captures how hallucinated claims diffuse through...

arXiv CS 1d ago

EvoDefense: Co-Evolving Black-Box Defense with Large Language Models

arXiv:2605.31140v1 Announce Type: new Abstract: Large Language Models (LLMs) remain highly vulnerable to diverse attacks, particularly in black-box settings where the internals of target models are inaccessible. Existing black-box defenses typically rely on pre-defined filtering heuristics, which often fail to generalize to unseen attack types and target model architectures. We introduce EvoDefense, an experience-guided co-evolving black-box defense paradigm.

arXiv CS 9d ago

Beyond False Stability: High-Noise Drift Gating for Test-Time Adversarial Defenses in Vision-Language Models

Announce Type: replace Abstract: Vision-language models (VLMs) such as CLIP show strong zero-shot generalization but remain highly vulnerable to adversarial attacks. Adversarial training improves robustness but is computationally expensive, motivating test-time defenses. Recent approaches exploit how CLIP's visual representations respond to stochastic perturbations: aggregating predictions across noisy views, constructing Gaussian noise-averaged anchors and interpolating features toward...

arXiv CS 5d ago

Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models

Announce Type: new Abstract: Vision-Language Models (VLMs), such as CLIP, have shown strong zero-shot generalization but remain highly vulnerable to adversarial perturbations, posing serious risks in real-world applications. Test-time defenses for VLMs have recently emerged as a promising and efficient approach to defend against adversarial attacks without requiring costly large-scale retraining. In this work, we uncover a surprising phenomenon: under diverse input transformations,...

arXiv CS 5d ago

Beyond False Stability: High-Noise Drift Gating for Test-Time Adversarial Defenses in Vision-Language Models

arXiv:2606.03730v1 Announce Type: new Abstract: Vision-language models (VLMs) such as CLIP show strong zero-shot generalization but remain highly vulnerable to adversarial attacks. Adversarial training improves robustness but is computationally expensive, motivating test-time defenses.

arXiv CS 7d ago

In-Training Defenses against Emergent Misalignment in Language Models

Announce Type: replace Abstract: Fine-tuning lets practitioners repurpose aligned large language models (LLMs) for new domains, yet recent work reveals emergent misalignment (EM): Even a small, domain-specific fine-tune can induce harmful behaviors far outside the target domain. Even in the case where model weights are hidden behind a fine-tuning API, this gives attackers inadvertent access to a broadly misaligned model in a way that can be hard to detect from the fine-tuning data alone. We...

arXiv CS 5d ago

Courtney Clenney's defense inspects knives at center of OnlyFans model's murder case ahead of trial: report

If you or someone you know is the victim of domestic abuse, please call the National Domestic Violence Hotline at 800-799-7233.Courtney Clenney’s defense team has inspected the knives at the center of her murder case as the former OnlyFans model awaits trial in the fatal stabbing of her boyfriend, Christian Obumseli. During a pretrial hearing on Monday, Clenney’s attorneys confirmed they had been able to examine the knives involved in the case, according to Court TV. Her attorneys also told...

Fox News 7d ago

THRD: A Training-Free Multi-Turn Defense Framework for Jailbreak Attacks on Large Language Models

new Abstract: Multi-turn jailbreak attacks pose a growing threat to LLMs by exploiting conversational dynamics such as gradual escalation and cross-turn coordination. Existing defenses either rely on costly retraining -- often degrading model utility -- or apply single-turn analysis independently at each turn, failing to capture how risk accumulates along interaction trajectories. We observe that safety behavior in multi-turn interaction is trajectory-dependent: dialogue history continuously...

arXiv CS 8d ago

AI Model Extraction Attacks: Bypassing Single-Client Assumptions in Defenses

Announce Type: new Abstract: Ensuring the protection of Artificial Intelligence (AI) models deployed in military Command and Control (C2) systems and critical infrastructure is essential for maintaining information superiority. Model Extraction Attacks (MEAs) pose a significant threat, as they enable adversaries to replicate proprietary models, compromise protected information, and prepare offline adversarial attacks. However, current defense strategies predominantly rely on the Single...

arXiv CS 7d ago