Red teamers turned Claude Desktop into a double agent to do their evil bidding

EXCLUSIVE Pentera Labs’ red teamers compromised a developer’s AI agent via his Claude Desktop app and ultimately turned that access into full remote code execution on the dev’s machine – demonstrating how an attacker could turn a trusted, chatty AI assistant into a double agent operating on their behalf.

“Claude’s got a new voice,” Pentera's offensive security services team leader Dvir Avraham told The Register.

“We acknowledge the huge trust in AI models – everybody uses them,” he said in a phone interview. “We used this trust to manipulate the victim, like under the hood, the victim didn't see it coming.”

It also prompted Avraham to check his own platforms. “I became a little bit paranoid,” he told us. “I'm not allowing any command to run without me examining it twice.”

In a report set to publish Wednesday, and shared in advance exclusively with The Register, Avraham and research technical lead Reef Spektor detailed the attack and what it means for organizations using agentic AI tools with local code-execution access.

It began with a red-team assignment on a third-party platform that aggregates customer email inboxes into a single management interface. Avraham and Spektor won’t name the platform, or tell us exactly how they gained access to it. They used this compromised inbox – and told us any compromised inbox would work – to get into the victim’s Claude account.

As the duo noted, breaking into an email inbox in real life – via a third-party management platform, phishing link, social engineering password reset, or even using AI agents – isn’t too difficult. “AI agents today have access to connectors and to direct MCPs into inboxes,” Spektor added.

In addition to this prerequisite (compromised inbox), the attack chain also requires the victim to have Claude Desktop installed. Anthropic’s desktop app works across macOS, Windows, and Linux systems.
It provides the same AI chat for conversations as claude.ai, and it also syncs across all devices and sessions tied to the user’s account. “We asked ourselves, can we leverage the sync behavior to infect other sessions and devices? (hint: yes!),” the red teamers wrote in the Wednesday report.

Back to the AI Stone Age

As of January, the desktop app also includes Cowork for longer agentic tasks, and Code for software development. So, for example, a user can send Claude a task from their phone and instruct it to work on their computer. As Anthropic says: “Anything you can do on your computer, Claude can do. Open apps, fill spreadsheets, navigate your browser. No setup, no passwords handed off.”

The Cowork feature now makes Pentera Labs’ attack scenario even easier. However, when the security analysts were doing this research in November 2025, “back in the Stone Age in terms of AI, you didn't have Cowork or Claude Code, so we needed a way to actually execute commands because we wanted to take over the machine,” Avraham said.

For this part, they took a keen interest in Claude Desktop’s personalization features. These are account-wide settings that tell the AI agent the user’s preferred approach and general communication instructions, along with more specific project instructions, such as guidelines for a particular workflow, or defined roles Claude should adopt within a project.

The red teamers developed a base64-encoded prompt that instructed Claude to check for command-capable tools on the developer’s machine and execute the command if available, or produce a fake error message if not, prompting the user to download a tool that will execute the attacker’s commands. Then they pasted the prompt into the victim’s personal preferences on Claude, and this prompt syncs across all of the user’s devices.
This ensures that the next time the user opens Claude Desktop and types in a chat, the poisoned instructions are loaded into their preferences and will silently run behind the scenes. The user thinks they are simply interacting with Claude as usual. They don’t see Claude checking to see what extensions and tools are installed.

If the user already has Desktop Commander or a similar MCP connector or extension installed, the poisoned instructions tell Claude to use it. This allows the attacker, via Claude, to execute a stealthy reverse shell or other malicious code. “And from there it's full compromise of the machine,” Avraham said.

Phishing - but without the email

However, if there aren’t any command-capable tools installed, then Claude becomes what the researchers describe as a “phishing layer.” (They also noted that if they had performed this research more recently, not back in November, the Claude Cowork feature would have eliminated this entire tool enumeration and phishing phase because Cowork can execute commands on a user’s behalf.)

The injected prompt instructs Claude to present a realistic-looking error as soon as the victim asks the chatbot a question. This includes a realistic error code, a link that purports to be a fix, and step-by-step instructions.

“This message tells the victim: ‘please download this,’ and we took links from the actual Anthropic site, with known emojis that the AI loves,” Avraham said.

Because the error message looks real and people usually trust their AI assistant, they will likely click on the link and execute the attacker-controlled command.

“From here, the attacker has full command execution – reverse shells, data exfiltration, credential harvesting, whatever the objective calls for,” the duo wrote. “In our case, we had Claude curl a remote server we controlled on every interaction, fetching and executing whatever bash commands we served back.
We could rotate those commands server side at will, effectively turning Claude into a persistent, stealthy C2 agent that the victim themselves kept feeding.”

In this specific case, the target was a developer who had credentials and access to several internal systems. After compromising the dev’s workstation – which gave the red teamers a foothold into the organization – they moved laterally across the company using various attack vectors that they declined to tell us about, citing customer privacy and proprietary methods.

But, Spektor added, developers make for an “excellent starting point for an attacker,” because of their access to secrets including API keys, tokens, and cloud credentials, which allows intruders to move from a single workstation into the larger organization’s cloud environment. From there, they’ve got free rein to steal source code and other sensitive data, or poison internal git repositories, and cause all sorts of pain for enterprises as we've seen play out multiple times across several recent attacks.

Feature, not a bug

The team reported their findings to Anthropic back in November, and the AI company essentially said it’s Claude Desktop working as intended – a feature, not a bug.

“After reviewing your submission, we've determined this doesn't represent a security vulnerability that falls within our program scope,” Anthropic said. “Our current threat model treats personal preferences, skills, and MCP connectors as features that can execute code through Claude Desktop by design. While we recognize these features can be leveraged to execute arbitrary code when manipulated, this represents expected functionality rather than a security vulnerability in our infrastructure.”

The Register reached out to Anthropic for comment and did not receive any response.

The red teamers, however, have some suggestions to keep your organization safer from rogue AI agents.
First, for anyone using agents or chatbots: pay close attention to what the AI can do on your machine, and don’t blindly follow install prompts or error messages. “If you can, run it on a sandbox and not on your personal computer,” Spektor said.

Security teams should treat AI desktop apps as “privileged software” as they can execute code, read files, and interact with local tools. “Monitor for changes of AI assistant configurations and synced settings,” the researchers wrote. “Restrict which extensions and tools can be installed alongside AI apps.”

And finally, red teams should add AI desktop apps to their assessment toolbox, Avraham and Spektor noted: “There's a real attack surface here that most engagements don’t cover yet.” ®
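
The researchers' advice to monitor synced settings can be sketched in a few lines of Python. This is an illustrative assumption, not Pentera's or Anthropic's tooling: it presumes the preferences text has already been exported as a plain string by some means the report does not describe, and the function names (`fingerprint`, `decoded_base64_blobs`, `review_settings`) and the 24-character threshold are hypothetical choices. The sketch compares a SHA-256 fingerprint against a known-good baseline and, if the text has changed, decodes any base64-looking runs so a human can read what was hidden.

```python
import base64
import binascii
import hashlib
import re

# Runs of base64 alphabet long enough to hide an instruction.
# The 24-character minimum is an arbitrary, hypothetical threshold.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest used to detect any change to the text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def decoded_base64_blobs(text: str) -> list[str]:
    """Decode base64-looking runs and keep those that yield readable text."""
    found = []
    for match in B64_RUN.finditer(text):
        candidate = match.group(0)
        try:
            # validate=True rejects stray characters and bad padding.
            decoded = base64.b64decode(candidate, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
        if all(ch.isprintable() or ch.isspace() for ch in decoded):
            found.append(decoded)
    return found


def review_settings(baseline_digest: str, current_text: str) -> dict:
    """Report whether the settings changed and, if so, any decoded blobs."""
    changed = fingerprint(current_text) != baseline_digest
    blobs = decoded_base64_blobs(current_text) if changed else []
    return {"changed": changed, "decoded_blobs": blobs}
```

A check like this is only a heuristic. It would miss other encodings, and an ordinary long token can occasionally decode as valid base64, so anything it flags still needs human review.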
Originally published by The Register