A serious security flaw has been discovered in Anthropic's new AI tool, Cowork, and it's raising concerns about the company's approach to prompt injection risks.
Anthropic, a well-known AI company, has once again found itself in the spotlight for its handling of a critical vulnerability. This time, it's a Files API exfiltration attack chain that was first reported last October and acknowledged but left unfixed by Anthropic. The issue has now resurfaced with the launch of their new productivity AI, Cowork.
But here's where it gets controversial...
A security firm, PromptArmor, specializing in AI vulnerabilities, has revealed that Cowork can be manipulated through prompt injection. This allows an attacker to transmit sensitive files to their Anthropic account without any additional user approval. The process is straightforward, and the risk is amplified by Cowork's target audience - non-developer users who might not fully comprehend the potential consequences.
Cowork, designed to automate office tasks, scans files like spreadsheets and everyday documents. By connecting Cowork to a local folder with sensitive information and uploading a document containing a hidden prompt injection, the attacker can trigger the exfiltration. PromptArmor demonstrated this by using a curl command to Anthropic's file upload API, making a file available to the attacker through their account. In their proof of concept, they used a real estate file, which the attacker could then query via Claude to access financial details and personal information.
And this is the part most people miss...
This flaw is similar to the one reported by security researcher Johann Rehberger regarding Claude Code last year. Rehberger's report was initially dismissed by Anthropic, who later admitted the possibility of prompt injection attacks. The company's response seems to follow a pattern, shifting the responsibility onto users and advising them to be cautious about what they connect to the bot.
When asked about potential solutions, such as implementing an API check, Anthropic remained silent. Their response to the Cowork issue is similar, stating that prompt injection attacks are a known industry challenge and that agent safety is an ongoing development area. Anthropic warns Cowork users to avoid connecting to sensitive documents and to monitor for suspicious actions, but developer Simon Willison argues that this is an unrealistic expectation for non-technical users.
So, what's the bigger picture?
This isn't Anthropic's first encounter with reported flaws left unpatched. In June 2025, Trend Micro disclosed a SQL injection flaw in Anthropic's open-source SQLite MCP server, which the company claimed was out of scope due to the code being archived. However, the vulnerable code had already been copied over 5,000 times, potentially affecting numerous projects.
Anthropic's stance on these issues seems to be that human oversight is the solution, but PromptArmor's report highlights the company's framing of the risk as a user management problem. When questioned about their approach, Anthropic emphasized that prompt injection is an industry-wide concern and that they are working on minimizing injections in their products. They plan to release an update to the Cowork VM to improve its interaction with the vulnerable API and have promised further security enhancements.
Despite these assurances, the question remains: Is Anthropic doing enough to address these critical security flaws, or are they relying too heavily on user vigilance?