Independent AI researcher Simon Willison, reviewing the feature today on his blog, noted that Anthropic’s advice to “monitor Claude while using the feature” amounts to “unfairly outsourcing the problem to Anthropic’s users.”

Anthropic’s mitigations

Anthropic is not completely ignoring the problem, however. The company has implemented several security measures for the file creation feature. For Pro and Max users, Anthropic disabled public sharing of conversations that use the file creation feature. For Enterprise users, the company implemented sandbox isolation so that environments are never shared between users. The company also limited task duration and container runtime “to avoid loops of malicious activity.”

For Team and Enterprise administrators, Anthropic also provides an allowlist of domains Claude can access, including api.anthropic.com, github.com, registry.npmjs.org, and pypi.org. The documentation states that “Claude can only be tricked into leaking data it has access to in a conversation via an individual user’s prompt, project or activated connections.”

Anthropic’s documentation states the company has “a continuous process for ongoing security testing and red-teaming of this feature.” The company encourages organizations to “evaluate these protections against their specific security requirements when deciding whether to enable this feature.”

Prompt injections galore

Even with Anthropic’s security measures in place, Willison says he’ll remain wary. “I plan to be cautious using this feature with any data that I very much don’t want to be leaked to a third party, if there’s even the slightest chance that a malicious instruction might sneak its way in,” he wrote on his blog.

We covered a similar potential prompt injection vulnerability in Anthropic’s Claude for Chrome, which launched as a research preview last month. For enterprise customers considering Claude for sensitive business documents, Anthropic’s decision to ship the feature with documented vulnerabilities suggests that competitive pressure may be overriding security considerations in the AI arms race.

That kind of “ship first, secure it later” philosophy has frustrated some AI experts like Willison, who has extensively documented prompt injection vulnerabilities (and coined the term). He recently described the current state of AI security as “horrifying” on his blog, noting that prompt injection vulnerabilities remain widespread “almost three years after we first started talking about them.”

In a prescient warning from September 2022, Willison wrote that “there may be systems that should not be built at all until we have a robust solution.” His assessment today? “It looks like we built them anyway!”