Anthropic has introduced a more extensive – and expensive – way to review source code in hosted repositories, many of which already contain large swaths of AI-generated code.

Code Review is a new service for teams and enterprise customers that drives multiple agents to scour code repos in a concerted effort to catch unidentified bugs.

The company’s Claude models can already conduct code reviews on demand – you can learn a lot about the quality of AI-generated code by having Claude review its own work. The AI biz also offers a Claude Code GitHub Action that can launch a code review automatically as part of a CI/CD pipeline.
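For those curious what wiring that up looks like, a minimal workflow sketch follows. The `anthropics/claude-code-action` name comes from Anthropic's public repo, but the trigger, prompt, and input names here are illustrative assumptions – check the action's current documentation before copying:

```yaml
# .github/workflows/claude-review.yml
# Illustrative sketch, not an official config: runs Claude against each
# pull request and asks it to review the changes.
name: Claude code review
on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write   # needed so the action can post review comments

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          # Store the key as a repository secret; never commit it.
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          # Hypothetical prompt; tune to taste.
          prompt: "Review this pull request for bugs, security issues, and regressions."
```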

Code Review will do a lot more of that, at greater expense.

“Code Review analyzes your GitHub pull requests and posts findings as inline comments on the lines of code where it found issues,” the company explains in its documentation. “A fleet of specialized agents examine the code changes in the context of your full codebase, looking for logic errors, security vulnerabilities, broken edge cases, and subtle regressions.”

A fleet of specialized agents, you say? That sounds like it might burn a lot of tokens during inference. And indeed it does. As Anthropic observes, Code Review prioritizes depth over the lighter-weight existing approaches.

“Reviews are billed on token usage and generally average $15–25, scaling with PR [pull request] size and complexity,” the company says.

That’s per pull request. As a point of comparison, CodeRabbit, which also offers AI-based code reviews, charges $24 per month.

Code Review is also not very quick. While the amount of time required varies with the size of the pull request, reviews on average take about 20 minutes to complete, according to Anthropic.

Given the time required and the billing rate, the question becomes whether paying a person $60 an hour to conduct a code review would produce comparable or better results.
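The comparison above can be made concrete with some back-of-the-envelope arithmetic, using only the figures in this article – the $15–25 per review, the roughly 20-minute turnaround, and the article's own hypothetical $60-an-hour reviewer:

```python
# Back-of-the-envelope cost comparison between Code Review and a human
# reviewer. All figures come from the article; this is illustrative
# arithmetic, not a benchmark.

HUMAN_RATE_PER_HOUR = 60.0   # the article's hypothetical human rate
REVIEW_MINUTES = 20.0        # Anthropic's stated average review time


def human_cost(minutes: float, rate: float = HUMAN_RATE_PER_HOUR) -> float:
    """Cost of a human spending `minutes` on a review at `rate` $/hour."""
    return rate * minutes / 60.0


def effective_hourly_rate(review_cost: float,
                          minutes: float = REVIEW_MINUTES) -> float:
    """What a flat per-review price works out to as an hourly rate."""
    return review_cost * 60.0 / minutes


print(human_cost(REVIEW_MINUTES))       # 20.0 -> a 20-minute human review
print(effective_hourly_rate(15.0))      # 45.0 -> low end of Anthropic's range
print(effective_hourly_rate(25.0))      # 75.0 -> high end of the range
```

In other words, at a 20-minute turnaround the service's $15–25 price works out to an effective $45–75 an hour, bracketing the article's hypothetical human.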

Still, the AI biz insists its engineers have seen positive results using Code Review, a finding supported by some research, though not all.

Anthropic reports that it has used Code Review internally for several months with considerable success. The company claims that for large pull requests consisting of more than 1,000 changed lines, 84 percent of automated reviews find something of note – and 7.5 issues on average. For small pull requests of less than 50 lines, 31 percent get comments, averaging 0.5 issues.

Human developers reject fewer than one percent of issues found by Claude.

Customers that have been testing Code Review have seen some benefits. When TrueNAS embarked on a ZFS encryption refactoring for its open-source middleware, the AI review service spotted a bug in adjacent code that risked letting a type mismatch erase the encryption key cache during sync operations.

Anthropic claims that in one instance involving internal code, Code Review caught an innocuous-looking one-line change to a production service that would have broken the service’s authentication mechanism.

“It was fixed before merge, and the engineer shared afterwards that they wouldn’t have caught it on their own,” the AI biz said.

In organizations large enough to afford AI tools, it’s doubtful that software developers will ever work alone again. ®