{"id":608945,"date":"2026-04-27T08:48:44","date_gmt":"2026-04-27T08:48:44","guid":{"rendered":"https:\/\/www.newsbeep.com\/us\/608945\/"},"modified":"2026-04-27T08:48:44","modified_gmt":"2026-04-27T08:48:44","slug":"an-update-on-recent-claude-code-quality-reports-anthropic","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us\/608945\/","title":{"rendered":"An update on recent Claude Code quality reports \\ Anthropic"},"content":{"rendered":"<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">Over the past month, we\u2019ve been looking into reports that Claude\u2019s responses have worsened for some users. We\u2019ve traced these reports to three separate changes that affected Claude Code, the Claude Agent SDK, and Claude Cowork. The API was not impacted.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">All three issues have now been resolved as of April 20 (v2.1.116).<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">In this post, we explain what we found, what we fixed, and what we\u2019ll do differently to ensure similar issues are much less likely to happen again.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">We take reports about degradation very seriously. We never intentionally degrade our models, and we were able to immediately confirm that our API and inference layer were unaffected.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">After investigation, we identified three different issues:<\/p>\n<p>On March 4, we changed Claude Code&#8217;s default reasoning effort from high to medium to reduce the very long latency\u2014enough to make the UI appear frozen\u2014some users were seeing in high mode. This was the wrong tradeoff. 
We reverted this change on April 7 after users told us they&#8217;d prefer to default to higher intelligence and opt into lower effort for simple tasks. This impacted Sonnet 4.6 and Opus 4.6.<\/p>\n<p>On March 26, we shipped a change to clear Claude&#8217;s older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening on every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive. We fixed it on April 10. This affected Sonnet 4.6 and Opus 4.6.<\/p>\n<p>On April 16, we added a system prompt instruction to reduce verbosity. In combination with other prompt changes, it hurt coding quality and was reverted on April 20. This impacted Sonnet 4.6, Opus 4.6, and Opus 4.7.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">Because each change affected a different slice of traffic on a different schedule, the aggregate effect looked like broad, inconsistent degradation. While we began investigating reports in early March, they were difficult at first to distinguish from normal variation in user feedback, and neither our internal usage nor our evals initially reproduced the issues.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">This isn\u2019t the experience users should expect from Claude Code.
As of April 23, we\u2019re resetting usage limits for all subscribers.<\/p>\n<p>A change to Claude Code&#8217;s default reasoning effort<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">When we released Opus 4.6 in Claude Code in February, we set the default reasoning effort to high.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">Soon after, we received user feedback that Claude Opus 4.6 in high effort mode would occasionally think for too long, causing the UI to appear frozen and leading to disproportionate latency and token usage for those users.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">In general, the longer the model thinks, the better the output. Effort levels are how Claude Code lets users set that tradeoff\u2014more thinking versus lower latency and fewer usage limit hits. As we calibrate effort levels for our models, we take this tradeoff into account in order to pick points along the test-time-compute curve that give people the best range of options. In the product layer, we then choose which point along this curve we set as our default, and that is the value we send to the Messages API as the effort parameter; we then make the other options available via \/effort.<\/p>\n<p><img loading=\"lazy\" width=\"3840\" height=\"2160\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2026\/04\/1777279723_716_image.webp\"\/><\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">In our internal evals and testing, medium effort achieved slightly lower intelligence with significantly less latency for the majority of tasks. It also didn\u2019t suffer from the same issues with occasional very long tail latencies for thinking, and it helped maximize users\u2019 usage limits. 
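<\/p>
<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">As a minimal sketch of the mechanics above, a client might pick a per-model default and pass it to the Messages API as the effort parameter. The model IDs, fallback values, and exact field placement below are illustrative assumptions based on this post, not the official API schema:<\/p>

```python
# Illustrative only: model IDs and the request field shape are assumptions.
DEFAULT_EFFORT = {
    'claude-opus-4-7': 'xhigh',  # the post-April-7 default described in this post
}

def build_messages_request(model, prompt, effort=None):
    # An explicit /effort choice overrides the per-model default;
    # all other models fall back to 'high'.
    chosen = effort or DEFAULT_EFFORT.get(model, 'high')
    return {
        'model': model,
        'max_tokens': 4096,
        'effort': chosen,  # the value the product layer sends to the API
        'messages': [{'role': 'user', 'content': prompt}],
    }
```

<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">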
As a result, we rolled out a change making medium the default effort, and explained the rationale via an in-product dialog.<\/p>\n<p><img loading=\"lazy\" width=\"3794\" height=\"2260\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2026\/04\/1777279723_13_image.webp\"\/><\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">Soon after the rollout, users began reporting that Claude Code felt less intelligent. We shipped a number of design iterations to make the current effort setting clearer and to alert people that they could change the default (notices on startup, an inline effort selector, and bringing back ultrathink), but most users retained the medium effort default.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">After hearing feedback from more customers, we reversed this decision on April 7. All users now default to xhigh effort for Opus 4.7, and high effort for all other models.<\/p>\n<p>A caching optimization that dropped prior reasoning<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">When Claude reasons through a task, that reasoning is normally kept in the conversation history so that on every subsequent turn, Claude can see why it made the edits and tool calls it did.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">On March 26, we shipped what was meant to be an efficiency improvement to this feature. We use prompt caching to make back-to-back API calls cheaper and faster for users. Claude writes the input tokens to the cache when it makes an API request; after a period of inactivity, the prompt is evicted from the cache, making room for other prompts.
Cache utilization is something we manage carefully (more on our <a href=\"https:\/\/x.com\/trq212\/status\/2024574133011673516\" rel=\"nofollow\">approach<\/a>).<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">The design should have been simple: if a session has been idle for more than an hour, we could reduce users\u2019 cost of resuming that session by clearing old thinking sections. Since the request would be a cache miss anyway, we could prune unnecessary messages from the request to reduce the number of uncached tokens sent to the API. We\u2019d then resume sending full reasoning history. To do this we used the clear_thinking_20251015 API header along with keep:1.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">The implementation had a bug. Instead of clearing thinking history once, it cleared it on every turn for the rest of the session. After a session crossed the idle threshold once, each request for the rest of that process told the API to keep only the most recent block of reasoning and discard everything before it. This compounded: if you sent a follow-up message while Claude was in the middle of a tool use, that started a new turn under the broken flag, so even the reasoning from the current turn was dropped. Claude would continue executing, but increasingly without memory of why it had chosen to do what it was doing. This surfaced as the forgetfulness, repetition, and odd tool choices people reported.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">Because this would continuously drop thinking blocks from subsequent requests, those requests also resulted in cache misses. 
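<\/p>
<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">The compounding behavior described in this section reduces to a flag that was set once a session went stale but never cleared. Below is a minimal sketch with entirely hypothetical names (the real implementation used the clear_thinking_20251015 header with keep:1):<\/p>

```python
IDLE_THRESHOLD_SECONDS = 3600  # sessions idle longer than an hour are stale

class Session:
    def __init__(self):
        self.went_stale = False  # set when the session crosses the idle threshold

def on_resume(session, idle_seconds):
    if idle_seconds > IDLE_THRESHOLD_SECONDS:
        session.went_stale = True

def should_clear_thinking_buggy(session):
    # BUG: the flag is never reset, so every turn after the first resume
    # keeps discarding all but the most recent reasoning block.
    return session.went_stale

def should_clear_thinking_fixed(session):
    # FIX: clear old thinking exactly once on resume, then return to
    # sending the full reasoning history.
    if session.went_stale:
        session.went_stale = False
        return True
    return False
```

<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">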
We believe this is what drove the separate reports of usage limits draining faster than expected.<\/p>\n<p><img loading=\"lazy\" width=\"3016\" height=\"1198\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2026\/04\/1777279724_737_image.webp\"\/><\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">Two unrelated changes made the issue challenging to reproduce at first: an internal-only server-side experiment related to message queuing and an orthogonal change in how we display thinking together suppressed this bug in most CLI sessions, so we didn\u2019t catch it even when testing external builds.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">This bug was at the intersection of Claude Code\u2019s context management, the Anthropic API, and extended thinking. The offending change made it past multiple human and automated code reviews, as well as unit tests, end-to-end tests, automated verification, and dogfooding. Because the bug only appeared in a corner case (stale sessions) and was difficult to reproduce, it took us over a week to discover and confirm the root cause.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">As part of the investigation, we back-tested <a href=\"https:\/\/code.claude.com\/docs\/en\/code-review\" rel=\"nofollow noopener\" target=\"_blank\">Code Review<\/a> against the offending pull requests using Opus 4.7. When given the code repositories needed to gather complete context, Opus 4.7 found the bug, while Opus 4.6 didn&#8217;t.
To prevent this from happening again, we are now landing support for additional repositories as context for code reviews.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">We fixed this bug on April 10 in v2.1.101.<\/p>\n<p>A system prompt change to reduce verbosity<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">Our latest model, Claude Opus 4.7, has a notable behavioral quirk relative to its predecessor: as we <a href=\"https:\/\/www.anthropic.com\/news\/claude-opus-4-7\" rel=\"nofollow noopener\" target=\"_blank\">wrote about<\/a> at launch, it tends to be quite verbose. This makes it smarter on hard problems, but it also produces more output tokens.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">A few weeks before we released Opus 4.7, we started tuning Claude Code in preparation. Each model behaves slightly differently, and we spend time before each release optimizing the harness and product for it.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">We have a number of tools to reduce verbosity: model training, prompting, and improving thinking UX in the product. Ultimately we used all of these, but one addition to the system prompt caused an outsized effect on intelligence in Claude Code:<\/p>\n<p>\u201cLength limits: keep text between tool calls to \u226425 words. 
Keep final responses to \u2264100 words unless the task requires more detail.\u201d<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">After multiple weeks of internal testing with no regressions in the set of evaluations we ran, we felt confident about the change and shipped it alongside Opus 4.7 on April 16.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">As part of this investigation, we ran more ablations (removing lines from the system prompt to understand the impact of each line) using a broader set of evaluations. One of these evaluations showed a 3% drop for both Opus 4.6 and 4.7. We immediately reverted the prompt as part of the April 20 release.<\/p>\n<p>Going forward<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">We are going to do several things differently to avoid these issues: we\u2019ll ensure that a larger share of internal staff use the exact public build of Claude Code (as opposed to the version we use to test new features), and we&#8217;ll improve the <a href=\"https:\/\/code.claude.com\/docs\/en\/code-review\" rel=\"nofollow noopener\" target=\"_blank\">Code Review<\/a> tool we use internally and ship the improved version to customers.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">We\u2019re also adding tighter controls on system prompt changes. We will run a broad suite of per-model evals for every system prompt change to Claude Code and continue ablations to understand the impact of each line, and we have built new tooling to make prompt changes easier to review and audit. We&#8217;ve additionally added guidance to our CLAUDE.md to ensure model-specific changes are gated to the specific model they&#8217;re targeting.
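<\/p>
<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">A per-line ablation like the one described above can be sketched as follows; the eval harness and scoring interface are hypothetical placeholders, not our internal tooling:<\/p>

```python
def ablate_prompt_lines(prompt, run_evals):
    # run_evals is a caller-supplied eval harness: prompt string -> score.
    # Returns each line's impact: the variant score (with that line removed)
    # minus the baseline score. Assumes prompt lines are unique, since the
    # line text keys the result dict.
    baseline = run_evals(prompt)
    lines = prompt.splitlines()
    impact = {}
    for i, line in enumerate(lines):
        variant = '\n'.join(lines[:i] + lines[i + 1:])
        impact[line] = run_evals(variant) - baseline
    return impact
```

<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">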
For any change that could trade off against intelligence, we&#8217;ll add soak periods, a broader eval suite, and gradual rollouts so we catch issues earlier.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">We recently created @ClaudeDevs on X to give us the room to explain product decisions and the reasoning behind them in depth. We&#8217;ll share the same updates in centralized threads on GitHub.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">Finally, we\u2019d like to thank our users: the people who used the \/feedback command to share their issues with us (or who posted specific, reproducible examples online) are the ones who ultimately allowed us to identify and fix these problems. Today we are resetting usage limits for all subscribers.<\/p>\n<p class=\"Body-module-scss-module__z40yvW__reading-column body-2 serif post-text\">We\u2019re immensely grateful for your feedback and for your patience.<\/p>\n","protected":false},"excerpt":{"rendered":"Over the past month, we\u2019ve been looking into reports that Claude\u2019s responses have worsened for some users. 
We\u2019ve&hellip;\n","protected":false},"author":2,"featured_media":608946,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[45],"tags":[182,181,507,74],"class_list":{"0":"post-608945","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/608945","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/comments?post=608945"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/608945\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media\/608946"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media?parent=608945"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/categories?post=608945"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/tags?post=608945"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}