{"id":158893,"date":"2025-11-25T14:03:09","date_gmt":"2025-11-25T14:03:09","guid":{"rendered":"https:\/\/www.newsbeep.com\/ie\/158893\/"},"modified":"2025-11-25T14:03:09","modified_gmt":"2025-11-25T14:03:09","slug":"estimating-ai-productivity-gains-anthropic","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ie\/158893\/","title":{"rendered":"Estimating AI productivity gains \\ Anthropic"},"content":{"rendered":"<p>Overview<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">What do real conversations with Claude tell us about the effects of AI on labor productivity? Using our privacy-preserving <a href=\"https:\/\/www.anthropic.com\/research\/clio\" rel=\"nofollow noopener\" target=\"_blank\">analysis method<\/a>, we sample one hundred thousand real conversations from <a href=\"http:\/\/claude.ai\/redirect\/website.v1.76702ece-70b8-4515-b12c-bfe8b3be5370\" rel=\"nofollow noopener\" target=\"_blank\">Claude.ai<\/a>, estimate how long the tasks in these conversations would take with and without AI assistance, and study the productivity implications across the broader economy. Based on Claude\u2019s estimates, these tasks would take on average about 90 minutes to complete without AI assistance, and Claude speeds up individual tasks by about 80%.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Extrapolating these estimates out suggests current-generation AI models could increase US labor productivity growth by 1.8% annually over the next decade\u2014roughly twice the run rate in recent years. But this isn\u2019t a prediction of the future, since we don\u2019t take into account the rate of adoption or the larger productivity effects that would come from much more capable AI systems.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Our analysis has limits. 
Most notably, we can\u2019t account for additional time humans spend on tasks outside of their conversations with Claude, including validating the quality or accuracy of Claude&#8217;s work. But as AI models get better at time estimation, we think our methods in this research note could become increasingly useful for understanding how AI is shaping real work.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Here\u2019s a more detailed summary of our results:<\/p>\n<p>Across one hundred thousand real-world conversations, Claude estimates that AI reduces task completion time by 80%. We use Claude to evaluate anonymized Claude.ai transcripts to estimate the productivity impact of AI. According to Claude\u2019s estimates, people typically use AI for complex tasks that would take an average of 1.4 hours to complete. By matching tasks to O*NET occupations and BLS wage data, we estimate these tasks would otherwise cost $55 in human labor.<\/p>\n<p>The estimated scope, cost, and time savings of tasks vary widely by occupation. Based on Claude\u2019s estimates, people use Claude for legal and management tasks that would have taken nearly two hours, but for food preparation tasks that would have taken only 30 minutes. And we find that healthcare assistance tasks can be completed 90% more quickly, whereas hardware issues see time savings of 56%. This doesn\u2019t account for the time that humans might spend on these tasks beyond their conversation on <a href=\"http:\/\/claude.ai\/redirect\/website.v1.76702ece-70b8-4515-b12c-bfe8b3be5370\" rel=\"nofollow noopener\" target=\"_blank\">Claude.ai<\/a>, however, so we think these estimates might overstate current productivity effects to at least some degree.<\/p>\n<p>Extrapolating these results to the economy, current-generation AI models could increase annual US labor productivity growth by 1.8% over the next decade. 
This would double the annual growth the US has seen since 2019, and place our estimate towards the upper end of <a href=\"https:\/\/www.oecd.org\/content\/dam\/oecd\/en\/publications\/reports\/2024\/11\/miracle-or-myth-assessing-the-macroeconomic-productivity-gains-from-artificial-intelligence_fde2a597\/b524a072-en.pdf\" rel=\"nofollow noopener\" target=\"_blank\">recent estimates<\/a>. Taking as given Claude\u2019s estimates of task-level efficiency gains, we use standard methods to calculate a 1.8% implied annual increase in US labor productivity over the next ten years. However, this estimate does not account for future improvements in AI models (or more sophisticated uses of current technology), which could significantly magnify AI\u2019s economic impact.<\/p>\n<p>As AI accelerates some tasks, others may become bottlenecks: We see large speedups for some tasks and much smaller ones for others, even within the same occupational groups. Where AI makes less of a difference, these tasks might become bottlenecks, potentially acting as a constraint on growth.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">This gives us a new lens for understanding AI\u2019s economic impacts over time, which we will track going forward as part of our <a href=\"https:\/\/www.anthropic.com\/economic-index\" rel=\"nofollow noopener\" target=\"_blank\">Economic Index<\/a>. Computing these estimates from real-world Claude conversations complements other approaches, like lab studies in narrow domains or government statistics, which provide more coarse-grained insights. 
We will track how these estimates change over time to get an evolving picture of these issues as capabilities and adoption continue to progress.<\/p>\n<p><img loading=\"lazy\" width=\"9168\" height=\"5160\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/1764079387_99_image\"\/>An overview of our method and some of our main results. See below for how we validate Claude\u2019s estimates, the assumptions we make, and limitations of our analysis.<\/p>\n<p>Introduction<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">As part of the <a href=\"https:\/\/www.anthropic.com\/economic-index\" rel=\"nofollow noopener\" target=\"_blank\">Anthropic Economic Index<\/a>, we have documented how people use Claude across different tasks, industries, and places. We\u2019ve captured the breadth of uses\u2014how people use Claude for legal, scientific, and programming tasks\u2014but not their depth. How substantial are the tasks for which people use Claude, and how much time does Claude save them?<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">The current version of the Economic Index can&#8217;t capture this within-task heterogeneity\u2014for instance, it can\u2019t distinguish report-writing tasks that take five minutes from those that take five days, or financial modeling tasks that take an afternoon from those that take a few weeks. This makes it difficult to assess AI&#8217;s economic effects: a software developer might use Claude to write ten pull requests in a day, but if nine are minor documentation updates and one is a critical infrastructure change, simply counting the number of these tasks performed with Claude misses the point.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">And as model capabilities improve, we want to understand whether people use them for higher-value work. 
To understand how AI is reshaping work and productivity, we need to know not just which tasks Claude handles, but how substantial those tasks and time savings are.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Several groups have begun conducting randomized controlled trials to measure productivity gains in narrow domains, including <a href=\"https:\/\/arxiv.org\/abs\/2302.06590\" rel=\"nofollow noopener\" target=\"_blank\">software<\/a> <a href=\"https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=4945566\" rel=\"nofollow noopener\" target=\"_blank\">engineering<\/a> <a href=\"https:\/\/metr.org\/blog\/2025-07-10-early-2025-ai-experienced-os-dev-study\/\" rel=\"nofollow noopener\" target=\"_blank\">tasks<\/a>, <a href=\"https:\/\/www.science.org\/doi\/10.1126\/science.adh2586\" rel=\"nofollow noopener\" target=\"_blank\">writing<\/a>, and <a href=\"https:\/\/www.nber.org\/papers\/w31161\" rel=\"nofollow noopener\" target=\"_blank\">customer service<\/a>. METR&#8217;s work on <a href=\"https:\/\/metr.org\/blog\/2025-03-19-measuring-ai-ability-to-complete-long-tasks\/\" rel=\"nofollow noopener\" target=\"_blank\">measuring AI\u2019s ability to complete long tasks<\/a> has demonstrated that AI systems can independently tackle extended, multi-step challenges. But these evaluations consider a narrow set of problems, rather than broad real-world use. To assess AI\u2019s overall impact on the economy, we need a way to analyze hundreds or thousands of real-world AI applications.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">This report takes a first step toward that goal. It uses Claude to estimate how much time it would take a human to complete the tasks that Claude handles, compares that to how long Claude and the human took together, and thereby calculates how much time the AI has saved. 
While AI models lack context about users&#8217; expertise, workflows, and constraints, we find that model-estimated times show promising accuracy for a dataset of software engineering tasks, relative to both human-estimated completion times and time-tracked outcomes.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">In what follows, we present our methodology for estimating task-level time savings, validate our approach against ground-truth data, and then use these estimates to assess which tasks and occupations show the largest productivity gains from AI. We then explore what our task-level estimates imply about aggregate productivity as AI begins to be adopted throughout the economy.<\/p>\n<p>Estimating task length and time savings<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Using our <a href=\"https:\/\/www.anthropic.com\/research\/clio\" rel=\"nofollow noopener\" target=\"_blank\">privacy-preserving analysis system<\/a>, we analyzed 100,000 conversation transcripts from Claude.ai (Free, Pro, and Max tiers) to measure the length and time savings of tasks Claude handles. We generated two core estimates for each task:<\/p>\n<p>Time estimate without AI: The hours a human professional would need to complete the task without AI assistance<\/p>\n<p>Time estimate with AI: The amount of time it took to complete the task with AI assistance<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">We used Claude to generate these estimates for each conversation. Following our <a href=\"https:\/\/www.anthropic.com\/news\/the-anthropic-economic-index\" rel=\"nofollow noopener\" target=\"_blank\">Economic Index methodology<\/a>, we then aggregated these conversation-level estimates to tasks in the <a href=\"https:\/\/www.onetcenter.org\/database.html\" rel=\"nofollow noopener\" target=\"_blank\">O*NET<\/a> taxonomy by taking the median of time estimates for each task. 
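As an illustration of this aggregation step, here is a minimal Python sketch that takes the median of per-conversation time estimates for each O*NET task. The task names and hour values are invented for the example, not drawn from the study's data.

```python
from statistics import median

# Hypothetical per-conversation estimates (hours), already classified into
# O*NET tasks; names and values are illustrative, not real data.
estimates = [
    ("Write computer programs", {"without_ai": 2.0, "with_ai": 0.4}),
    ("Write computer programs", {"without_ai": 3.5, "with_ai": 0.5}),
    ("Prepare legal documents",  {"without_ai": 1.5, "with_ai": 0.3}),
]

def aggregate_by_task(rows):
    """Take the median of per-conversation time estimates for each task."""
    by_task = {}
    for task, est in rows:
        cols = by_task.setdefault(task, {"without_ai": [], "with_ai": []})
        for key, hours in est.items():
            cols[key].append(hours)
    return {
        task: {key: median(vals) for key, vals in cols.items()}
        for task, cols in by_task.items()
    }

agg = aggregate_by_task(estimates)
```

Taking the median rather than the mean keeps a single outlier conversation from dominating a task's estimate.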
This allowed us to explore how such time estimates vary across tasks and occupations within the economy. Classification prompts are in the Appendix.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Analyzing real-world transcripts enables us to account for intra-task variation. For instance, even if the overall share of designing manufacturing equipment tasks stays fixed, transcript-level information lets us see whether people tackle more complex, longer-timescale projects (or attain greater time savings) with AI over time. Our <a href=\"https:\/\/www.anthropic.com\/economic-index\" rel=\"nofollow noopener\" target=\"_blank\">Economic Index<\/a> will track how these estimates evolve over time, and share aggregate datasets that researchers can use to draw their own conclusions and forecasts.<\/p>\n<p>Validation<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Estimating task duration is <a href=\"https:\/\/web.mit.edu\/curhan\/www\/docs\/Articles\/biases\/67_J_Personality_and_Social_Psychology_366,_1994.pdf\" rel=\"nofollow noopener\" target=\"_blank\">notoriously difficult for humans<\/a>. AI models have an even more difficult job, since they lack crucial information about the broader context of tasks (though we expect this context to increase over time as features like <a href=\"https:\/\/www.anthropic.com\/news\/memory\" rel=\"nofollow noopener\" target=\"_blank\">memory<\/a> and <a href=\"https:\/\/www.anthropic.com\/news\/claude-and-slack\" rel=\"nofollow noopener\" target=\"_blank\">external integrations<\/a> become more comprehensive). 
To assess whether Claude\u2019s estimates are informative, we conducted two validation analyses.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Self-consistency testing: First, we assess whether Claude produces stable estimates of task lengths across different conversation samples, or across variations in our prompts.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">We created multiple prompt variations\u2014for example, asking about an &#8220;employee with appropriate skills&#8221; versus a &#8220;skilled professional&#8221;\u2014to assess how sensitive estimates are to the way the prompt is phrased. We analyzed 1,800 conversations with each variant, drawn from users who consented to share these conversations with us, and computed correlations across prompt variants. The results showed strong self-agreement, with log-scale correlations of r=0.89\u20130.93 across variants.<\/p>\n<p><img loading=\"lazy\" width=\"9168\" height=\"5164\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/1764079387_553_image\"\/>Claude\u2019s estimated human completion times show high correlation across prompt variations. Prompt 1 asks Claude to estimate the time it would take an &#8220;employee with appropriate skills&#8221; to complete the task, and Prompt 2 asks about a \u201chuman worker\u201d who is \u201ccompetent in the relevant field.\u201d The two prompts show a log-scale correlation of 0.89, indicating high agreement. Analysis performed on <a href=\"http:\/\/claude.ai\" rel=\"nofollow noopener\" target=\"_blank\">Claude.ai<\/a> transcripts where users have consented to share them with us for research purposes.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">External benchmarking: Self-agreement doesn\u2019t matter much if a model\u2019s predictions don\u2019t correspond well to reality. 
To check this, we tested Claude&#8217;s time estimation capabilities against a dataset of thousands of real-world <a href=\"https:\/\/zenodo.org\/records\/7022735\" rel=\"nofollow noopener\" target=\"_blank\">software development tasks<\/a> gathered from JIRA tickets for open-source repositories, with both developer estimates and actual tracked completion times.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">This is a very challenging task for Claude, given that it receives only the title and description of the JIRA tickets, while the human developers have full context on the codebase and the ticket, and have seen how long similar tasks take to complete. On a subset of 1,000 tasks from this benchmark:<\/p>\n<p>Human developers themselves achieved \u03c1=0.50 Spearman correlation with actual times, and a Pearson correlation of r_log=0.67 on the log values, indicating a moderate-strength correlation (higher is better for both values).<\/p>\n<p>Claude Sonnet 4.5 achieved \u03c1=0.44 and r_log=0.46.<\/p>\n<p>Claude Sonnet 4.5 with ten examples of tasks and their ground-truth time lengths showed a worse \u03c1=0.39, but an improved r_log=0.48.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">This analysis suggests that Claude\u2019s estimates provide directional information that is only slightly worse than software developers\u2019 own estimates. However, we observe that Claude\u2019s estimates are much more compressed than humans\u2019\u2014predicting comparatively long times for shorter tasks, and vice versa\u2014and are overall more prone to overestimates. This suggests that the actual differences in task lengths across tasks may be larger than we report, and that actual task lengths may be slightly shorter. Overall, these findings demonstrate that model predictions have meaningful correlation with real-world outcomes, at least in this domain, making them useful for comparing one task to another or tracking changes over time. 
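The two validation metrics used here, Spearman's \u03c1 on raw values and Pearson's r on log values, can be reproduced with a small standard-library-only sketch. The estimate/actual pairs below are invented, and the rank computation ignores ties for brevity.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks (ties ignored)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(xs), ranks(ys))

# Invented (estimated, actual) completion times in hours for five tickets.
est    = [0.5, 1.0, 2.0, 4.0, 8.0]
actual = [0.4, 1.5, 1.0, 6.0, 12.0]

rho   = spearman(est, actual)
r_log = pearson([math.log(x) for x in est], [math.log(y) for y in actual])
```

Spearman's \u03c1 only rewards getting the ordering of tasks right, while the log-scale Pearson r also penalizes multiplicative over- or under-estimation, which is why the note reports both.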
We also observe higher correlation for Claude Sonnet 4.5 than for Claude Sonnet 4, suggesting that these estimates may continue to improve with model capabilities.<\/p>\n<p><img loading=\"lazy\" width=\"2762\" height=\"1020\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/1764079387_500_image\"\/>Correlation of actual time spent on software engineering tasks with developer and Claude estimates. Left: correlation of developers\u2019 initial time estimates with the final time-tracked outcomes. Developers are familiar with the full codebase and understand the full context behind the request and how long similar tasks have taken. Middle: correlation with Claude Sonnet 4.5\u2019s estimates, given just the task title and description of the JIRA ticket. Right: correlation with Claude Sonnet 4.5\u2019s estimates, given 10 examples in the prompt to calibrate on. Overall, Claude\u2019s estimates have similar directional correlation to developers: Spearman\u2019s \u03c1=0.44, compared to \u03c1=0.50 for developers, though Claude significantly overestimates short tasks and underestimates long ones. Axes are log (base 10) scaled. Error bars are 95% CIs per bin.<\/p>\n<p>Results<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">We first use the methods above to estimate task-level savings, then aggregate these into estimates of economy-wide effects.<\/p>\n<p>Task-level savings<\/p>\n<p><img loading=\"lazy\" width=\"9168\" height=\"7336\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/1764079388_260_image\"\/>Claude\u2019s estimated task time, average hourly wage of the occupation, implied task cost, and time savings for nine different tasks. Task time is estimated by having Claude predict how long a professional would take to perform the task without AI assistance. 
Hourly wage is derived from the Occupational Employment and Wage Statistics (OEWS) May 2024 data. Task cost is computed by multiplying the task time by the hourly wage. Time savings is computed by estimating the time the human took to complete the task and computing 1 &#8211; time_with_ai \/ time_without_ai.<\/p>\n<p>Example tasks demonstrate a range of time savings<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Looking at individual tasks within occupations provides concrete examples of where and how AI might be delivering time savings. At the most extreme end, we see users complete curriculum development tasks that Claude estimates would take 4.5 hours in just 11 minutes. Such tasks have an implied labor cost of $115 based on the average hourly wage of teachers.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">People also use AI to save 87% of the time it would take to write invoices, memos, and other documents (at least for the type of documents Claude is asked to handle). Finally, AI saves 80% of the time on financial analyst tasks, like interpreting financial data, that would ordinarily cost $31 in wages.<\/p>\n<p>Task length varies dramatically across occupations<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Human time estimates show that Claude handles tasks of very different lengths depending on the occupation. In the plots below, we show averages for each occupation category among the subset of tasks that Claude is used for1. The average management task where Claude is used (e.g. selecting investments) is estimated to take humans 2.0 hours to complete, followed by legal (1.8 hours), education (1.7), and arts\/media tasks (1.6). At the other end of the spectrum, food preparation tasks (e.g. planning or pricing menu items), installation\/maintenance, and transportation tasks all take 0.3-0.5 hours on average, suggesting more circumscribed tasks, or tasks with less waiting time. 
Given that Claude\u2019s time estimates tend to underestimate long tasks and overestimate short tasks, it is possible that these differences might be even greater in practice.<\/p>\n<p><img loading=\"lazy\" width=\"9168\" height=\"5160\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/1764079388_129_image\"\/>Various figures derived from Claude\u2019s time estimates for SOC major groups. Human time estimates vary substantially across occupations\u2014people use Claude for management and legal tasks estimated to take humans around two hours unassisted, while healthcare support and food prep tasks average around a half-hour. Average hourly wage for the occupational category is retrieved from OEWS 2024 data. Average task cost is computed by multiplying each occupation\u2019s hourly wage by its median task time and computing an average weighted by each task\u2019s prevalence in our sample. Time savings are computed via 1 &#8211; time_with_ai \/ time_without_ai.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Cost estimates amplify this variation in the impact of AI: the tasks with the longest time estimates also tend to be the tasks with the highest labor costs. We compute these cost estimates by multiplying the median time for each task by the associated occupation\u2019s average wage in the OEWS May 2024 data. The average management task would cost $133 in professional labor, compared to $119 for legal tasks and $8 for tasks relating to food preparation and serving. Business and financial tasks average $69, while computer and mathematical tasks average $82.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Across all tasks we observe, we estimate that the work Claude handles in each conversation would cost a median of $54 if a professional were hired to perform it. 
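The per-task arithmetic described above (cost as time multiplied by wage, savings as one minus the time ratio) can be sketched in a few lines of Python. The wage and timing figures below are illustrative stand-ins, not values from the dataset.

```python
def time_savings(time_with_ai, time_without_ai):
    """Fraction of human time saved: 1 - time_with_ai / time_without_ai."""
    return 1 - time_with_ai / time_without_ai

def task_cost(hours_without_ai, hourly_wage):
    """Implied labor cost of completing the task without AI assistance."""
    return hours_without_ai * hourly_wage

# Illustrative numbers (not from the study's data): a 1.4-hour task at an
# assumed $39/hour wage, completed in 17 minutes with AI assistance.
cost = task_cost(1.4, 39.0)         # roughly $55 of implied human labor
saved = time_savings(17 / 60, 1.4)  # roughly 0.80, i.e. 80% time saved
```

With these stand-in inputs the implied cost and savings land near the $55 and 80% averages reported earlier, which is why they were chosen for the illustration.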
Of course, the actual performance of current models will likely be worse than that of a human expert for many tasks, though recent research suggests <a href=\"https:\/\/openai.com\/index\/gdpval\/\" rel=\"nofollow noopener\" target=\"_blank\">the gap is closing<\/a> across a wide range of different applications.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Across major occupational groups, we observe a positive correlation between the average hourly wage of the occupations in our sample and the human-time-equivalent duration of the tasks Claude is asked to handle. For example, the Management and Legal occupational categories rank at the top of the classification in terms of average hourly wage\u2014aligning with Claude\u2019s strengths in complex knowledge work.<\/p>\n<p><img loading=\"lazy\" width=\"9168\" height=\"5160\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/1764079388_469_image\"\/>Correlation between average hourly wage of an occupation category and the average Claude-estimated task duration in our sample. Higher-wage occupation categories (e.g. Management and Legal) see longer estimated tasks in our sample (r=0.8).<\/p>\n<p>Time savings are highly uneven across occupations<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Our human time and cost estimates capture the magnitude of tasks people tackle with AI. But the time savings\u2014Claude\u2019s estimate for how much faster work gets done with AI\u2014reflect the productivity gains that might come from using AI for those tasks.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">The median conversation experienced an estimated 84% time savings, though we see considerable variation across tasks and categories. 
For example, the task of checking diagnostic images only shows 20% time savings, likely because this is already a task that can be done quickly by experts without AI assistance. By contrast, the task of compiling information from reports sees approximately 95% time savings, likely because AI systems can read, extract, and cite information much more quickly than people. Overall, the distribution of time saved by task is concentrated within the 50-95% range, peaking between 80-90%.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">These large time savings align with Claude\u2019s abilities to read and write far faster than people can. However, our approach doesn\u2019t take into account the additional work people need to do to refine Claude\u2019s outputs to a finished state, or whether they continue iterating on the work product across multiple sessions\u2014both of which would result in smaller time savings. Past randomized controlled trials have typically found smaller time savings, including <a href=\"https:\/\/arxiv.org\/abs\/2302.06590\" rel=\"nofollow noopener\" target=\"_blank\">56%<\/a>, <a href=\"https:\/\/www.science.org\/doi\/10.1126\/science.adh2586\" rel=\"nofollow noopener\" target=\"_blank\">40%<\/a>, <a href=\"https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=4945566\" rel=\"nofollow noopener\" target=\"_blank\">26%<\/a>, <a href=\"https:\/\/www.nber.org\/papers\/w31161\" rel=\"nofollow noopener\" target=\"_blank\">14%<\/a> and even <a href=\"https:\/\/metr.org\/blog\/2025-07-10-early-2025-ai-experienced-os-dev-study\/\" rel=\"nofollow noopener\" target=\"_blank\">negative<\/a> time savings across different applications\u2014perhaps due to these effects or because these studies examined earlier generations of models.<\/p>\n<p><img loading=\"lazy\" width=\"2292\" height=\"1291\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  
src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/1764079388_922_image\"\/>Density plot of time savings across O*NET tasks in our sample. We see that Claude\u2019s estimated time savings are uneven across tasks in our sample, with most falling between 50 and 95%. The overall median savings is 81%. Time savings are computed by 1 &#8211; time_with_ai \/ time_without_ai. Our estimates do not take into account the time spent refining Claude\u2019s output outside of the chat window.From task-level efficiency gains to economy-wide productivity effects<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">The above estimates capture AI-driven productivity gains at the task level. To understand macro-level impacts, this section models how these gains could aggregate across the entire economy, assuming they play out according to Claude\u2019s estimates.<\/p>\n<p>Methodology<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">To estimate economy-wide productivity effects, we use Hulten\u2019s theorem, a standard method that allows us to aggregate efficiency gains at the task-level to the broader US economy2. As in <a href=\"https:\/\/economics.mit.edu\/sites\/default\/files\/2024-04\/The%20Simple%20Macroeconomics%20of%20AI.pdf\" rel=\"nofollow noopener\" target=\"_blank\">Acemoglu (2024)<\/a>\u2019s \u201cbaseline\u201d approach, we model the implied increase in labor productivity as a weighted average over task-level productivity gains\u2014a modeling choice that implicitly assumes that capital investment will increase as a result of an increase in total factor productivity (TFP) associated with AI adoption. In this framework, the implied increase TFP is the gain in labor productivity multiplied by the labor share of income3.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Task composition: For each occupation, we obtain a list of work tasks from O*NET. 
We then use Claude to estimate what fraction of workers&#8217; time is spent on each of those tasks. For example, Claude estimates that programmers spend 23% of their time writing and maintaining code, 15% analyzing and rewriting programs, and smaller fractions on testing, documentation, and meetings.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Task-level productivity improvements: In the previous section, we provided estimates we can use to compute how much more quickly each task is completed with AI assistance. We take the log difference between the time without AI and the time with AI to generate a productivity improvement value, and conservatively assign no improvement to tasks not observed in our sample.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Economy-wide estimate: We weight each task&#8217;s implied productivity gains by its economic importance using two factors: (i) the fraction of time that Claude estimates the occupation spends on that task (as above), and (ii) the occupation&#8217;s share of the total US wage bill (the number of people employed in that occupational category multiplied by the average wage, then divided by the total wage bill across all occupations). For the total wage bill, we use May 2024 OEWS data. This approach implicitly assumes the time estimates that Claude produces represent reliable averages across all instances of each task, and that Claude or similar AI systems will be adopted across the entire US economy.<\/p>\n<p><img loading=\"lazy\" width=\"9168\" height=\"5164\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/1764079389_834_image\"\/>US economy-wide labor productivity impact: top ten occupations. 
Overall, Claude\u2019s estimates imply a 1.8% annualized increase (dotted line) in US labor productivity assuming current AI systems were adopted universally for all tasks we observe, driven by software, management, marketing, and customer service tasks. This corresponds to an implied 1.08% annualized increase in TFP. The average ln(time estimate ratio) represents the time-weighted productivity gain across all tasks in each occupation, where time estimate ratio = time with AI \/ time without AI. Labor statistics derived from OEWS 2024 data.<\/p>\n<p>Findings<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Assuming 10 years for AI to reach universal adoption across the US economy\u2014and using current models\u2014we calculate that Claude\u2019s estimates imply an annual increase in US labor productivity of 1.8%. This would roughly double the growth rate the US has seen since 2019 (1.8% per year; the long-term average since 1947 is 2.1%). Assuming that labor\u2019s share of income is 0.64, this implies an overall total factor productivity increase of 1.1% per year. Given that TFP growth has tended to be less than 1% since the early 2000s, these estimates suggest that even broad deployment of current AI systems could cause growth to double, achieving the rates of the late 1990s and of the 1960s and 1970s5.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">This estimated increase in aggregate labor productivity implied by task-level efficiency gains is within the range of recent estimates of AI\u2019s potential impact on productivity, though it lies towards the upper end (Filippucci, Gal, and Schief, 2024).<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Importantly, this exercise assumes that AI capabilities (and humans\u2019 effectiveness in using AI) remain the same over the next 10 years as when we took our sample. 
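The weighting scheme described in the methodology can be sketched as follows. The occupation shares, time fractions, and hour estimates are invented; the real calculation runs over the full O*NET task list rather than a handful of rows, and tasks not observed in the sample keep equal with- and without-AI times, contributing no gain.

```python
import math

LABOR_SHARE = 0.64   # labor's share of income, as assumed in the note
YEARS = 10           # assumed years to reach universal adoption

# Invented inputs: each occupation has a share of the total US wage bill and
# a list of tasks as (fraction of the occupation's time, hours without AI,
# hours with AI). The final task in each list is "unobserved": equal times.
occupations = [
    (0.05, [(0.23, 2.0, 0.4), (0.15, 1.5, 0.35), (0.62, 1.0, 1.0)]),
    (0.03, [(0.10, 1.7, 0.3), (0.90, 1.0, 1.0)]),
]

def labor_productivity_gain(occs):
    """Wage-bill- and time-weighted sum of task-level log efficiency gains."""
    total = 0.0
    for wage_share, tasks in occs:
        for time_frac, t_without, t_with in tasks:
            total += wage_share * time_frac * math.log(t_without / t_with)
    return total

total_gain = labor_productivity_gain(occupations)
annual_lp = total_gain / YEARS        # annualized labor productivity gain
annual_tfp = annual_lp * LABOR_SHARE  # implied annual TFP increase
```

Like the estimate above, the sketch holds task-level gains fixed over the ten-year window and spreads the total evenly across years.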
This, though, seems unlikely to hold: we think that AI will <a href=\"https:\/\/www.anthropic.com\/news\/core-views-on-ai-safety\" rel=\"nofollow noopener\" target=\"_blank\">continue to improve rapidly<\/a> over the coming years.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Therefore, this estimate should be taken as an exercise exploring what might happen based on current usage patterns, not a prediction of the most likely productivity impact. As we have written about in other work, we remain extremely alert to the possibility that AI will cause significant labor market disruptions, which would likely be associated with larger increases in productivity due to AI. As models progress, this could represent an approximate lower bound on the productivity effects of AI, although our estimate does not account for unevenness in adoption, which might reduce real-world productivity gains in the short term.<\/p>\n<p><img loading=\"lazy\" width=\"9168\" height=\"5160\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/1764079389_187_image\"\/>Labor productivity growth in the nonfarm business sector. The chart shows five-year moving averages of the year-over-year percent change in labor productivity. We see a general decline from almost 3% in the 1960s to around 1.5% in the last few years.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Reflecting the fact that <a href=\"https:\/\/assets.anthropic.com\/m\/218c82b858610fac\/original\/Economic-Index.pdf\" rel=\"nofollow noopener\" target=\"_blank\">some tasks and occupations appear much more frequently in our data than others<\/a>, we observe similar unevenness in occupations\u2019 contributions to labor productivity. Software developers contribute most (19%) to the total labor productivity gain attributable to AI. 
General and Operations Managers (about 6%), Market Research Analysts and Marketing Specialists (5%), Customer Service Representatives (4%), and Secondary School Teachers (3%) round out the top five.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">In contrast, restaurants, healthcare delivery, construction, and retail contribute much less to the overall productivity effect, mostly because few of their tasks appear in our sample.<\/p>\n<p>How might AI change how workers spend their time?<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">If workers are able to accelerate a subset of their occupational tasks with AI, the tasks where AI provides less speedup may come to represent a larger and thus more important share of those occupations\u2019 work. For example, AI might help a home inspector prepare reports, but if the inspector still has to spend the same amount of time physically traveling to the property to perform the inspection in person, inspections could become a greater fraction of the job overall.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">The figure below illustrates this for a few occupations. For software developers, AI speeds up software development, testing, documentation, and data manipulation. But we do not currently see meaningful AI use for coordinating system installation or supervising the work of other technologists or engineers. 
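<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">The bottleneck dynamic described above follows Amdahl\u2019s-law-style arithmetic: a job\u2019s overall speedup is capped by the share of time spent on tasks AI does not accelerate. A minimal sketch with hypothetical numbers (the 60% accelerated share and 5x per-task speedup are illustrative assumptions, not estimates from our data):<\/p>

```python
# Hypothetical job: 60% of work time is on tasks AI speeds up 5x
# (an 80% time reduction), 40% on tasks AI does not accelerate
# (e.g. traveling to a property for an in-person inspection).
accelerated_share = 0.60
speedup = 5.0

# Amdahl's-law-style overall speedup for the whole job.
remaining_time = (1.0 - accelerated_share) + accelerated_share / speedup
overall_speedup = 1.0 / remaining_time

# The non-accelerated tasks now fill a larger fraction of the shorter week.
bottleneck_share_after = (1.0 - accelerated_share) / remaining_time

print(f"Overall speedup: {overall_speedup:.2f}x "
      f"(time saved: {1.0 - remaining_time:.0%})")
print(f"Non-accelerated tasks grow from 40% to "
      f"{bottleneck_share_after:.0%} of the job")
```

<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Even a modest non-accelerated share caps the whole-job gain well below the per-task speedup.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">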
For teachers, we see that AI assists with lesson and activity planning, but not with sponsoring extracurricular clubs or enforcing rules in the classroom.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">From a growth perspective, these observations align well with a point made by <a href=\"https:\/\/www.nber.org\/papers\/w23928\" rel=\"nofollow noopener\" target=\"_blank\">Aghion, Jones, and Jones<\/a>: \u201cgrowth may be constrained not by what we are good at but rather by what is essential and yet hard to improve.\u201d<\/p>\n<p><img loading=\"lazy\" width=\"9168\" height=\"8096\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/1764079389_991_image\"\/>Four different occupations along with \u201caccelerated\u201d tasks that show large potential time savings, and potential \u201cbottleneck\u201d tasks that do not appear in our sample. For example, software engineers see large estimated time savings in developing and debugging software, but not in supervising programmers. Weekly time fractions are estimated by Claude (see previous section).<\/p>\n<p>Limitations<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Our approach has several limitations that we think warrant further research on this topic:<\/p>\n<p>Claude\u2019s predictions are imperfect and we lack real-world validation of Claude\u2019s time estimates: AI systems are imperfect predictors, and can\u2019t see activity that happens after the user finishes their interaction with the model. While we expect these estimates to improve as model capabilities advance, using model estimates introduces a significant source of noise. 
While our estimates show that models are approaching human performance at estimating task times, and humans are far from perfect themselves, we lack real-world data to validate the estimates that Claude provides.<\/p>\n<p>Task taxonomy limitations: Real jobs are more complex than an O*NET task list, and the time allocations we estimate for each task are only approximate. Many important aspects of work\u2014tacit knowledge, relationships, judgment under uncertainty\u2014don&#8217;t appear in these formal task descriptions, and the connections between tasks may matter just as much to productivity as the time savings for those tasks in isolation, if not more. While we show large predicted time savings for individual tasks, <a href=\"https:\/\/metr.org\/blog\/2025-07-10-early-2025-ai-experienced-os-dev-study\/\" rel=\"nofollow noopener\" target=\"_blank\">a recent randomized controlled trial<\/a> studying end-to-end software features did not find time savings from AI.<\/p>\n<p>Structural assumptions: In our calculations above, we compare the time it would take a professional to complete a given task without AI to the time it took with AI. But this could either understate the productivity gains (since hiring an employee and communicating context takes additional resources we\u2019re not accounting for) or overstate them (if the quality of the AI\u2019s work is worse than a human\u2019s).<\/p>\n<p>Restructuring of organizations: Historically, the largest productivity gains for individual firms have followed from <a href=\"https:\/\/www.jstor.org\/stable\/2006600\" rel=\"nofollow noopener\" target=\"_blank\">restructuring business operations<\/a> to adopt new technologies. 
Our model can help predict the effects of such a restructuring, but it cannot predict how companies might decide to restructure, or how quickly this process might happen.<\/p>\n<p>The role of innovation: Technological innovation is <a href=\"https:\/\/www.jstor.org\/stable\/1926047\" rel=\"nofollow noopener\" target=\"_blank\">the engine<\/a> of economic growth. Our model does not capture how AI systems could accelerate or even automate the scientific process, nor the effects that would have on productivity, growth, and the structure of work.<\/p>\n<p>Limited data: Our dataset is derived from <a href=\"http:\/\/claude.ai\/redirect\/website.v1.76702ece-70b8-4515-b12c-bfe8b3be5370\" rel=\"nofollow noopener\" target=\"_blank\">Claude.ai<\/a> conversations only. This sample is not representative of the full spectrum of AI uses, and there is likely some selection effect: the tasks people bring to Claude are probably the ones for which they think Claude will be most useful. Additionally, due to our finite sample size, we likely miss some less common AI tasks.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">The measurement infrastructure we develop here enables continuous tracking of the effect of AI on time savings at large scale. As models improve and better methods address these limitations, we can re-estimate these time savings and identify how capability improvements translate into broader economic impacts. We expect to track these changes in the months and years ahead.<\/p>\n<p>Conclusion<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Claude handles tasks of widely varying complexity\u2014from simple food preparation questions that would take a few minutes to complete, to complex legal and management tasks that would take multiple hours. 
But what is the aggregate effect of this work?<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Based on Claude\u2019s time estimates per task (and assuming universal adoption over the next 10 years), we find that use of current models implies a potential increase in US labor productivity of 1.8% per year\u2014a doubling of the recent rate of labor productivity growth. Given current AI use, these gains would be concentrated in technology, education, and professional services, while retail, restaurants, and transportation would see minimal impact. We\u2019ll be tracking these changes over time as part of our <a href=\"https:\/\/www.anthropic.com\/economic-index\" rel=\"nofollow noopener\" target=\"_blank\">Economic Index<\/a> as model capabilities, products, and adoption continue to progress.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">These productivity gains come from making existing tasks faster to complete. Historically, though, transformative productivity improvements\u2014from electrification, computing, or the internet\u2014came not from speeding up old tasks, but from fundamentally reorganizing production. In futures like these, AI not only makes implementing features faster; companies also restructure meetings and code review to validate and ship those features faster, whether with AI or by other means.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Our framework could be used to help estimate the effects of such restructuring, but it cannot predict which changes will occur, or how quickly. An important direction for future work is understanding when and how firms are reorganizing themselves around emerging AI capabilities. 
The answer will determine when AI makes the jump from providing significant but bounded productivity boosts to representing the kind of structural transformation that has historically defined technological revolutions.<\/p>\n<p>BibTeX<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">If you&#8217;d like to cite this post, you can use the following BibTeX entry:<\/p>\n<p>@online{tamkinmccrory2025productivity,<br \/>\nauthor = {Alex Tamkin and Peter McCrory},<br \/>\ntitle = {Estimating AI productivity gains from Claude conversations},<br \/>\ndate = {2025-11-05},<br \/>\nyear = {2025},<br \/>\nurl = {https:\/\/www.anthropic.com\/research\/estimating-productivity-gains},<br \/>\n}<\/p>\n<p>Appendix<\/p>\n<p>Comparison of Claude\u2019s estimates to other estimates<\/p>\n<p><img loading=\"lazy\" width=\"9168\" height=\"5164\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/1764079389_980_image\"\/>Predicted increase in annual labor productivity growth over a 10-year horizon due to AI. Figure reproduced from Filippucci, Gal, and Schief, 2024. 
The dotted line is 1.8%, derived from Claude\u2019s estimates.<\/p>\n<p>Prompts used for our time estimates<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Human time estimation prompt<\/p>\n<p>Human: Consider the following conversation:<\/p>\n<p>{{TRANSCRIPT}}<\/p>\n<p>Estimate how many hours a competent professional would need to complete the tasks done by the Assistant.<br \/>\nAssume they have:<br \/>\n&#8211; The necessary domain knowledge and skills<br \/>\n&#8211; All relevant context and background information<br \/>\n&#8211; Access to required tools and resources<\/p>\n<p>Before providing your final answer, use  tags to break down your reasoning process:<\/p>\n<p>2-5 sentences of reasoning estimating how many hours would be needed to complete the tasks.<\/p>\n<p>Provide your output in the following format:<br \/>\nA number representing hours (can use decimals like 0.5 for shorter tasks)<\/p>\n<p>Assistant: <\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Interaction time estimation prompt<\/p>\n<p>Human: Consider the following conversation:<\/p>\n<p>{{TRANSCRIPT}}<\/p>\n<p>Estimate how many minutes the user spent completing the tasks in the prompt with the model.<br \/>\nConsider:<br \/>\n&#8211; Number and complexity of human messages<br \/>\n&#8211; Time reading Claude&#8217;s responses<br \/>\n&#8211; Time thinking and formulating questions<br \/>\n&#8211; Time reviewing outputs and iterating<br \/>\n&#8211; Realistic typing\/reading speeds<br \/>\n&#8211; Time implementing suggestions or running code outside of the conversation (only if directly relevant to the tasks)<\/p>\n<p>Before providing your final answer, use  tags to break down your reasoning process:<\/p>\n<p>2-5 sentences of reasoning about how many minutes the user spent.<\/p>\n<p>Provide your output in the following format:<br \/>\nA number representing minutes<\/p>\n<p>Assistant: <\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Software 
development time estimation prompt<\/p>\n<p>Human: You are estimating software development tasks for open-source projects. Provide ONLY a number in hours (e.g., 0.3, 1.6, 15). Do not explain.<br \/>\nTask: {task}<br \/>\nDescription: {description}:<br \/>\nEstimate (hours):<br \/>\nAssistant:<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Task time estimation prompt<\/p>\n<p>You are estimating how much time workers in the occupation &#8220;{occupation_title}&#8221; spend on each of their job tasks.<\/p>\n<p>Below is the complete list of tasks for this occupation. For each task, estimate how many hours per week a typical worker spends on it.<\/p>\n<p>Important: Don&#8217;t worry about making the hours sum to exactly 40 or any specific total &#8211; we&#8217;ll normalize the results afterward. Just give your best estimate for each task independently based on what seems realistic.<\/p>\n<p>Tasks:<br \/>\n{tasks}<\/p>\n<p>Return ONLY a JSON object mapping each task_id to your estimated hours per week, with no additional text, explanations, or commentary. Format:<br \/>\n{{<br \/>\n  &#8220;task_id_1&#8221;: hours,<br \/>\n  &#8220;task_id_2&#8221;: hours,<br \/>\n  &#8230;<br \/>\n}}<\/p>\n","protected":false},"excerpt":{"rendered":"Overview What do real conversations with Claude tell us about the effects of AI on labor productivity? 
Using&hellip;\n","protected":false},"author":2,"featured_media":158894,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[220,218,219,61,60,80],"class_list":{"0":"post-158893","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-ie","12":"tag-ireland","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/158893","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/comments?post=158893"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/158893\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media\/158894"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media?parent=158893"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/categories?post=158893"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/tags?post=158893"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}