{"id":203506,"date":"2025-10-05T15:15:08","date_gmt":"2025-10-05T15:15:08","guid":{"rendered":"https:\/\/www.newsbeep.com\/us\/203506\/"},"modified":"2025-10-05T15:15:08","modified_gmt":"2025-10-05T15:15:08","slug":"the-reinforcement-gap-or-why-some-ai-skills-improve-faster-than-others","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us\/203506\/","title":{"rendered":"The Reinforcement Gap \u2014 or why some AI skills improve faster than others \u00a0"},"content":{"rendered":"<p id=\"speakable-summary\" class=\"wp-block-paragraph\">AI coding tools are getting better fast. If you don\u2019t work in code, it can be hard to notice how much things are changing, but GPT-5 and Gemini 2.5 have made a whole new set of developer tricks possible to automate, and last week Sonnet 4.5 did it again.\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">At the same time, other skills are progressing more slowly. If you are using AI to write emails, you\u2019re probably getting the same value out of it you did a year ago. Even when the model gets better, the product doesn\u2019t always benefit \u2014 particularly when the product is a chatbot that\u2019s doing a dozen different jobs at the same time. AI is still making progress, but it\u2019s not as evenly distributed as it used to be.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">The difference in progress is simpler than it seems. Coding apps are benefitting from billions of easily measurable tests, which can train them to produce workable code. This is reinforcement learning (RL), arguably the biggest driver of AI progress over the past six months and <a href=\"https:\/\/techcrunch.com\/2025\/09\/21\/silicon-valley-bets-big-on-environments-to-train-ai-agents\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">getting more intricate all the time<\/a>. 
You can do reinforcement learning with human graders, but it works best if there\u2019s a clear pass-fail metric, so you can repeat it billions of times without having to stop for human input.\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">As the industry relies increasingly on reinforcement learning to improve products, we\u2019re seeing a real difference between capabilities that can be automatically graded and the ones that can\u2019t. RL-friendly skills like bug-fixing and competitive math are getting better fast, while skills like writing make only incremental progress.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">In short, there\u2019s a reinforcement gap \u2014 and it\u2019s becoming one of the most important factors for what AI systems can and can\u2019t do.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">In some ways, software development is the perfect subject for reinforcement learning. Even before AI, there was a whole sub-discipline devoted to testing how software would hold up under pressure \u2014 largely because developers needed to make sure their code wouldn\u2019t break before they deployed it. So even the most elegant code still needs to pass through unit testing, integration testing, security testing, and so on. Human developers use these tests routinely to validate their code and,\u00a0<a href=\"https:\/\/techcrunch.com\/2025\/09\/23\/how-googles-dev-tools-manager-makes-ai-coding-work\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">as Google\u2019s senior director for dev tools recently told me<\/a>, they\u2019re just as useful for validating AI-generated code. Even more than that, they\u2019re useful for reinforcement learning, since they\u2019re already systematized and repeatable at a massive scale.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">There\u2019s no easy way to validate a well-written email or a good chatbot response; these skills are inherently subjective and harder to measure at scale. 
But not every task falls neatly into \u201ceasy to test\u201d or \u201chard to test\u201d categories. We don\u2019t have an out-of-the-box testing kit for quarterly financial reports or actuarial science, but a well-capitalized accounting startup could probably build one from scratch. Some testing kits will work better than others, of course, and some companies will be smarter about how to approach the problem. But the testability of the underlying process is going to be the deciding factor in whether the underlying process can be made into a functional product instead of just an exciting demo.\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Some processes turn out to be more testable than you might think. If you\u2019d asked me last week, I would have put AI-generated video in the \u201chard to test\u201d category, but the immense progress made by <a href=\"https:\/\/techcrunch.com\/2025\/09\/30\/openai-is-launching-the-sora-app-its-own-tiktok-competitor-alongside-the-sora-2-model\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">OpenAI\u2019s new Sora 2 model<\/a> shows it may not be as hard as it looks. In Sora 2, objects no longer appear and disappear out of nowhere. Faces hold their shape, looking like a specific person rather than just a collection of features. Sora 2 footage respects the laws of physics in both <a href=\"https:\/\/www.reddit.com\/r\/singularity\/comments\/1nunil1\/sora_2_vs_veo_3_on_physics_test_sora_2_is_first\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">obvious<\/a> and <a href=\"https:\/\/x.com\/pallavmac\/status\/1973141663557226806\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">subtle<\/a> ways. I suspect that, if you peeked behind the curtain, you\u2019d find a robust reinforcement learning system for each of these qualities. 
Put together, they make the difference between photorealism and an entertaining hallucination.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">To be clear, this isn\u2019t a hard and fast rule of artificial intelligence. It\u2019s a result of the central role reinforcement learning is playing in AI development, which could easily change as models develop. But as long as RL is the primary tool for bringing AI products to market, the reinforcement gap will only grow bigger \u2014 with serious implications for both startups and the economy at large. If a process ends up on the right side of the reinforcement gap, startups will probably succeed in automating it \u2014 and anyone doing that work now may end up looking for a new career. The question of which healthcare services are RL-trainable, for instance, has enormous implications for the shape of the economy over the next 20 years. And if surprises like Sora 2 are any indication, we may not have to wait long for an answer.<\/p>\n","protected":false},"excerpt":{"rendered":"AI coding tools are getting better fast. 
If you don\u2019t work in code, it can be hard to&hellip;\n","protected":false},"author":2,"featured_media":203507,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[45],"tags":[182,3051,181,507,117129,117130,74],"class_list":{"0":"post-203506","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-ai-coding","10":"tag-artificial-intelligence","11":"tag-artificialintelligence","12":"tag-reinforcement-learning","13":"tag-sora-2","14":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/203506","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/comments?post=203506"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/203506\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media\/203507"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media?parent=203506"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/categories?post=203506"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/tags?post=203506"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}