{"id":149381,"date":"2025-09-17T01:59:09","date_gmt":"2025-09-17T01:59:09","guid":{"rendered":"https:\/\/www.newsbeep.com\/ca\/149381\/"},"modified":"2025-09-17T01:59:09","modified_gmt":"2025-09-17T01:59:09","slug":"silicon-valley-bets-big-on-environments-to-train-ai-agents","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ca\/149381\/","title":{"rendered":"Silicon Valley bets big on &#8216;environments&#8217; to train AI agents"},"content":{"rendered":"<p id=\"speakable-summary\" class=\"wp-block-paragraph\">For years, Big Tech CEOs have touted visions of <a href=\"https:\/\/techcrunch.com\/2025\/03\/14\/no-one-knows-what-the-hell-an-ai-agent-is\/\" rel=\"nofollow noopener\" target=\"_blank\">AI agents<\/a> that can autonomously use software applications to complete tasks for people. But take today\u2019s consumer AI agents out for a spin, whether it\u2019s OpenAI\u2019s <a href=\"https:\/\/techcrunch.com\/2025\/07\/17\/openai-launches-a-general-purpose-agent-in-chatgpt\/\" rel=\"nofollow noopener\" target=\"_blank\">ChatGPT Agent<\/a> or Perplexity\u2019s <a href=\"https:\/\/techcrunch.com\/2025\/07\/09\/perplexity-launches-comet-an-ai-powered-web-browser\/\" rel=\"nofollow noopener\" target=\"_blank\">Comet<\/a>, and you\u2019ll quickly realize how limited the technology still is. Making AI agents more robust may take a new set of techniques that the industry is still discovering.<\/p>\n<p class=\"wp-block-paragraph\">One of those techniques is carefully simulating workspaces where agents can be trained on multi-step tasks \u2014 known as reinforcement learning (RL) environments. 
Just as labeled datasets powered the last wave of AI, RL environments are starting to look like a critical element in the development of agents.<\/p>\n<p class=\"wp-block-paragraph\">AI researchers, founders, and investors tell TechCrunch that leading AI labs are now demanding more RL environments, and there\u2019s no shortage of startups hoping to supply them.<\/p>\n<p class=\"wp-block-paragraph\">\u201cAll the big AI labs are building RL environments in-house,\u201d said Jennifer Li, general partner at Andreessen Horowitz, in an interview with TechCrunch. \u201cBut as you can imagine, creating these datasets is very complex, so AI labs are also looking at third-party vendors that can create high-quality environments and evaluations. Everyone is looking at this space.\u201d<\/p>\n<p class=\"wp-block-paragraph\">The push for RL environments has minted a new class of well-funded startups, such as Mechanize and Prime Intellect, that aim to lead the space. Meanwhile, large data-labeling companies like Mercor and Surge say they\u2019re investing more in RL environments to keep pace with the industry\u2019s shift from static datasets to interactive simulations. 
The major labs are considering investing heavily too: according to The Information, leaders at Anthropic have discussed spending more than <a rel=\"nofollow noopener\" href=\"https:\/\/www.theinformation.com\/articles\/anthropic-openai-developing-ai-co-workers?rc=dp0mql\" target=\"_blank\">$1 billion on RL environments<\/a> over the next year.<\/p>\n<p class=\"wp-block-paragraph\">The hope for investors and founders is that one of these startups emerges as the \u201cScale AI for environments,\u201d referring to the <a href=\"https:\/\/techcrunch.com\/2025\/07\/09\/perplexity-launches-comet-an-ai-powered-web-browser\/\" rel=\"nofollow noopener\" target=\"_blank\">$29 billion data-labeling powerhouse<\/a> that powered the chatbot era.<\/p>\n<p class=\"wp-block-paragraph\">The question is whether RL environments will truly push the frontier of AI progress.<\/p>\n<p>What is an RL environment?<\/p>\n<p class=\"wp-block-paragraph\">At their core, RL environments are training grounds that simulate what an AI agent would be doing in a real software application. One founder, in a <a rel=\"nofollow noopener\" href=\"https:\/\/www.nytimes.com\/2025\/06\/11\/technology\/ai-mechanize-jobs.html\" target=\"_blank\">recent interview<\/a>, described building them as \u201clike creating a very boring video game.\u201d<\/p>\n<p class=\"wp-block-paragraph\">For example, an environment could simulate a Chrome browser and task an AI agent with purchasing a pair of socks on Amazon. The agent is graded on its performance and sent a reward signal when it succeeds (in this case, buying a worthy pair of socks).<\/p>\n<p class=\"wp-block-paragraph\">While such a task sounds relatively simple, there are a lot of places where an AI agent could get tripped up. 
It might get lost navigating the web page\u2019s drop-down menus, or buy too many socks. And because developers can\u2019t predict exactly what wrong turn an agent will take, the environment itself has to be robust enough to capture any unexpected behavior and still deliver useful feedback. That makes building an environment far more complex than assembling a static dataset.<\/p>\n<p class=\"wp-block-paragraph\">Some environments are quite elaborate, allowing AI agents to use tools, access the internet, or use various software applications to complete a given task. Others are narrower, aimed at helping an agent learn specific tasks in enterprise software applications.<\/p>\n<p class=\"wp-block-paragraph\">While RL environments are the hot thing in Silicon Valley right now, there\u2019s a lot of precedent for using this technique. One of OpenAI\u2019s first projects back in 2016 was building \u201c<a rel=\"nofollow noopener\" href=\"https:\/\/openai.com\/index\/openai-gym-beta\/\" target=\"_blank\">RL Gyms<\/a>,\u201d which were quite similar to the modern conception of environments. The same year, Google DeepMind\u2019s <a rel=\"nofollow noopener\" href=\"https:\/\/www.theguardian.com\/technology\/2016\/mar\/09\/google-deepmind-alphago-ai-defeats-human-lee-sedol-first-game-go-contest\" target=\"_blank\">AlphaGo<\/a> AI system beat a world champion at the board game Go. It also used RL techniques within a simulated environment.<\/p>\n<p class=\"wp-block-paragraph\">What\u2019s unique about today\u2019s environments is that researchers are trying to build computer-using AI agents with large transformer models. Unlike AlphaGo, which was a specialized AI system working in a closed environment, today\u2019s AI agents are trained to have more general capabilities. AI researchers today have a stronger starting point, but also a more complicated goal, where more can go wrong. 
<\/p>\n<p>A crowded field<\/p>\n<p class=\"wp-block-paragraph\">AI data labeling companies like Scale AI, Surge, and Mercor are trying to meet the moment and build out RL environments. These companies have more resources than many startups in the space, as well as deep relationships with AI labs. <\/p>\n<p class=\"wp-block-paragraph\">Surge CEO Edwin Chen tells TechCrunch he\u2019s recently seen a \u201csignificant increase\u201d in demand for RL environments within AI labs. Surge \u2014 which reportedly generated <a rel=\"nofollow noopener\" href=\"https:\/\/www.bloomberg.com\/news\/articles\/2025-07-30\/scale-rival-surge-ai-in-talks-for-funding-at-25-billion-value\" target=\"_blank\">$1.2 billion in revenue<\/a> last year from working with AI labs like OpenAI, Google, Anthropic, and Meta \u2014 recently spun up a new internal organization specifically tasked with building out RL environments, he said.<\/p>\n<p class=\"wp-block-paragraph\">Close behind Surge is Mercor, a startup valued at $10 billion, which has also worked with OpenAI, Meta, and Anthropic. Mercor is pitching investors on its business <a href=\"https:\/\/techcrunch.com\/2025\/09\/09\/sources-ai-training-startup-mercor-eyes-10b-valuation-on-450m-run-rate\/\" rel=\"nofollow noopener\" target=\"_blank\">building RL environments<\/a> for domain-specific tasks such as coding, healthcare, and law, according to marketing materials seen by TechCrunch.<\/p>\n<p class=\"wp-block-paragraph\">Mercor CEO Brendan Foody told TechCrunch in an interview that \u201cfew understand how large the opportunity around RL environments truly is.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Scale AI used to dominate the data labeling space, but has lost ground since Meta <a href=\"https:\/\/techcrunch.com\/2025\/06\/13\/scale-ai-confirms-significant-investment-from-meta-says-ceo-alexandr-wang-is-leaving\/\" rel=\"nofollow noopener\" target=\"_blank\">invested $14 billion<\/a> and hired away its CEO. 
Since then, Google and OpenAI <a href=\"https:\/\/techcrunch.com\/2025\/06\/18\/openai-drops-scale-ai-as-a-data-provider-following-meta-deal\/\" rel=\"nofollow noopener\" target=\"_blank\">dropped<\/a> Scale AI as a data provider, and the startup even faces competition for data-labeling work <a href=\"https:\/\/techcrunch.com\/2025\/08\/29\/cracks-are-forming-in-metas-partnership-with-scale-ai\/\" rel=\"nofollow noopener\" target=\"_blank\">inside of Meta<\/a>. Even so, Scale is trying to keep pace and build environments.<\/p>\n<p class=\"wp-block-paragraph\">\u201cThis is just the nature of the business [Scale AI] is in,\u201d said Chetan Rane, Scale AI\u2019s head of product for agents and RL environments. \u201cScale has proven its ability to adapt quickly. We did this in the early days of autonomous vehicles, our first business unit. When ChatGPT came out, Scale AI adapted to that. And now, once again, we\u2019re adapting to new frontier spaces like agents and environments.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Some newer players are focusing exclusively on environments from the outset. Among them is Mechanize, a startup founded roughly six months ago with the audacious goal of \u201cautomating all jobs.\u201d However, co-founder Matthew Barnett tells TechCrunch that his firm is starting with RL environments for AI coding agents.<\/p>\n<p class=\"wp-block-paragraph\">Mechanize aims to supply AI labs with a small number of robust RL environments, Barnett says, in contrast to larger data firms that create a wide range of simpler ones. 
To that end, the startup is offering software engineers <a rel=\"nofollow noopener\" href=\"https:\/\/jobs.ashbyhq.com\/mechanize\/4e401df6-49cc-4db3-a840-0ee2f68c019b\" target=\"_blank\">$500,000 salaries<\/a> to build RL environments \u2014 far higher than an hourly contractor could earn working at Scale AI or Surge.<\/p>\n<p class=\"wp-block-paragraph\">Mechanize has already been working with Anthropic on RL environments, two sources familiar with the matter told TechCrunch. Mechanize and Anthropic declined to comment on the partnership.<\/p>\n<p class=\"wp-block-paragraph\">Other startups are betting that RL environments will be influential outside of AI labs. Prime Intellect \u2014 a startup backed by AI researcher Andrej Karpathy, Founders Fund, and Menlo Ventures \u2014 is targeting smaller developers with its RL environments.<\/p>\n<p class=\"wp-block-paragraph\">Last month, Prime Intellect launched an <a rel=\"nofollow noopener\" href=\"https:\/\/www.primeintellect.ai\/blog\/environments\" target=\"_blank\">RL environments hub<\/a>, which aims to be a \u201cHugging Face for RL environments.\u201d The idea is to give open-source developers access to the same resources that large AI labs have, and to sell those developers access to computational resources in the process.<\/p>\n<p class=\"wp-block-paragraph\">Training generally capable agents in RL environments can be more computationally expensive than previous AI training techniques, according to Prime Intellect researcher Will Brown. Alongside the startups building RL environments, there\u2019s another opportunity for GPU providers that can power the process.<\/p>\n<p class=\"wp-block-paragraph\">\u201cRL environments are going to be too large for any one company to dominate,\u201d said Brown in an interview. \u201cPart of what we\u2019re doing is just trying to build good open-source infrastructure around it. 
The service we sell is compute, so it is a convenient onramp to using GPUs, but we\u2019re thinking of this more in the long term.\u201d<\/p>\n<p>Will it scale?<\/p>\n<p class=\"wp-block-paragraph\">The open question around RL environments is whether the technique will scale like previous AI training methods.<\/p>\n<p class=\"wp-block-paragraph\">Reinforcement learning has powered some of the biggest leaps in AI over the past year, including models like <a href=\"https:\/\/techcrunch.com\/2024\/09\/12\/openai-unveils-a-model-that-can-fact-check-itself\/\" rel=\"nofollow noopener\" target=\"_blank\">OpenAI\u2019s o1<\/a> and Anthropic\u2019s <a href=\"https:\/\/techcrunch.com\/2025\/05\/22\/anthropics-new-claude-4-ai-models-can-reason-over-many-steps\/\" rel=\"nofollow noopener\" target=\"_blank\">Claude Opus 4<\/a>. Those are particularly important breakthroughs because the methods previously used to improve AI models are now <a href=\"https:\/\/techcrunch.com\/2024\/11\/20\/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course\/\" rel=\"nofollow noopener\" target=\"_blank\">showing diminishing returns<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">Environments are part of AI labs\u2019 bigger bet on RL, which many believe will continue to drive progress as they add more data and computational resources to the process. Some of the OpenAI researchers behind o1 previously told TechCrunch that the company originally invested in AI reasoning models \u2014 which were created through investments in RL and test-time compute \u2014 because <a href=\"https:\/\/techcrunch.com\/2025\/08\/03\/inside-openais-quest-to-make-ai-do-anything-for-you\/\" rel=\"nofollow noopener\" target=\"_blank\">they thought it would scale<\/a> nicely.<\/p>\n<p class=\"wp-block-paragraph\">The best way to scale RL remains unclear, but environments seem like a promising contender. 
Instead of simply rewarding chatbots for text responses, they let agents operate in simulations with tools and computers at their disposal. That\u2019s far more resource-intensive, but potentially more rewarding. <\/p>\n<p class=\"wp-block-paragraph\">Some are skeptical that all these RL environments will pan out. Ross Taylor, a former AI research lead at Meta who co-founded General Reasoning, tells TechCrunch that RL environments are prone to reward hacking, in which AI models cheat to get a reward without actually completing the task.<\/p>\n<p class=\"wp-block-paragraph\">\u201cI think people are underestimating how difficult it is to scale environments,\u201d said Taylor. \u201cEven the best publicly available [RL environments] typically don\u2019t work without serious modification.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Sherwin Wu, OpenAI\u2019s head of engineering for its API business, said in a <a rel=\"nofollow\" href=\"https:\/\/x.com\/swyx\/status\/1966269298974011594\">recent podcast<\/a> that he was \u201cshort\u201d on RL environment startups. Wu noted that it\u2019s a very competitive space, but also that AI research is evolving so quickly that it\u2019s hard to serve AI labs well.<\/p>\n<p class=\"wp-block-paragraph\">Karpathy, an investor in Prime Intellect who has called RL environments a potential breakthrough, has also voiced caution about the RL space more broadly. In a <a rel=\"nofollow\" href=\"https:\/\/x.com\/karpathy\/status\/1960803117689397543\">post on X<\/a>, he raised concerns about how much more AI progress can be squeezed out of RL.<\/p>\n<p class=\"wp-block-paragraph\">\u201cI am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically,\u201d said Karpathy. <\/p>\n<p class=\"wp-block-paragraph\">Update: A previous version of this article referred to Mechanize as Mechanize Work. 
It has been updated to reflect the company\u2019s official name.<\/p>\n","protected":false},"excerpt":{"rendered":"For years, Big Tech CEOs have touted visions of AI agents that can autonomously use software applications to&hellip;\n","protected":false},"author":2,"featured_media":53539,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[27196,62,10415,1930,276,277,49,48,278,79431,79432,22318,61],"class_list":{"0":"post-149381","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-agents","9":"tag-ai","10":"tag-ai-research","11":"tag-anthropic","12":"tag-artificial-intelligence","13":"tag-artificialintelligence","14":"tag-ca","15":"tag-canada","16":"tag-openai","17":"tag-reinforcement-learning","18":"tag-rl","19":"tag-scale-ai","20":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/149381","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/comments?post=149381"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/149381\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media\/53539"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media?parent=149381"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/categories?post=149381"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/tags?post=14
9381"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}