{"id":530981,"date":"2026-03-10T12:05:08","date_gmt":"2026-03-10T12:05:08","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/530981\/"},"modified":"2026-03-10T12:05:08","modified_gmt":"2026-03-10T12:05:08","slug":"andrej-karpathys-new-open-source-autoresearch-lets-you-run-hundreds-of-ai-experiments-a-night-with-revolutionary-implications","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/530981\/","title":{"rendered":"Andrej Karpathy&#8217;s new open source &#8216;autoresearch&#8217; lets you run hundreds of AI experiments a night \u2014 with revolutionary implications"},"content":{"rendered":"<p>Over the weekend, Andrej Karpathy\u2014the influential former Tesla AI lead and co-founder and former member of OpenAI who coined the term &#8220;vibe coding&#8221;\u2014 <a href=\"https:\/\/x.com\/karpathy\/status\/2030371219518931079\" rel=\"nofollow\">posted on X<\/a> about his new open source project, <a href=\"https:\/\/github.com\/karpathy\/autoresearch\/discussions\/43\" rel=\"nofollow noopener\" target=\"_blank\">autoresearch<\/a>. <\/p>\n<p>It wasn&#8217;t a finished model or a massive corporate product: it was by his own admission a simple, 630-line script <a href=\"https:\/\/github.com\/karpathy\/autoresearch\/blob\/master\/README.md\" rel=\"nofollow noopener\" target=\"_blank\">made available on Github<\/a> under a permissive, enterprise-friendly MIT License. But the ambition was massive: automating the scientific method with AI agents while us humans sleep. <\/p>\n<p>&#8220;The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement,&#8221; he stated on X.<\/p>\n<p>The system functions as an autonomous optimization loop. An AI agent is given a training script and a fixed compute budget (typically 5 minutes on a GPU).<\/p>\n<p>It reads its own source code, forms a hypothesis for improvement (such as changing a learning rate or an architecture depth), modifies the code, runs the experiment, and evaluates the results. <\/p>\n<p>If the validation loss\u2014measured in bits per byte (val_bpb)\u2014improves, it keeps the change; if not, it reverts and tries again. In one overnight run, Karpathy\u2019s agent completed 126 experiments, driving loss down from 0.9979 to 0.9697.<\/p>\n<p>Today, Karpathy reported that after leaving the agent to tune a &#8220;depth=12&#8221; model for two days, it successfully <a href=\"https:\/\/x.com\/karpathy\/status\/2031135152349524125\" rel=\"nofollow\">processed approximately 700 autonomous changes.<\/a><\/p>\n<p>The agent found roughly 20 additive improvements that transferred perfectly to larger models. Stacking these changes dropped the &#8220;Time to GPT-2&#8221; metric on the leaderboard from 2.02 hours to 1.80 hours\u2014an 11% efficiency gain on a project Karpathy believed was already well-tuned. <\/p>\n<p>&#8220;Seeing the agent do this entire workflow end-to-end and all by itself&#8230; is wild,&#8221; Karpathy remarked, noting that the agent caught oversights in attention scaling and regularization that he had missed manually over two decades of work.<\/p>\n<p>This is more than just a productivity hack; it is a fundamental shift in how intelligence is refined. By automating the &#8220;scientific method&#8221; for code, Karpathy has turned machine learning into an evolutionary process that runs at the speed of silicon rather than the speed of human thought. 
Today, Karpathy reported that after leaving the agent to tune a "depth=12" model for two days, it had successfully processed approximately 700 autonomous changes (https://x.com/karpathy/status/2031135152349524125).

The agent found roughly 20 additive improvements that transferred cleanly to larger models. Stacking these changes dropped the "Time to GPT-2" metric on the leaderboard from 2.02 hours to 1.80 hours, an 11% efficiency gain on a project Karpathy believed was already well-tuned.

"Seeing the agent do this entire workflow end-to-end and all by itself... is wild," Karpathy remarked, noting that the agent caught oversights in attention scaling and regularization that he had missed manually over two decades of work.

This is more than just a productivity hack; it is a fundamental shift in how intelligence is refined. By automating the "scientific method" for code, Karpathy has turned machine learning into an evolutionary process that runs at the speed of silicon rather than the speed of human thought.

And beyond that, it showed the broader AI and machine learning community on X that this type of process could be applied far outside computer science, to fields like marketing, health, and, well, basically anything that requires research.

Autoresearch spreads far and wide

The reaction was swift and viral, with Karpathy's post garnering more than 8.6 million views within two days as builders and researchers scrambled to scale the "Karpathy loop."

Varun Mathur, CEO of AI tool aggregator platform Hyperspace AI, took the single-agent loop and distributed it across a peer-to-peer network (https://x.com/varun_mathur/status/2031004607426498574). Every node running the Hyperspace agent became an autonomous researcher.

On the night of March 8-9, 35 autonomous agents on the Hyperspace network ran 333 experiments completely unsupervised. The results were a masterclass in emergent strategy:

Hardware Diversity as a Feature: Mathur noted that while H100 GPUs used "brute force" to find aggressive learning rates, CPU-only agents on laptops were forced to be clever. These "underdog" agents focused on initialization strategies (like Kaiming and Xavier init) and normalization choices because they couldn't rely on raw throughput.

Gossip-Based Discovery: Using the GossipSub protocol, agents shared their wins in real time. When one agent found that Kaiming initialization dropped loss by 21%, the idea spread through the network like a digital virus. Within hours, 23 other agents had incorporated the discovery into their own hypotheses.

The Compression of History: In just 17 hours, these agents independently rediscovered ML milestones, such as RMSNorm and tied embeddings, that took human researchers at labs like Google Brain and OpenAI nearly eight years to formalize.

Run 36,500 marketing experiments each year instead of 30

While the ML purists focused on loss curves, the business world saw a different kind of revolution. Eric Siu, founder of ad agency Single Grain, applied autoresearch to the "Experiment Loop" of marketing (https://x.com/ericosiu/status/2030758253395951958).

"Most marketing teams run ~30 experiments a year," Siu wrote on X. "The next generation will run 36,500+. Easily." He continued:

"They'll run experiments while they sleep.
Current marketing teams run 20-30 experiments a year. Maybe 52 if they're 'good'.
New landing page.
New ad creative.
Maybe a subject line test.
That's considered 'data-driven marketing.'
But the next generation of marketing systems will run 36,500+ experiments per year."

Siu's framework replaces the training script with a marketing asset: a landing page, an ad creative, or a cold email. The agent modifies a variable (the subject line or the CTA), deploys it, measures the "positive reply rate," and keeps or discards.

Siu argues that this creates a "proprietary map" of what resonates with a specific audience, a moat built not of code but of experiment history. "The companies that win won't have better marketers," he wrote, "they'll have faster experiment loops."
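Siu has published no code, so the following is only a hypothetical sketch of how such a marketing loop might look; Variant, deploy(), and measure_reply_rate() are assumed stand-ins for whatever email or ad platform is actually in use. The structure is the same keep-or-discard kernel as Karpathy's loop, with positive reply rate in place of val_bpb.

    # Hypothetical marketing "experiment loop"; every name here is illustrative.
    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class Variant:
        subject_line: str
        cta: str

    def optimize_asset(baseline, candidates, deploy, measure_reply_rate):
        # Measure the baseline asset first, then try one change at a time.
        best = baseline
        best_rate = measure_reply_rate(deploy(best))
        history = [(best, best_rate)]                  # the "proprietary map" of results
        for change in candidates:                      # e.g. {"subject_line": "..."} or {"cta": "..."}
            variant = replace(best, **change)          # modify a single variable
            rate = measure_reply_rate(deploy(variant)) # positive reply rate for this variant
            history.append((variant, rate))
            if rate > best_rate:                       # keep winners, discard the rest
                best, best_rate = variant, rate
        return best, history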
Community discussion and 'spoiling' the validation set

Despite the fervor, the GitHub Discussions (https://github.com/karpathy/autoresearch/discussions) revealed a community grappling with the implications of such rapid, automated progress.

The Over-Optimization Trap: Researcher alexisthual (https://github.com/karpathy/autoresearch/discussions/43#discussioncomment-16043200) raised a pointed concern: "Aren't you concerned that launching that many experiments will eventually 'spoil' the validation set?" The fear is that with enough agents, parameters will be optimized for the specific quirks of the test data rather than for general intelligence.

The Meaning of the Gains: User samionb (https://github.com/karpathy/autoresearch/discussions/43#discussioncomment-16043514) questioned whether a drop from 0.9979 to 0.9697 was truly noticeable. Karpathy's response was characteristically direct: "All we're doing is optimizing performance per compute... these are real and substantial gains."

The Human Element: On X, witcheer (https://x.com/witcheer/status/2030900817700565394), Head of Growth at crypto platform Yari Finance (https://yari.fi/), documented their own overnight run on a Mac Mini M4, noting that while 26 of 35 experiments failed or crashed, the seven that succeeded revealed that "the model got better by getting simpler."

This insight, that less is often more, was reached without a single human intervention.

The future: curiosity as the bottleneck

The release of autoresearch suggests a future of research across domains where, thanks to simple AI instruction mechanisms, the role of the human shifts from "experimenter" to "experimental designer."

As tools like DarkMatter, Optimization Arena, and NanoClaw emerge to support this swarm, the bottleneck of AI progress is no longer the "meat computer" (Karpathy's term for the human brain) and its ability to code; it is our ability to define the constraints of the search.

Andrej Karpathy has once again shifted the vibe.
We are no longer just coding models; we are seeding ecosystems that learn while we sleep.