{"id":255011,"date":"2025-11-01T07:58:11","date_gmt":"2025-11-01T07:58:11","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/255011\/"},"modified":"2025-11-01T07:58:11","modified_gmt":"2025-11-01T07:58:11","slug":"a-new-paper-tested-ais-ability-to-do-actual-online-freelance-work-and-the-results-are-damning","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/255011\/","title":{"rendered":"A New Paper Tested AI&#8217;s Ability to Do Actual Online Freelance Work, and the Results Are Damning"},"content":{"rendered":"<p class=\"pw-incontent-excluded article-paragraph skip\">Your pesky remote freelancers demanding more money as inflation soars? You could try replacing them with AI agents instead \u2014 but it probably won\u2019t work out well.<\/p>\n<p class=\"article-paragraph skip\">New research <a href=\"https:\/\/www.wired.com\/story\/ai-agents-are-terrible-freelance-workers\/\" rel=\"noreferrer nofollow noopener\" target=\"_blank\">highlighted by Wired<\/a> shows how these AI models designed to automate tasks \u2014 if not entire jobs \u2014 turn out to be incredibly unproductive compared to the humans they\u2019re replacing.<\/p>\n<p class=\"article-paragraph skip\">Conducted by researchers at the nonprofit Center for AI Safety (CAIS) and the massive data annotation firm Scale AI, whose army of freelancers performs <a href=\"https:\/\/futurism.com\/artificial-intelligence\/ai-industry-traumatizing-contractors\" rel=\"nofollow noopener\" target=\"_blank\">much of the grunt work<\/a> underpinning the AI industry, the tests involved giving six leading AI agents various simulated freelance tasks.<\/p>\n<p class=\"article-paragraph skip\">The outcome of those tests, detailed in a <a href=\"https:\/\/www.remotelabor.ai\/paper.pdf\" rel=\"noreferrer nofollow noopener\" target=\"_blank\">new paper<\/a>, was damning. 
Not a single AI agent was able to perform more than 3 percent of the work, making just $1,810 out of a possible $143,991.<\/p>\n<p class=\"article-paragraph skip\">\u201cI should hope this gives much more accurate impressions as to what\u2019s going on with AI capabilities,\u201d CAIS director Dan Hendrycks told Wired.<\/p>\n<p class=\"article-paragraph skip\">For the tests, the researchers developed their own benchmark called the <a href=\"https:\/\/www.remotelabor.ai\" rel=\"noreferrer nofollow noopener\" target=\"_blank\">Remote Labor Index<\/a>, which uses a wide range of real-world remote projects to evaluate the bots\u2019 ability to perform economically valuable work in industries ranging from game development to data analysis.<\/p>\n<p class=\"article-paragraph skip\">The top performer, they found, was an AI agent from the Chinese startup Manus with an automation rate of just 2.5 percent, meaning it completed only 2.5 percent of its assigned projects at a level that would be acceptable as commissioned work in a real-world freelancing job, the researchers said.<\/p>\n<p class=\"article-paragraph skip\">Second place was a tie, at 2.1 percent, between Elon Musk\u2019s Grok 4 and Anthropic\u2019s Claude Sonnet 4.5, which the company claims is the \u201cbest coding model in the world\u201d and the \u201cstrongest model for building complex agents.\u201d<\/p>\n<p class=\"article-paragraph skip\">OpenAI\u2019s newest GPT-5 model and its purported \u201cPhD level\u201d intelligence came next at 1.7 percent. CEO Sam Altman <a href=\"https:\/\/www.technologyreview.com\/2025\/08\/07\/1121308\/gpt-5-is-here-now-what\" rel=\"noreferrer nofollow noopener\" target=\"_blank\">has claimed<\/a> that GPT-5 is a \u201csignificant step along the path to AGI,\u201d or artificial general intelligence, a hypothetical AI system that most define as exceeding human cognitive abilities in virtually all aspects. 
(OpenAI <a href=\"https:\/\/openai.com\/index\/built-to-benefit-everyone\/\" rel=\"noreferrer nofollow noopener\" target=\"_blank\">considers AGI<\/a> to be \u201chighly autonomous systems that outperform humans at most economically valuable work,\u201d something that the RLI benchmark shows GPT-5 is nowhere close to doing.)<\/p>\n<p class=\"article-paragraph skip\">Ironically, OpenAI\u2019s actual AI agent, with the exciting brand name of ChatGPT Agent, was the second-worst performer of the whole bunch, barely cracking 1.3 percent. But the absolute bottom of the barrel turned out to be Google\u2019s Gemini 2.5 Pro, with a dismal 0.8 percent.<\/p>\n<p class=\"article-paragraph skip\">Selling AI agents to employers has been the obsession of the AI industry as leading players like OpenAI struggle to capitalize on the popularity of their AI chatbots, many of which are free to use. But despite many CEOs eagerly <a href=\"https:\/\/futurism.com\/artificial-intelligence\/meta-memo-automation\" rel=\"nofollow noopener\" target=\"_blank\">culling their workforces and embracing AI<\/a>, it remains to be seen whether automation can actually increase productivity, let alone make up for the human talent it\u2019s replacing.<\/p>\n<p class=\"article-paragraph skip\">\u201cWe have debated AI and jobs for years, but most of it has been hypothetical or theoretical,\u201d Bing Liu, director of research at Scale AI, told Wired.<\/p>\n<p class=\"article-paragraph skip\">Anecdotally, many bosses who replaced their employees with AI have been <a href=\"https:\/\/futurism.com\/klarna-ai-automation-engineers\" rel=\"nofollow noopener\" target=\"_blank\">forced to rehire them<\/a> after discovering the AI tools weren\u2019t up to snuff, and a slew of research is painting a similarly damning picture. 
One <a href=\"https:\/\/www.axios.com\/2025\/08\/21\/ai-wall-street-big-tech\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">MIT study<\/a> found that 95 percent of companies that piloted AI initiatives saw no meaningful growth in revenue. Another demonstrated that introducing AI tools into employee workflows resulted in a deluge of low-quality \u201c<a href=\"https:\/\/futurism.com\/future-society\/ai-productivity-research\" rel=\"nofollow noopener\" target=\"_blank\">workslop<\/a>,\u201d which not only bogged everything down because it had to be heavily revised for errors, but also created tension between coworkers who resented being forced to correct such lazy work.<\/p>\n<p class=\"article-paragraph skip\">Hendrycks pointed out some of the flaws still plaguing AI agents, despite the field\u2019s rapid advances. \u201cThey don\u2019t have long-term memory storage and can\u2019t do continual learning from experiences. They can\u2019t pick up skills on the job like humans,\u201d he told Wired.<\/p>\n<p class=\"article-paragraph skip\">So far, though, these glaring flaws haven\u2019t seemed to slow down the freight train of AI-related firings. If anything, they\u2019re still picking up steam.<\/p>\n<p class=\"article-paragraph skip\">More on AI: <a href=\"https:\/\/futurism.com\/future-society\/internet-aws-amazon-layoffs\" rel=\"nofollow noopener\" target=\"_blank\">After Bringing Down Internet, Amazon Announces Biggest Mass Firing in Its History<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"Your pesky remote freelancers demanding more money as inflation soars? 
You could try replacing them with AI agents&hellip;\n","protected":false},"author":2,"featured_media":255012,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-255011","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/255011","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=255011"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/255011\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/255012"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=255011"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=255011"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=255011"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}