{"id":166448,"date":"2025-11-29T19:12:09","date_gmt":"2025-11-29T19:12:09","guid":{"rendered":"https:\/\/www.newsbeep.com\/ie\/166448\/"},"modified":"2025-11-29T19:12:09","modified_gmt":"2025-11-29T19:12:09","slug":"three-years-on-chatgpt-still-isnt-what-it-was-cracked-up-to-be-and-it-probably-never-will-be","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ie\/166448\/","title":{"rendered":"Three years on, ChatGPT still isn&#8217;t what it was cracked up to be \u2013 and it probably never will be"},"content":{"rendered":"<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!b0EH!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88c3e10c-85dd-4d64-bf1a-ac04412edb6a_1536x1024.png\" data-component-name=\"Image2ToDOM\" rel=\"nofollow noopener\" class=\"image-link image2 is-viewable-img can-restack\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/88c3e10c-85dd-4d64-bf1a-ac04412edb6a_1536.jpeg\" width=\"1456\" height=\"971\" data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/88c3e10c-85dd-4d64-bf1a-ac04412edb6a_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3698347,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https:\/\/garymarcus.substack.com\/i\/180115766?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88c3e10c-85dd-4d64-bf1a-ac04412edb6a_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\"   fetchpriority=\"high\" 
class=\"sizing-normal\"\/><\/a><\/p>\n<p>Three years ago, on November 30, 2022, ChatGPT was released. It\u2019s been one of the fastest-growing consumer products in history, and gotten more press than God. But I think a fair case can be made that it is not what it has often been cracked up to be, and probably never will be. <\/p>\n<p>Before I dive in, let me make four of my core beliefs, often misrepresented, absolutely clear:<\/p>\n<p>I believe that artificial general intelligence (AGI) is achievable.<\/p>\n<p>I believe that there is at least a chance that artificial general intelligence will be of large net benefit to society. <\/p>\n<p>I just don\u2019t happen to think large language models like ChatGPT will get us there. (I do think they have their uses, but I worry about their costs to society, around bias, cybersecurity, misinformation, nonconsensual deepfake porn, copyright theft, energy and water usage, the gradual enshittification of the internet, the severe hit to college education, and so on.)<\/p>\n<p>I think that the recurring core technical problems that we have seen (as discussed below) with LLMs aren\u2019t going away; instead they are inherent to the technology.<\/p>\n<p>In short, I am at least modestly bullish on AGI, but don\u2019t think that large language models like ChatGPT are the droids we are looking for. And I certainly don\u2019t think that ChatGPT has lived up to expectations. Increasingly, it appears that others are recognizing this as well.<\/p>\n<p>Let\u2019s review.<\/p>\n<p>For one thing, a lot of people thought that ChatGPT (or some scaled-up-with-more-data-and-GPUs version of the tech underlying it) would lead imminently to an artificial general intelligence that was capable of doing anything a human can do. There was mass panic about employment. Some people genuinely feared that GPT-5 would kill all humans on the planet. There was so much hype that some people seriously thought (until very recently!) 
that LLMs would bring us to AGI by 2025; the internet was once awash with memes like \u201c<a href=\"https:\/\/firstmovers.ai\/agi-2025\/\" rel=\"nofollow noopener\" target=\"_blank\">Altman confirms AGI in 2025<\/a>\u201d.<\/p>\n<p>Elon Musk himself, both a victim and a perpetrator of the hype, once said that his guess was that by the end of 2025 \u201cwe\u2019ll have AI that is smarter than any one human\u201d, a proposition that was <a href=\"https:\/\/duckduckgo.com\/?q=we%E2%80%99ll+have+AI+that+is+smarter+than+any+one+human+probably+around+the+end+of+next+year&amp;t=ipad&amp;ia=web\" rel=\"nofollow noopener\" target=\"_blank\">solemnly reported on by many in the media<\/a>, from the Financial Times to Fortune to Business Insider. As recently as early August, Altman was claiming, absurdly, that ChatGPT-5 could do anything a PhD student could do. We now know otherwise.<\/p>\n<p>People also used to have fantasies of how LLMs were going to \u201c<a href=\"https:\/\/levelup.gitconnected.com\/the-chatgpt-prompt-framework-that-makes-you-10x-more-productive-ac2e94bc80a7\" rel=\"nofollow noopener\" target=\"_blank\">10x\u201d productivity<\/a>, turning every individual worker into ten. Altman also once spoke earnestly about how <a href=\"https:\/\/fortune.com\/2024\/02\/04\/sam-altman-one-person-unicorn-silicon-valley-founder-myth\/\" rel=\"nofollow noopener\" target=\"_blank\">we would soon see a billion-dollar company run by just a single employee<\/a>.<\/p>\n<p>None of this has borne out. AGI (<a href=\"https:\/\/garymarcus.substack.com\/p\/is-agi-the-right-goal-for-ai\" rel=\"nofollow noopener\" target=\"_blank\">AI with the breadth and resourcefulness of human intelligence<\/a>) is not coming soon. 
As noted here a few days ago, <a href=\"https:\/\/open.substack.com\/pub\/garymarcus\/p\/breaking-the-ai-2027-doomsday-scenario?r=8tdk6&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=false\" rel=\"nofollow noopener\" target=\"_blank\">even some of the biggest proponents have walked that back<\/a>; it is certainly not arriving by the end of this year. Productivity studies sometimes show a 30% gain, but none have come close to 10x (a 900% gain). One <a href=\"https:\/\/arxiv.org\/abs\/2507.09089\" rel=\"nofollow noopener\" target=\"_blank\">prominent study even showed negative effects among coders<\/a>, which is particularly striking since coding is generally considered to be one of the best test cases. (The coders themselves had expected a positive impact, a clear illustration of how the advantages of LLMs can often be more perceived than real.) <\/p>\n<p>Return on investment for corporate end users is also lower than many people must have imagined. Everybody seems already to have heard about the <a href=\"https:\/\/fortune.com\/2025\/08\/18\/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo\/\" rel=\"nofollow noopener\" target=\"_blank\">MIT study that showed only 5% of companies were getting a return on generative AI investment<\/a>; many on social media were desperate to rebut it. But time has shown that its results have held up. McKinsey, for example, just ran a study of their own, and the results weren\u2019t much different, per a report in The Wall Street Journal:<\/p>\n<p>McKinsey found that two-thirds of companies are just at the piloting stage. 
And only about one in 20 companies are what the consulting firm calls \u201chigh performers\u201d that have deeply integrated AI and see it driving more than 5% of their earnings.<\/p>\n<p>An Economist article earlier this week, \u201c<a href=\"https:\/\/www.economist.com\/finance-and-economics\/2025\/11\/26\/investors-expect-ai-use-to-soar-thats-not-happening\" rel=\"nofollow noopener\" target=\"_blank\">Investors expect AI use to soar. That\u2019s not happening<\/a>\u201d, was brutal, noting that \u201cRecent surveys point to flatlining business adoption.\u201d They report some Census Bureau data that I mentioned once before, which suggest that adoption has fallen sharply at the largest businesses (those employing over 250 people), and also added this:<\/p>\n<p>Even unofficial surveys point to stagnating corporate adoption. Jon Hartley of Stanford University and colleagues found that in September 37% of Americans used generative AI at work, down from 46% in June. A tracker by Alex Bick of the Federal Reserve Bank of St Louis and colleagues revealed that, in August 2024, 12.1% of working-age adults used generative AI every day at work. A year later 12.6% did. Ramp, a fintech firm, finds that in early 2025 AI use soared at American firms to 40%, before levelling off. The growth in adoption really does seem to be slowing.<\/p>\n<p>In a summary sentence, they conclude \u201cThree years into the generative-AI wave, demand for the technology looks surprisingly flimsy.\u201d (Nobody should actually be surprised by that, but I will get to that in a minute.)<\/p>\n<p>And worse, the economy has become so wrapped up in generative AI and its promises that it is, by many accounts, in serious jeopardy. 
(Early in the week a prominent person at the White House, David Sacks, warned of a recession if generative AI were to go south, <a href=\"https:\/\/garymarcus.substack.com\/p\/has-the-bailout-of-generative-ai?r=8tdk6\" rel=\"nofollow noopener\" target=\"_blank\">in a tweet that many people read as laying the groundwork for a potentially costly bailout of generative AI<\/a>.)<\/p>\n<p>If the economy goes down, ChatGPT will be at the center of the mess.<\/p>\n<p>\u00a7<\/p>\n<p>Nobody should be surprised if things play out that way. <\/p>\n<p>The results are disappointing because the underlying tech is unreliable. And that\u2019s been obvious from the start. I said as much to Farhad Manjoo, in an interview with the New York Times in December 2022, telling him a couple of weeks after ChatGPT was released that it made for \u201cnifty\u201d demonstrations, but that it was \u201cstill not reliable, still doesn\u2019t understand the physical world, still doesn\u2019t understand the psychological world and still hallucinates.\u201d<\/p>\n<p>Ever since then, thousands of people (literally) have tried to tell me that scaling would solve all these concerns \u2013 but it hasn\u2019t. Not even close.<\/p>\n<p>Want a time capsule? Here are seven concerns I expressed on Christmas Day, 2022, less than a month after ChatGPT first came out, in an essay called <a href=\"https:\/\/open.substack.com\/pub\/garymarcus\/p\/what-to-expect-when-youre-expecting?r=8tdk6&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=false\" rel=\"nofollow noopener\" target=\"_blank\">What to Expect When You are Expecting GPT-4<\/a>.<\/p>\n<p>Warning \u201cagainst the tremendous optimism for GPT-4 that I have heard from much of the AI community\u201d, I made \u201cseven dark predictions\u201d:<\/p>\n<p>GPT-4 [though larger than GPT 3.5] will still, like its predecessors, be a bull in a china shop, reckless and hard to control. 
It will still make a significant number of shake-your-head stupid errors, in ways that are hard to fully predict. It will often do what you want, sometimes not\u2014and it will remain difficult to anticipate which in advance.<\/p>\n<p>Reasoning about the physical, psychological and mathematical world will still be unreliable\u2026. It will not be trustworthy and complete enough to give reliable medical advice, despite devouring a large fraction of the Internet.<\/p>\n<p>Fluent hallucinations will still be common, and easily induced, \u2026 escalating \u2026 the risk of large language models being used as a tool for creating plausible-sounding yet false misinformation. Guardrails (a la ChatGPT) may be in place, but the guardrails will teeter between being too weak (beaten by \u201cjailbreaks\u201d) and too strong (rejecting some perfectly reasonable requests). \u2026<\/p>\n<p>Its natural language output still won\u2019t be something that one can reliably hook up to downstream programs; it won\u2019t be something, for example, that you can simply and directly hook up to a database or virtual assistant, with predictable results. \u2026<\/p>\n<p>GPT-4 by itself won\u2019t be a general purpose artificial general intelligence capable of taking on arbitrary tasks. Without external aids it won\u2019t be able to beat Meta\u2019s Cicero in Diplomacy; it won\u2019t be able to drive a car reliably; it won\u2019t be able to reliably guide a robot like Optimus, to be anything like as versatile as Rosie the Robot. It will remain a turbocharged pastiche generator, and a fine tool for brainstorming, and for first drafts, but not trustworthy general intelligence.<\/p>\n<p>\u201cAlignment\u201d between what humans want and what machines do will continue to be a critical, unsolved problem. The system will still not be able to restrict its output to reliably following a shared set of human values around helpfulness, harmlessness, and truthfulness. 
Examples of concealed bias will be discovered within days or months. Some of its advice will be head-scratchingly bad.<\/p>\n<p>When AGI (artificial general intelligence) comes, large language models like GPT-4 may be seen in hindsight as part of the eventual solution, but only as part of the solution. \u201cScaling\u201d alone\u2014building bigger and bigger models until they absorb the entire internet\u2014will prove useful, but only to a point. Trustworthy, general artificial intelligence, aligned with human values, will come, when it does, from systems that are more structured, with more built-in knowledge, and will incorporate at least some degree of explicit tools for reasoning and planning, as well as explicit knowledge, that are lacking in systems like GPT. \u2026<\/p>\n<p>Nobody can deny that GPT-5 is more impressive than the original ChatGPT (aka GPT 3.5), or that the latest models have more utility. LLMs have certainly come a long way. <\/p>\n<p>But we have to take seriously the persistence of the core limits of the systems as well as their strengths. <\/p>\n<p>Three years and nearly a trillion dollars later, almost all of it remains true. And it\u2019s not just that GPT-4 was pretty much as I anticipated but that the same problems have continued to plague every single model since\u2014from GPT-4o to GPT 4.1 to GPT 4.5 and GPT-5 (and many others) to all the variations on Claude, all the variations on Gemini, all the variations on Grok, all the variations on Llama, all the variations on DeepSeek, and so on.<\/p>\n<p>To anyone who is intellectually honest, the pattern is astonishingly clear. Hundreds of models, always the same failure modes.<\/p>\n<p>If GPT-5 had solved these problems, as many people imagined it would, it would in fact be of enormous economic value. 
But it hasn\u2019t.<\/p>\n<p>So far as I know, even the latest language models are still bulls in a china shop, powerful but hard to control; they still can\u2019t reason reliably; they still don\u2019t work reliably with external tools; they continue to hallucinate; they still can\u2019t match domain-specific models; they continue to struggle with alignment. And LLMs themselves have already resorted to using lots of external tools like calculators and Python interpreters; it is already clear that they are more of a partial solution than the full solution people imagine.<\/p>\n<p>For all the talk of scaling and the trillion dollars invested, the basic pattern hasn\u2019t changed. LLMs have undoubtedly gotten quantitatively better, but qualitatively the core problems remain. As I wrote all the way back <a href=\"https:\/\/www.newyorker.com\/news\/news-desk\/is-deep-learning-a-revolution-in-artificial-intelligence\" rel=\"nofollow noopener\" target=\"_blank\">in 2012, in The New Yorker<\/a>, <a href=\"https:\/\/quoteinvestigator.com\/2024\/04\/20\/moon-tree\/\" rel=\"nofollow noopener\" target=\"_blank\">paraphrasing an older quote from the godfather of AI criticism, Hubert Dreyfus<\/a>, deep learning (the tech underlying large language models) \u201cis a better ladder; but a better ladder doesn\u2019t necessarily get you to the moon.\u201d<\/p>\n<p>The only real news is that more people are realizing that all this is true. (See my last essay <a href=\"https:\/\/garymarcus.substack.com\/p\/a-trillion-dollars-is-a-terrible?r=8tdk6\" rel=\"nofollow noopener\" target=\"_blank\">on OpenAI cofounder Ilya Sutskever<\/a> for one prominent example.)<\/p>\n<p>The truth is that ChatGPT has never grown up, in the sense of addressing the core challenges that I have set out. 
And on its own (without the aid of <a href=\"https:\/\/garymarcus.substack.com\/p\/how-o3-and-grok-4-accidentally-vindicated?r=8tdk6\" rel=\"nofollow noopener\" target=\"_blank\">neurosymbolic systems<\/a> and <a href=\"https:\/\/open.substack.com\/pub\/garymarcus\/p\/generative-ais-crippling-and-widespread?r=8tdk6&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=false\" rel=\"nofollow noopener\" target=\"_blank\">world models<\/a>) it probably never will. Instead, the technology remains premature. We still regularly see insane, uncomprehending dialogues like this example, generated yesterday and sent to me by a friend. (Spoiler alert: in reality, <a href=\"https:\/\/futurism.com\/chatgpt-haywire-seahorse-emoji\" rel=\"nofollow noopener\" target=\"_blank\">there is no seahorse emoji<\/a>).<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!L_6v!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cf7efd2-eea0-4544-a04f-afd6830376e9_1476x2060.png\" data-component-name=\"Image2ToDOM\" rel=\"nofollow noopener\" class=\"image-link image2 is-viewable-img can-restack\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/1cf7efd2-eea0-4544-a04f-afd6830376e9_1476.jpeg\" width=\"1456\" height=\"2032\" 
data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/1cf7efd2-eea0-4544-a04f-afd6830376e9_1476x2060.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2032,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:356077,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https:\/\/garymarcus.substack.com\/i\/180115766?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cf7efd2-eea0-4544-a04f-afd6830376e9_1476x2060.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\" loading=\"lazy\" class=\"sizing-normal\"\/><\/a><\/p>\n<p>As they say in the military, \u201cfrequently wrong, never in doubt.\u201d <\/p>\n<p>Or to put it differently, ChatGPT is a bit<a href=\"https:\/\/nofilmschool.com\/forrest-gump-box-of-chocolates-line\" rel=\"nofollow noopener\" target=\"_blank\"> like a box of chocolates: you never know what you\u2019re gonna get<\/a>. Which means, bluntly, that you can never really trust it.<\/p>\n<p>My verdict, in short, is that ChatGPT is a trillion-dollar experiment that has failed. <\/p>\n<p>Even with more resources than almost any experiment in history, ChatGPT has failed to solve the core cognitive problems I stressed three years ago. And because of the discrepancy between its reliability and the immense costs that stem from the inherent inefficiency of a system that is dependent on absorbing internet-scale data, it has thus far failed to make profits for companies like OpenAI.<\/p>\n<p>Small wonder Sutskever is urging people back to the drawing board. 
<\/p>\n<p>\u00a7<\/p>\n<p>As I was drafting this, an engineer was busy writing an assessment of LLMs of his own, a long, insightful essay entitled \u201c<a href=\"https:\/\/substack.com\/home\/post\/p-180206450\" rel=\"nofollow noopener\" target=\"_blank\">Why I am betting against AGI hype<\/a>\u201d. This bit encapsulates the situation well:<\/p>\n<p>When you query GPT-4, you\u2019re not getting a system that learns from your interaction and updates its understanding in real-time. You\u2019re getting sophisticated pattern-matching through a fixed network that was trained on historical data and then locked in place. The architecture can\u2019t modify itself based on what it\u2019s processing. It can\u2019t monitor its own reasoning and adjust strategy. It can\u2019t restructure its approach when it encounters something genuinely novel.<\/p>\n<p>That was never going to work. <\/p>\n<p> \u00a7<\/p>\n<p>If things do go belly-up, and the whole economy falls into recession, the single biggest culprit, in my mind, will be ChatGPT\u2019s human avatar, its bullshit-spewing CEO Sam Altman, who hyped GPT-5 endlessly for years, pretending he knew that AGI was coming when in hindsight he was bluffing. In his January blog, for example, he wrote that \u201cWe are now confident we know how to build AGI as we have traditionally understood it.\u201d Earlier he joked that AGI had been \u201cachieved internally,\u201d likely stoking FOMO on the part of potential investors.<\/p>\n<p>Jensen Huang, CEO of Nvidia, might be culpable too, as he has been increasingly drawn to overstatements and flawed arguments of his own, like this one from a couple of weeks ago <a href=\"https:\/\/fortune.com\/company\/nvidia\/earnings\/q3-2026\/\" rel=\"nofollow noopener\" target=\"_blank\">at the last investor call<\/a>:<\/p>\n<p>There are three scaling laws that are scaling at the same time. The 1st scaling law called Pre-training continues to be very effective. And the 2nd is Post-training. 
Post-training basically has found incredible algorithms for improving an AI\u2019s ability to break a problem down and solve a problem step by step. And Post-training is scaling exponentially. Basically, the more compute you apply to a model, the smarter it is, the more intelligent it is. And then the 3rd is Inference. Inference, because of chain of thought, because of reasoning capabilities, AIs are essentially reading, thinking before it answers. The amount of computation necessary as a result of those three things has gone completely exponential.<\/p>\n<p>It all sounds great. But the last line really just shows that the demand for Nvidia chips (compute) \u2013 right now \u2013 is going up exponentially, not that it will continue to do so. Worse, it doesn\u2019t prove its premise: that exponentially increasing compute will actually bring us to artificial general intelligence. He\u2019s conflating costs going up exponentially with benefits going up exponentially.<\/p>\n<p>In reality, the Census Bureau data already shows signs of demand tapering off, which is not consistent with a genuine exponential. And if Sutskever and I (and at this point many others) are right that scaling is largely played out, a decline in demand is sure to follow.<\/p>\n<p>The hype game isn\u2019t new; I am old enough to remember how driverless cars were similarly hyped, with immense investments of their own, few of which have panned out. Max Chafkin nailed that mania at its peak in 2022, in an essay called <a href=\"https:\/\/www.bloomberg.com\/news\/features\/2022-10-06\/even-after-100-billion-self-driving-cars-are-going-nowhere?embedded-checkout=true\" rel=\"nofollow noopener\" target=\"_blank\">Even After $100 Billion, Self-Driving Cars Are Going Nowhere<\/a>. <\/p>\n<p>Three years later Waymo is further along, but still not making a profit; many driverless car start-ups went out of business. 
And Waymo\u2019s solution is still fragile, available only in a tiny percentage of the world\u2019s cities, a far cry from the ubiquity that Google\u2019s co-founder once promised would be achieved by 2017. And don\u2019t get me started about Elon Musk\u2019s long history of blown deadlines on driverless cars.<\/p>\n<p>GenAI is like a rerun of that situation, with an order of magnitude or two more hype, coupled with equal na\u00efvet\u00e9 about what a cognitively adequate solution would actually look like, only this time with an extra zero, potentially a trillion dollars down the drain \u2014 and perhaps with far more collateral damage to the economy. <\/p>\n<p>Making it all more potent has been the <a href=\"https:\/\/open.substack.com\/pub\/garymarcus\/p\/five-signs-that-generative-ai-is?r=8tdk6&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=false\" rel=\"nofollow noopener\" target=\"_blank\">trillion pound baby fallacy<\/a>: assuming that all exponential extrapolations will hold indefinitely, when in reality most don\u2019t. AI researcher and future CMU professor Niloofar Mireshghallah sent me some fascinating new (and not yet published) results last night, comparing progress across three AI benchmarks. One (a math test on which one might plausibly generate heaps of synthetic data) looked like it might be genuinely exponential. Another (SWE-Bench, focused on coding) looked like it had perhaps tailed off into diminishing returns after initial exponential progress. 
A third, <a href=\"https:\/\/arxiv.org\/pdf\/2310.17884\" rel=\"nofollow noopener\" target=\"_blank\">on a complex task combining theory of mind and privacy that seems harder to game<\/a>, seemed to show a different pattern: progress that was slower and more linear:<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!6BeK!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90de6678-cf1b-4a69-ba6f-f5e6807f2d7c_4158x2368.png\" data-component-name=\"Image2ToDOM\" rel=\"nofollow noopener\" class=\"image-link image2 is-viewable-img can-restack\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/90de6678-cf1b-4a69-ba6f-f5e6807f2d7c_4158.jpeg\" width=\"1456\" height=\"829\" data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/90de6678-cf1b-4a69-ba6f-f5e6807f2d7c_4158x2368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:829,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:561486,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https:\/\/garymarcus.substack.com\/i\/180115766?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90de6678-cf1b-4a69-ba6f-f5e6807f2d7c_4158x2368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\" loading=\"lazy\" class=\"sizing-normal\"\/><\/a><\/p>\n<p>Anyone assuming that everything is simultaneously rising exponentially, and that it will continue to do so indefinitely, is a fool. 
<\/p>\n<p>\u00a7<\/p>\n<p>Media boosters have played a big role in getting us here, too, often amplifying the tech CEOs\u2019 fantasies rather than challenging them. Steven Witt wrote in the New Yorker, \u201c<a href=\"https:\/\/www.newyorker.com\/newsletter\/the-daily\/will-ai-destroy-the-planet%20planet\" rel=\"nofollow noopener\" target=\"_blank\">I think ChatGPT is maybe the most successful internet product in history<\/a>\u201d, overlooking a series of products like Amazon.com, Facebook, and Google\u2019s search engine that have seen broader adoption and far greater profits. Ezra Klein has recently been telling us that AGI was likely \u201c<a href=\"https:\/\/www.nytimes.com\/2025\/03\/04\/opinion\/ezra-klein-podcast-ben-buchanan.html?smid=nytcore-ios-share\" rel=\"nofollow noopener\" target=\"_blank\">coming in two to three years, during Donald Trump\u2019s second term<\/a>\u201d, alluding to industry sources (with vested interests). As discussed here last week, that speedy timeline was recently <a href=\"https:\/\/garymarcus.substack.com\/p\/breaking-the-ai-2027-doomsday-scenario?r=8tdk6\" rel=\"nofollow noopener\" target=\"_blank\">walked back by its leading advocate<\/a>, and <a href=\"https:\/\/www.nytimes.com\/2025\/03\/04\/opinion\/ezra-klein-podcast-ben-buchanan.html?smid=nytcore-ios-share\" rel=\"nofollow noopener\" target=\"_blank\">refuted in an essay by Yoshua Bengio and 30 other authors<\/a>, including myself.<\/p>\n<p>Likewise, NYT columnist Kevin Roose wrote in February 2023 that he felt a \u201c<a href=\"https:\/\/www.nytimes.com\/2023\/02\/08\/technology\/microsoft-bing-openai-artificial-intelligence.html?smid=nytcore-ios-share\" rel=\"nofollow noopener\" target=\"_blank\">sense of awe<\/a> [when he] started using the new, [ChatGPT]-powered Bing\u201d. (Anybody remember Bing?) 
More recently, Roose promised that \u201cwith a few keystrokes, amateurs can now build products that would have previously required teams of engineers\u201d, which has thus far not come close to being true. (Amateurs can in fact now build prototypes, but the auto-generated code tends to be shaky, and few amateurs can make those prototypes robust.) Too much industry hype is repeated with too little skepticism, driving up market valuations on analyses that often feel to me superficial.<\/p>\n<p>\u00a7<\/p>\n<p>The market is maybe finally catching on. Nvidia\u2019s stock price, which had been rising at incredible speed for the last few years, fell 16% in November; CoreWeave, which traffics in Nvidia chips, fell by nearly half. And Oracle, which rocketed on the news that it had made a deal with OpenAI, dropped by about 26% in November, and close to 50% since September 11, when I declared the response to that deal to be <a href=\"https:\/\/open.substack.com\/pub\/garymarcus\/p\/peak-bubble?r=8tdk6&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=false\" rel=\"nofollow noopener\" target=\"_blank\">\u201cpeak bubble\u201d<\/a>.<\/p>\n<p>OpenAI itself isn\u2019t traded on the open market, but if it were, it\u2019s likely that it, too, would have fallen over the last month, for all the reasons I described, and one more: they don\u2019t have a moat. ChatGPT was pretty easily replicated; there wasn\u2019t all that much secret sauce. In March 2024 <a href=\"https:\/\/x.com\/GaryMarcus\/status\/1766871625075409381?s=20\" rel=\"nofollow\">I warned that this would lead to a pileup of similar models that fell short of the many fantasies about GPT-5, and hence to price wars<\/a> with little profit. And so it has: large language models have become a commodity; LLM profits outside of Nvidia are hard to find. 
<a href=\"https:\/\/techstartups.com\/2025\/10\/31\/openai-is-hemorrhaging-billions-microsoft-filing-reveals-openai-lost-11-5-billion-last-quarter-amid-ai-bubble-hype\/\" rel=\"nofollow noopener\" target=\"_blank\">OpenAI itself is losing billions every month<\/a>. But worse than that, from OpenAI\u2019s perspective, Google seems in the last month to have overtaken them, with Gemini 3, which by many benchmarks is better than GPT-5. Google\u2019s stock is up; if OpenAI had stock, I imagine it would be down. Sam Altman himself acknowledged (in a leaked memo) that the company was <a href=\"https:\/\/www.theinformation.com\/articles\/openai-ceo-braces-possible-economic-headwinds-catching-resurgent-google?utm_source=ti_app&amp;rc=dcf9pt\" rel=\"nofollow noopener\" target=\"_blank\">facing \u201ctemporary economic headwinds\u201d<\/a>.<\/p>\n<p>The reality is that OpenAI has ridden to a $500 billion valuation on hopes, not profits. And those hopes are slipping away, as usage declines and others catch up.<\/p>\n<p>In the end LLMs will continue to exist, no matter what; they will get cheaper. And they will find some utility (indeed they already have). But they will never be AGI, and they will never be much more than a commodity. <\/p>\n<p>\u00a7<\/p>\n<p>What is ChatGPT best for? The natural home for ChatGPT turns out to be\u2026 demos. At its core, it approximates past data, albeit with a superficial understanding. In many domains, answers might be roughly 80% correct and 20% absurd, but without a whole lot of task-specific engineering required. In some domains, like autocomplete for coders, who are in the loop and constantly testing what they write, that\u2019s fine. <\/p>\n<p>Ultimately though, what LLMs are great at is seeming plausible \u2013 while being much less good at getting their facts straight. 
<\/p>\n<p>The gulf between how impressive something *seems* and how useful something *is* has never been greater.<\/p>\n<p>\u00a7<\/p>\n<p>Although ChatGPT became famous three years ago, the core technology behind it isn\u2019t actually three years old; depending on how you count, it\u2019s more like 7 or 8. That technology, known as the Transformer, was developed by Google in 2017. The first version of GPT (short for generative pretrained transformer), now known as <a href=\"https:\/\/en.wikipedia.org\/wiki\/GPT-1\" rel=\"nofollow noopener\" target=\"_blank\">GPT-1<\/a>, was released in June 2018. OpenAI and others have been steadily working at scaling that technology ever since. (People were writing glorifying articles about GPT-3 as far back as <a href=\"https:\/\/www.theguardian.com\/commentisfree\/2020\/sep\/08\/robot-wrote-this-article-gpt-3\" rel=\"nofollow noopener\" target=\"_blank\">2020<\/a>.)<\/p>\n<p>In some ways, GPT-5 is more capable than most 7- or 8-year-olds. It\u2019s far more likely to be able to give the answer to a random trivia question (such as who was the first of Henry VIII\u2019s wives), and far more likely to be able to write sound Python code. <\/p>\n<p>In other ways, though, something really fundamental is still missing. ChatGPT obviously lacks the reliability of calculators (which is something the field should aspire to), but more than that, the hallucinations and occasional bizarre errors should be shocking given that LLMs (unlike seven-year-olds) have swallowed the entire internet. As Sutskever said last week, LLMs \u201csomehow just generalize dramatically worse than people. And it\u2019s super obvious. 
That seems like a very fundamental thing\u201d \u2013 perhaps an epitaph for the ChatGPT era (and, not coincidentally, what I have been trying to say all along).<\/p>\n<p>More than that, any normal seven-year-old develops a rich model of the world, including the objects they see, the people they interact with, and even how their own bodies work. GPT is still just faking it.  <\/p>\n<p>I wrote this <a href=\"https:\/\/www.nytimes.com\/2017\/07\/29\/opinion\/sunday\/artificial-intelligence-is-stuck-heres-how-to-move-it-forward.html\" rel=\"nofollow noopener\" target=\"_blank\">about my daughter in July 2017<\/a>, eight years ago, in a New York Times op-ed, well before ChatGPT was on the scene, when my daughter was 3. <\/p>\n<p>Although the field of A.I. is exploding with microdiscoveries, progress toward the robustness and flexibility of human cognition remains elusive.<\/p>\n<p>Not long ago, for example, while sitting with me in a cafe, my 3-year-old daughter spontaneously realized that she could climb out of her chair in a new way: backward, by sliding through the gap between the back and the seat of the chair. My daughter had never seen anyone else disembark in quite this way; she invented it on her own &#8211; and without the benefit of trial and error, or the need for terabytes of labeled data.<\/p>\n<p>Presumably, my daughter relied on an implicit theory of how her body moves, along with an implicit theory of physics \u2014 how one complex object travels through the aperture of another. I challenge any robot to do the same.<\/p>\n<p>A.I. systems tend to be passive vessels, dredging through data in search of statistical correlations; humans are active engines for discovering how things work.<\/p>\n<p>People are now very actively trying to put variations of large language models into humanoid robots, but I can\u2019t imagine any could do what my daughter was able to do then, let alone what she can do nowadays as a curious, precocious eleven-year-old. 
<\/p>\n<p>If you doubt me, or just want a laugh, check out reviews by <a href=\"https:\/\/youtu.be\/f3c4mQty_so?si=TXo_sZTKKpk9fOI_\" rel=\"nofollow noopener\" target=\"_blank\">Joanna Stern<\/a> and <a href=\"https:\/\/youtu.be\/j31dmodZ-5c?si=-YdSIXZg58ScvoUt\" rel=\"nofollow noopener\" target=\"_blank\">Marques Brownlee<\/a> of the forthcoming NEO humanoid robot, which remains slow and heavily dependent on human teleoperation, or check out the poor robot that <a href=\"https:\/\/youtu.be\/UuUSR8TyZDE?si=HpRm-rv57Q7gt3OR\" rel=\"nofollow noopener\" target=\"_blank\">face-planted at its Moscow debut earlier this month<\/a>, seconds after its unveiling.<\/p>\n<p>\u00a7<\/p>\n<p>For the love of Darwin, please let\u2019s spend the next eight years considering new approaches, not tanking the economy on more of the same. <\/p>\n<p>ChatGPT has had a good (though not great) run; it\u2019s time for something new. I am for it. Ilya is for it. The time has come.<\/p>\n","protected":false},"excerpt":{"rendered":"Three years ago, on November 30, 2022, ChatGPT was released. 
It\u2019s been one of the fastest-growing consumer products&hellip;\n","protected":false},"author":2,"featured_media":166449,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[220,218,219,61,60,80],"class_list":{"0":"post-166448","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-ie","12":"tag-ireland","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/166448","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/comments?post=166448"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/166448\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media\/166449"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media?parent=166448"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/categories?post=166448"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/tags?post=166448"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}