{"id":10768,"date":"2025-07-21T11:43:19","date_gmt":"2025-07-21T11:43:19","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/10768\/"},"modified":"2025-07-21T11:43:19","modified_gmt":"2025-07-21T11:43:19","slug":"hands-on-with-the-new-chatgpt-agent-mode-mindblowing-with-a-side-of-hallucination","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/10768\/","title":{"rendered":"Hands on with the new ChatGPT agent mode: Mindblowing with a side of hallucination"},"content":{"rendered":"<p><img fetchpriority=\"high\" decoding=\"async\" class=\"size-full wp-image-882100 aligncenter\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/07\/not-Shaun-Davies-1.jpg\" alt=\"\" width=\"570\" height=\"320\"  \/><\/p>\n<p>For the past couple of years, we\u2019ve been learning to treat AI like a clever tool\u2014a supercharged search engine or a brainstorming partner. With the release of ChatGPT Agent, OpenAI is asking us to change our thinking. This isn\u2019t a tool you wield, it\u2019s a digital colleague you brief. It takes your instructions and works autonomously in its own little virtual computer, sometimes with brilliant results, and sometimes by hallucinating your face onto a PowerPoint slide. It\u2019s a profound change in how we interact with AI, moving from operator to manager.<\/p>\n<p>And despite moments of absurdity and some privacy concerns, the experience of working with this weird digital colleague left a strong impression. Two days ago, I was in the agent-skeptic camp, convinced it would be some time before these tools could meaningfully impact my workflow. But after putting it through its paces, I\u2019m starting to think my Pro subscription might just be worth the A$300 a month.\u00a0<\/p>\n<p>The basics: How agent works<\/p>\n<p>To turn it on, you simply select Agent from the Tools menu. It\u2019s not available on a free plan, and there\u2019s a use limit on other plans. 
The Pro limit is really high at 500 runs per month.\u00a0<\/p>\n<p>Enter a prompt describing what you need and a mini-browser and terminal window appear attached to your chat box. You can watch it scroll, click, and run scripts in real time as it works through its plan \u2014 very occasionally asking for confirmation before taking a key step. What allows Agent to perform these tasks is that it\u2019s not just a language model: it\u2019s a language model that has been given its own toolkit and a virtual computer to run it on.<\/p>\n<p>OpenAI calls this a \u2018unified agentic system,\u2019 which is a fancy way of saying it combines the web-browsing skills of its earlier \u2018Operator\u2019 prototype with the synthesising power of its deep research tools and ChatGPT\u2019s conversational smarts. When you switch to Agent mode, you\u2019re giving ChatGPT access to a browser, a code interpreter, and file management systems. It can then use these tools to carry out your instructions. When it\u2019s working well, it\u2019s quite a thing to watch. You can see it carry out its operations in the virtual machine, or use the ellipsis (\u2026) button in the right-hand corner to toggle to an activity log or take over the browser. You can also access historical recordings of the virtual machine after each step completes. Here it is in action:<\/p>\n<p>Through \u2018Connectors\u2019, you can grant it permission to access your other applications. Invoking them is as simple as typing the name of the service, like \u2018Google Drive\u2019 or \u2018Hubspot\u2019, directly into the prompt. This is where it gets powerful. The Agent browses a public website for information, searches its own memory of developer documentation or browses the live web to figure out how to use a specific platform\u2019s API. It then writes and executes the code to get the job done. 
It\u2019s this ability to plan, use tools, and learn on the fly that separates it from a simple chatbot.<\/p>\n<p><a href=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/07\/reading-from-hubspot-using-connector.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-882069\" class=\"wp-image-882069\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/07\/reading-from-hubspot-using-connector.png\" alt=\"\" width=\"484\" height=\"396\"  \/><\/a><\/p>\n<p id=\"caption-attachment-882069\" class=\"wp-caption-text\">It\u2019s not the most exciting screengrab, but you can see Agent and Hubspot talking to each other (click to expand)<\/p>\n<p>OpenAI is aiming to get deep into your workflows and your operating system here. The ability to access all the files in your Google Drive and emails is really powerful, but OpenAI is arguably the most data-hungry company on earth. While I really like this tool, I don\u2019t feel entirely comfortable with the level of access I am giving it \u2013 I\u2019ll talk about my concerns below.\u00a0<\/p>\n<p>On industry benchmarks like AgentBench and GAIA, which test agents on a range of common computer tasks, Agent achieves state-of-the-art scores. But benchmarks are one thing, and real-world utility is another. Which is why I put it to the test.<\/p>\n<p>Simple start: A BAS statement and an expense claim<\/p>\n<p>My first task for the Agent was simple: calculate the GST for my quarterly Business Activity Statement from a number of invoices. I gave it some general instructions on how to find the files on Google Drive and a few minutes later it returned a perfectly clean table, ready for the tax office. No drama, no fuss \u2014 a quiet win on its first outing.<\/p>\n<p>Next, I moved on to something more complex: creating an expense claim from a folder of image receipts. I told it where to find the Google Drive folder containing the images and a template. 
But despite having a connector, it struggled to see the Drive folder, prompting me to log in manually within its browser window. A bit clunky, but we got there, though this would prove a recurrent problem. Note that, at this point, Agent does not seem able to connect directly to files on your laptop, because your laptop doesn\u2019t have an API.<\/p>\n<p>After I fixed the file issue, Agent started to extract the amounts from the receipts, but struggled when the text was near the bottom of the image, as it couldn\u2019t get the zoom feature to focus there. I watched it laboriously zoom in on each receipt image, sometimes taking twenty attempts to correctly read a single line item at the bottom of a jpg. It was frustrating, but I was impressed with its tenacity. Once it got past these hiccups, it did a good job, meticulously creating a table of expenses and double-checking its work.\u00a0<\/p>\n<p>There was, however, a very funny mistake. Agent saw a receipt for a \u201cDeluxe Fried Chicken Sandwich\u201d, and interpreted it as a \u201cLava Fried Chicken Sandwich.\u201d Steve\u2019s Lava Chicken is a meme spawned from A Minecraft Movie due to a rambunctious 34-second Jack Black song that charted on the Billboard Hot 100. I guess ChatGPT Agent is heavily brainrotted?<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-882079\" class=\"wp-image-882079\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/07\/lava-fried-chicken.png\" alt=\"\" width=\"509\" height=\"349\"  \/><\/p>\n<p id=\"caption-attachment-882079\" class=\"wp-caption-text\">\u201cSteve\u2019s Lava Chicken, it\u2019s as tasty as hell\u2026\u201d Agent mislabeled my fried chicken sandwich as a Minecraft meme<\/p>\n<p>Right at the end, it made another odd decision: it exported the file as a PDF without asking. 
This was not what I wanted, and it was the first of many \u201cindependent decisions\u201d that were out of line with my needs\u2014a theme we\u2019ll return to.<\/p>\n<p>Deals, decks and hallucinated heads<\/p>\n<p>Next, I tested its analytical capabilities. \u201cSummarise my Hubspot pipeline,\u201d I asked. This was really impressive. It quietly read Hubspot\u2019s API documentation, authenticated itself, fetched my list of deals, and then cross-referenced them with my Gmail to gather more context. The output was a clean, no-frills breakdown of stages, values, and next actions. It was the work of a competent analyst who just gets the job done.<\/p>\n<p>Here\u2019s part of the report, with sensitive details blacked out.\u00a0<\/p>\n<p><a href=\"https:\/\/mumbrella.com.au\/wp-content\/uploads\/2025\/07\/hubspot.png\" rel=\"nofollow noopener\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-882077\" class=\"wp-image-882077 size-large\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/07\/hubspot-800x648.png\" alt=\"\" width=\"470\" height=\"381\"  \/><\/a><\/p>\n<p id=\"caption-attachment-882077\" class=\"wp-caption-text\">The report was useful, accurate and comprehensive, and I\u2019m going to schedule it to happen once a week<\/p>\n<p>The final challenge, creating a bilingual presentation in English and Japanese, was the highlight of the tests. After feeding it the Word documents and image assets, it spun up some Javascript and Python and produced a deck that was competent, pleasant to look at, and will probably save me 90% of the time I would have spent on it. The layouts were mostly intact, and the bilingual headings were accurate. It was magic.<\/p>\n<p>But it took some pain to get there. On two occasions it inexplicably dropped out of Agent mode and reverted to ChatGPT 3.5 Pro, which is a great model but not right for this task. 
This change made Agent mode inaccessible, which forced me to waste time starting a new chat and copying over context.\u00a0<\/p>\n<p>But the biggest problem occurred when Agent struggled to find a folder of pictures and made a unilateral decision to generate its own instead. It proceeded to create not one, but two fake portraits of me as a generic consultant with a full head of hair.\u00a0<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-882072 aligncenter\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/07\/also-not-me-2.png\" alt=\"\" width=\"424\" height=\"286\"  \/><\/p>\n<p>Aside from this odd decision, the experience quietly blew my mind. To summarise \u2013 I gave ChatGPT Agent access to some case studies, a high-level brief (in Japanese), and a deck to use as a template. It then went away and worked for half an hour, using Javascript and Python to generate slides, checked its work, adjusted the design, and produced what is basically a usable deck in two languages. I read Japanese moderately well and the translation is solid. It\u2019s not perfect, but by the standards of even a year ago, producing a full deck this way is staggering.\u00a0<\/p>\n<p>Putting on my product manager hat\u00a0<\/p>\n<p>While ChatGPT Agent is a leap forward, it\u2019s still very much version one. And if OpenAI is listening (doubtful), I have some ideas for how to make it better.\u00a0<\/p>\n<p>I think there needs to be a way to control how often the model asks for help. One option would be a slider to control autonomy. Set it low, and Agent frequently asks for clarification before taking a creative leap, like inventing my face or exporting a file to a format I didn\u2019t ask for. Set it high, and it almost exclusively makes its own decisions. This would solve most of the frustrations of working with a tool that doesn\u2019t know when it\u2019s overstepping.<\/p>\n<p>There are also some basic technical frustrations that need ironing out. 
The issue of it failing to recognise files in a connected Google Drive is a significant one, as is the infuriating bug where it randomly jumps out of Agent mode and into the less-capable ChatGPT 3.5 model mid-task.<\/p>\n<p>Finally, a word on privacy. When you connect an app, you get a pop-up warning you that signing into websites can \u201cexpose your data to malicious sites.\u201d That\u2019s a good call-out \u2013 if your account is compromised and you\u2019ve attached all your data sources via Connectors, it would be a treasure trove for bad actors. Make sure you enable two-factor authentication (2FA) and take other precautions if you\u2019re going to use this. You also get this warning when you take over the browser to log in to websites.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-882071 aligncenter\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/07\/warning.png\" alt=\"\" width=\"333\" height=\"398\"  \/><\/p>\n<p>What\u2019s less obvious is that OpenAI, a data-hungry monster, reserves the right to train its models on the data it can access through your connectors by default. If you value your privacy or have confidential information in your Drive, you need to navigate into Settings &gt; Data Controls and turn off \u201cImprove model for everyone\u201d. I really think this should be a much more prominent choice during onboarding, not a hidden default, but fat chance of that.\u00a0<\/p>\n<p>For marketers and media professionals, this is a tangible preview of the future. We are moving from telling our tools what to do, to briefing them on what to achieve. Agent is capable of both staggering feats of competence and amusing screw-ups. 
It\u2019s undeniably powerful, even if it does try to give you a new face.<\/p>\n<p>ChatGPT Agent is currently available for Pro users, and will be rolling out to Plus and Team users soon, before eventually coming to Enterprise customers.<\/p>\n","protected":false},"excerpt":{"rendered":"For the past couple of years, we\u2019ve been learning to treat AI like a clever tool\u2014a supercharged search&hellip;\n","protected":false},"author":2,"featured_media":10769,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[12770,256,12771,254,255,64,63,5004,12772,8668,5044,105],"class_list":{"0":"post-10768","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-agent-mode","9":"tag-ai","10":"tag-ai-automation","11":"tag-artificial-intelligence","12":"tag-artificialintelligence","13":"tag-au","14":"tag-australia","15":"tag-chatgpt","16":"tag-chatgpt-agent","17":"tag-hubspot","18":"tag-openai","19":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/10768","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=10768"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/10768\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/10769"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=10768"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/
wp-json\/wp\/v2\/categories?post=10768"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=10768"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}