{"id":334615,"date":"2025-12-25T08:14:20","date_gmt":"2025-12-25T08:14:20","guid":{"rendered":"https:\/\/www.newsbeep.com\/uk\/334615\/"},"modified":"2025-12-25T08:14:20","modified_gmt":"2025-12-25T08:14:20","slug":"gemini-3-flash-is-smart-but-when-it-doesnt-know-it-makes-stuff-up-anyway","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/uk\/334615\/","title":{"rendered":"Gemini 3 Flash is smart \u2014 but when it doesn\u2019t know, it makes stuff up anyway"},"content":{"rendered":"<p>Gemini 3 Flash often invents answers instead of admitting when it doesn\u2019t know something. The problem arises with factual or high\u2011stakes questions. But it still tests as the most accurate and capable AI model.<\/p>\n<p id=\"c1b28371-bdc1-40ad-b49d-5e8110567883\">Gemini 3 Flash is fast and clever. But if you ask it something it doesn\u2019t actually know \u2013 something obscure or tricky or just outside its training \u2013 it will almost always try to bluff its way through, according to a recent evaluation from the independent testing group Artificial Analysis.<\/p>\n<p>Gemini 3 Flash hit 91% on the \u201challucination rate\u201d portion of the AA-Omniscience benchmark. That means when it didn\u2019t have the answer, it still gave one almost all the time \u2013 one that was entirely fictional.<\/p>\n<p id=\"c1b28371-bdc1-40ad-b49d-5e8110567883-2\">AI chatbots making things up has been an issue since they first debuted. Knowing when to stop and say \u201cI don\u2019t know\u201d is just as important as knowing how to answer in the first place. 
Currently, <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.techradar.com\/tag\/google\" data-auto-tag-linker=\"true\" data-mrf-recirculation=\"inline-link\" data-before-rewrite-localise=\"https:\/\/www.techradar.com\/tag\/google\" rel=\"nofollow noopener\" target=\"_blank\">Google<\/a>\u2019s Gemini 3 Flash AI doesn\u2019t do that very well. That&#8217;s what the test is for: seeing whether a model can differentiate actual knowledge from a guess.<\/p>\n<p>To be clear, Gemini\u2019s high hallucination rate doesn\u2019t mean 91% of its total answers are false. Instead, it means that in situations where the correct answer would be \u201cI don\u2019t know,\u201d it fabricated an answer 91% of the time. That\u2019s a subtle distinction, but an important one with real-world implications, especially as Gemini is integrated into more products like Google Search.<\/p>\n<p lang=\"en\" dir=\"ltr\">Ok, it&#8217;s not only me. Gemini 3 Flash has a 91% hallucination rate on the Artificial Analysis Omniscience Hallucination Rate benchmark!? Can you actually use this for anything serious? I wonder if the reason Anthropic models are so good at coding is that they hallucinate much\u2026 https:\/\/t.co\/b3CZbX9pHw pic.twitter.com\/uZnF8KKZD4<a href=\"https:\/\/twitter.com\/cantworkitout\/status\/2001594901713080675\" data-url=\"https:\/\/twitter.com\/cantworkitout\/status\/2001594901713080675\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\" rel=\"nofollow noopener\">December 18, 2025<\/a><\/p>\n<p id=\"0b8c3754-1864-4c86-9f33-85fcdec269ae\">This result doesn&#8217;t diminish the power and utility of Gemini 3. The model remains the highest-performing in general-purpose tests and ranks alongside, or even ahead of, the latest versions of ChatGPT and Claude. 
It just errs on the side of confidence when it should be modest.<\/p>\n<p>This overconfidence crops up in Gemini&#8217;s rivals as well. What makes Gemini\u2019s number stand out is how often it happens in these uncertainty scenarios, where there\u2019s simply no correct answer in the training data or no definitive public source to point to.<\/p>\n<p>Hallucination Honesty<\/p>\n<p id=\"c1658711-3fcc-4bd1-bc50-c1effe967f79\">Part of the issue is simply that generative AI models are largely word-prediction tools, and predicting the next word is not the same as evaluating truth. That means the default behavior is to keep producing words, even when saying &#8220;I don&#8217;t know&#8221; would be more honest.<\/p>\n<p>OpenAI has started addressing this by training its models to recognize what they don\u2019t know and say so clearly. It\u2019s a tough thing to train, because reward models don\u2019t typically value an honest non-answer over a confident (but wrong) one. Still, OpenAI has made it a goal for the development of future models.<\/p>\n<p>And Gemini does usually cite sources when it can. But even then, it doesn\u2019t always pause when it should. That wouldn\u2019t matter much if Gemini were just a research model, but as Gemini becomes the voice behind many Google features, being confidently wrong could affect quite a lot.<\/p>\n<p>There\u2019s also a design choice here. Many users expect their AI assistant to respond quickly and smoothly. 
Saying \u201cI\u2019m not sure\u201d or \u201cLet me check on that\u201d might feel clunky in a <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.techradar.com\/tag\/chatbot\" data-auto-tag-linker=\"true\" data-mrf-recirculation=\"inline-link\" data-before-rewrite-localise=\"https:\/\/www.techradar.com\/tag\/chatbot\" rel=\"nofollow noopener\" target=\"_blank\">chatbot<\/a> context. But it\u2019s probably better than being misled. Generative AI still isn&#8217;t always reliable, so double-checking any AI response is always a good idea.<\/p>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n","protected":false},"excerpt":{"rendered":"Gemini 3 Flash often invents answers instead of admitting when it doesn\u2019t know something. The problem arises with 
factual&hellip;\n","protected":false},"author":2,"featured_media":334616,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[554,733,4308,86,56,54,55],"class_list":{"0":"post-334615","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-technology","12":"tag-uk","13":"tag-united-kingdom","14":"tag-unitedkingdom"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/334615","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/comments?post=334615"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/334615\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media\/334616"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media?parent=334615"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/categories?post=334615"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/tags?post=334615"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}