{"id":101433,"date":"2025-08-28T05:42:10","date_gmt":"2025-08-28T05:42:10","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/101433\/"},"modified":"2025-08-28T05:42:10","modified_gmt":"2025-08-28T05:42:10","slug":"dta-trials-ai-to-assist-digital-marketplace-application-reviews","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/101433\/","title":{"rendered":"DTA trials AI to assist Digital Marketplace application reviews"},"content":{"rendered":"<p>The federal government is set to pilot the use of artificial intelligence to help review applications for its Digital Marketplace 2 (DM2) panel.<\/p>\n<p>                                <img loading=\"lazy\" decoding=\"async\" id=\"ContentPlaceHolder1_ucArticle_imgImage\" width=\"748\" height=\"420\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/08\/1756359730_609_ImageResizer.ashx\" alt=\"DTA trials AI to assist Digital Marketplace application reviews\"\/><\/p>\n<p>The Digital Transformation Agency (DTA) has developed a proof-of-concept using a large language model to review an IT supplier\u2019s application case study in partnership with an assessment officer.<\/p>\n<p>The agency plans to expand the proof-of-concept into a pilot, with the goal of going live later this year using an AI\u2013human pairing model, rather than AI-based assessment alone.<\/p>\n<p>Launched in October 2024, DM2 is a government-wide procurement arrangement for IT labour hire, as well as professional and consulting services.<\/p>\n<p>Speaking during the AI Government Showcase in Canberra, former DTA principal technology advisor Ben Bildstein said the marketplace draws around 20,000 applications.<\/p>\n<p>\u201cThis is a really big and important piece of work,\u201d he said.<\/p>\n<p>\u201cWe&#8217;ve got all these people doing all this work, and we think maybe AI can rate applications.\u201d<\/p>\n<p>After reviewing the government\u2019s procurement policies and standards, as well as AI ethics guidelines, the 
idea that AI could do it completely was ruled out.<\/p>\n<p>\u201cPretty simply, AI can&#8217;t do that,\u201d Bildstein said. \u201cIt can&#8217;t evaluate an application in a procurement context for you &#8211; that&#8217;s a human&#8217;s job.\u201d<\/p>\n<p>However, the agency agreed to trial AI on supplier case studies, which are typically assessed by two human reviewers to evaluate prior work.<\/p>\n<p>\u201cWe have the people rate that case study from one to five,\u201d explained Bildstein.<\/p>\n<p>\u201cWe&#8217;ve got two staff members doing that independently. If they agree with a margin of error of one point, we basically consider that to be sufficient.<\/p>\n<p>\u201cIf they agree, the case is passed for [an additional] review by a delegate. If they disagree, it goes to a discussion with a third person.\u201d<\/p>\n<p>\u201cSo, can AI read a case study and give a rating from one to five? The answer is yes, of course.&#8221;<\/p>\n<p>Three metrics<\/p>\n<p>The proof-of-concept tested 268 previous applications, comparing the AI model\u2019s assessments with those made by two human case officers.<\/p>\n<p>The testing used three metrics to assess how well the AI model performed when compared with human assessors in evaluating case studies.<\/p>\n<p>The first of these was the agreement rate between two humans compared to one person and an AI.<\/p>\n<p>According to Bildstein, the two case workers agree on average 81 percent of the time, which stood as a benchmark for assessing the AI\u2019s performance.<\/p>\n<p>In contrast, the AI agreed with a human 84 percent of the time.<\/p>\n<p>\u201cWe still have a 16 percent disagreement here,\u201d Bildstein noted.<\/p>\n<p>\u201cIn those cases, we throw away the AI and we get two humans to evaluate, like [the DTA] always did, and proceed on that basis.<\/p>\n<p>\u201cBut in the majority of cases, the AI agreed with the human.\u201d<\/p>\n<p>The second metric was the average rating difference or margin of 
error.<\/p>\n<p>\u201cThe idea here is: we&#8217;ve got a human rating something out of five. We&#8217;ve got a second human rate and something out of five. You look at the difference between those two humans rating in an application. On average, how much do they disagree by?\u201d<\/p>\n<p>With two human assessors, the average disagreement score was 0.92, while the disagreement between a human and the AI was 0.76 \u2014 meaning the AI\u2019s ratings are closer to a human\u2019s than humans are to one another.<\/p>\n<p>\u201cSo, we\u2019re getting a little bit more consistency with a human and an AI,\u201d Bildstein added.<\/p>\n<p>The last metric was correlation, which measures how similarly two raters ranked case studies overall, or as Bildstein explained: &#8220;If the first person gives a high score, is the second person likely to do the same?\u201d<\/p>\n<p>The next stage will see the DTA use a larger data set of 6448 applications to provide more statistical weight to the preliminary results.<\/p>\n<p>Bildstein added that there were \u201cfurther governance and assurance boxes to tick\u201d, but that the model could potentially be live for the next marketplace round.<\/p>\n<p>As some parting advice to the Canberra audience, Bildstein said: \u201cThese days, AI is in fact the easy part.<\/p>\n<p>\u201cI would say put some real effort into your AI assurance early on because that\u2019s probably where you will spend most of your time.<\/p>\n<p>&#8220;And decide what you\u2019re going to measure; be really clear what good looks like and what is good enough.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"The federal government is set to pilot the use of artificial intelligence to help review applications for 
its&hellip;\n","protected":false},"author":2,"featured_media":101434,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-101433","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/101433","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=101433"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/101433\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/101434"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=101433"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=101433"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=101433"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}