{"id":162443,"date":"2025-09-26T22:41:09","date_gmt":"2025-09-26T22:41:09","guid":{"rendered":"https:\/\/www.newsbeep.com\/uk\/162443\/"},"modified":"2025-09-26T22:41:09","modified_gmt":"2025-09-26T22:41:09","slug":"what-drives-the-quality-of-internet-searches","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/uk\/162443\/","title":{"rendered":"What drives the quality of internet searches"},"content":{"rendered":"<p><a href=\"https:\/\/rbj.net\/files\/2023\/05\/Guest-Op-Amit-Batabyal.png\" rel=\"nofollow noopener\" target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-medium wp-image-412519\" src=\"data:image\/svg+xml,%3Csvg%20xmlns=\" http:=\"\" alt=\"\" width=\"300\" height=\"154\" data-lazy- data-lazy- data-lazy-src=\"https:\/\/www.newsbeep.com\/uk\/wp-content\/uploads\/2025\/09\/Guest-Op-Amit-Batabyal-300x154.png\"\/><\/a>We all use search engines such as <a href=\"https:\/\/about.google\/\" rel=\"nofollow noopener\" target=\"_blank\">Google<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/edge\/?form=MA13FJ\" rel=\"nofollow noopener\" target=\"_blank\">Edge<\/a>, and <a href=\"https:\/\/duckduckgo.com\/\" rel=\"nofollow noopener\" target=\"_blank\">DuckDuckGo<\/a> to conduct all kinds of searches on the internet. In this regard, a key question is the following: Does the superior quality of search results from dominant firms like Google stem primarily from better algorithms or from access to larger volumes of user-generated data? This distinction has salient implications for <a href=\"https:\/\/www.britannica.com\/topic\/competition-policy\" rel=\"nofollow noopener\" target=\"_blank\">competition policy<\/a>, innovation incentives, and consumer welfare. Why? 
If algorithmic superiority explains dominance, then competitors can challenge incumbents by developing better technology. However, if access to vast amounts of user data is the key driver, then incumbents enjoy an enduring advantage: de facto entry barriers that diminish incentives to innovate.<\/p>\n<p>Search engines rely heavily on data generated by users. Every query and subsequent click contributes to query logs, which help refine future search results. The challenge for researchers seeking to shed light on this \u201calgorithm vs. data access\u201d question lies in isolating the causal effect of data availability from that of algorithms. Usage data are proprietary, and the number of past searches is not exogenous, which complicates empirical analysis. To overcome these hurdles, <a href=\"https:\/\/www.tobiasklein.ws\/\" rel=\"nofollow noopener\" target=\"_blank\">Tobias Klein<\/a> and his colleagues collaborated with Cliqz, a small German search engine, to conduct a <a href=\"https:\/\/www.tse-fr.eu\/sites\/default\/files\/TSE\/documents\/sem2024\/eco_platforms\/klein.pdf\" rel=\"nofollow noopener\" target=\"_blank\">controlled experiment<\/a> that addresses this question directly.<\/p>\n<p>The study held the Cliqz algorithm fixed while systematically varying the amount of user-generated data it could use to generate results. Queries were categorized into five \u201cbuckets\u201d by frequency, from the most common to the rarest, and a representative sample of 5,000 queries was used. Non-personalized search results from Cliqz, Google, and Bing were obtained for comparison. 
Human assessors, unaware of which engine generated the results, rated their quality using a 7-point <a href=\"https:\/\/www.sciencedirect.com\/topics\/medicine-and-dentistry\/likert-scale\" rel=\"nofollow noopener\" target=\"_blank\">Likert scale<\/a>.<\/p>\n<p>The findings reveal that for popular queries, Cliqz\u2019s results were comparable in quality to Google\u2019s and Bing\u2019s, suggesting that algorithmic differences are not the main determinant of search performance in these cases. However, for rare queries\u2014which constitute 74% of total traffic\u2014Cliqz\u2019s results were significantly worse. This shows that access to more user-generated data is crucial for improving the quality of results, particularly for less frequent searches.<\/p>\n<p>The experiment further demonstrated that <a href=\"https:\/\/www.britannica.com\/money\/diminishing-returns\" rel=\"nofollow noopener\" target=\"_blank\">diminishing returns<\/a> set in quickly for popular queries. In other words, once a moderate amount of data is available, additional data no longer improves quality. In contrast, rare queries exhibited no such saturation, with quality continuing to improve even as more data were added. A <a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0304407613001668\" rel=\"nofollow noopener\" target=\"_blank\">robustness check<\/a> comparing the overlap between Cliqz\u2019s and Google\u2019s top five results confirmed the following finding: rare queries require more data for quality convergence.<\/p>\n<p>These results have significant policy implications. Since rare queries dominate traffic, without sufficient user data, new entrants face considerable disadvantages, reinforcing the status quo and possible monopoly power. 
This dynamic also supports arguments for mandatory data-sharing by firms, as envisioned by the <a href=\"https:\/\/commission.europa.eu\/strategy-and-policy\/priorities-2019-2024\/europe-fit-digital-age\/digital-markets-act-ensuring-fair-and-open-digital-markets_en\" rel=\"nofollow noopener\" target=\"_blank\">European Union\u2019s Digital Markets Act<\/a>. Because user data are <a href=\"https:\/\/corporatefinanceinstitute.com\/resources\/economics\/non-rivalrous-goods\/\" rel=\"nofollow noopener\" target=\"_blank\">non-rival<\/a>, sharing would not reduce incumbents\u2019 quality but would erode their exclusivity advantage, potentially restoring competition and incentives to innovate.<\/p>\n<p>The <a href=\"https:\/\/www.tse-fr.eu\/sites\/default\/files\/TSE\/documents\/sem2024\/eco_platforms\/klein.pdf\" rel=\"nofollow noopener\" target=\"_blank\">research<\/a> I am discussing here is valuable because it provides rare experimental evidence on the causal link between user data and search quality. While prior research has established correlations between data scale and performance, this study demonstrates directly that user-generated data are indispensable for competing with incumbents. In addition, the findings underscore the salient point that the biggest quality gaps between dominant and smaller search engines arise not from algorithms but from disparities in access to data.<\/p>\n<p>In sum, without data sharing, market entry will remain difficult, reducing competition and innovation. Even as emerging technologies like <a href=\"https:\/\/www.ibm.com\/think\/topics\/large-language-models\" rel=\"nofollow noopener\" target=\"_blank\">large language models<\/a> improve algorithms, the need for sufficient user data persists. Put differently, effective regulation ensuring data access can help level the playing field and thereby enhance consumer welfare in digital search markets.<\/p>\n<p>Batabyal is a Distinguished Professor, the Arthur J. 
Gosnell professor of economics, and the Head of the Sustainability Department, all at the Rochester Institute of Technology, but these views are his own.<\/p>\n","protected":false},"excerpt":{"rendered":"We all use search engines such as Google, Edge, and DuckDuckGo to conduct all&hellip;\n","protected":false},"author":2,"featured_media":162444,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[1638,86,56,54,55],"class_list":{"0":"post-162443","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-internet","8":"tag-internet","9":"tag-technology","10":"tag-uk","11":"tag-united-kingdom","12":"tag-unitedkingdom"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/162443","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/comments?post=162443"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/162443\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media\/162444"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media?parent=162443"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/categories?post=162443"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/tags?post=162443"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}