{"id":502282,"date":"2026-03-04T01:40:16","date_gmt":"2026-03-04T01:40:16","guid":{"rendered":"https:\/\/www.newsbeep.com\/us\/502282\/"},"modified":"2026-03-04T01:40:16","modified_gmt":"2026-03-04T01:40:16","slug":"llms-can-unmask-pseudonymous-users-at-scale-with-surprising-accuracy","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us\/502282\/","title":{"rendered":"LLMs can unmask pseudonymous users at scale with surprising accuracy"},"content":{"rendered":"<p>\u201cWhat we found is that these AI agents can do something that was previously very difficult: starting from free text (like an anonymized interview transcript) they can work their way to the full identity of a person,\u201d Simon Lermen, a co-author of the paper, told Ars. \u201cThis is a pretty new capability; previous approaches on re-identification generally required structured data, and two datasets with a similar schema that could be linked together.\u201d<\/p>\n<p>Unlike those older pseudonymity-stripping methods, Lermen said, AI agents can browse the web and interact with it in many of the same ways humans do. They can use simulated reasoning to match potential individuals. In one experiment, the researchers looked at responses to a questionnaire Anthropic conducted about how various people use AI in their daily lives. Using information taken from the answers, the researchers were able to positively identify 7 percent of the 125 participants.<\/p>\n<p>            <a class=\"cursor-zoom-in\" data-pswp-width=\"2462\" data-pswp-height=\"880\" data-cropped=\"false\" href=\"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2026\/03\/results-from-questionaire.jpg\" target=\"_blank\" data-pswp- rel=\"nofollow noopener\"><br \/>\n              <img width=\"640\" height=\"229\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2026\/03\/results-from-questionaire-640x229.jpg\" class=\"center medium\" alt=\"Column 1: Q: How did you use AI tools in a recent research project? 
A: I work in biology, on research related to [research topic]. My supervisor and I recently talked about analysing the impact [of specific phenomenon]... My background is in physical science... A: I used AI tools frequently... for writing [specific library] code. Column 2: Profile: \u2022 Computational biology, [subfield] \u2022 Education: physical science background \u2022 Likely PhD student or postdoc \u2022 Tools: Python, [specific library] \u2022 British English (&quot;analysing&quot;): UK or Commonwealth\" decoding=\"async\" loading=\"lazy\"  \/><br \/>\n            <\/a><\/p>\n<p>\n              End-to-end deanonymization from a single interview transcript (with details altered to protect the subject\u2019s identity). An LLM agent extracted structured identity signals from a conversation, autonomously searched the web to identify a candidate individual, and verified the candidate matched all extracted claims.\n                          <\/p>\n<p>While 7 percent recall is relatively low, it demonstrates AI\u2019s growing ability to identify people from very general information they provided. \u201cThe fact that AI can do this at all is a noteworthy result,\u201d Lermen said. 
\u201cAnd as AI systems get better, they will likely get better at finding more and more identities.\u201d<\/p>\n<p>In a second experiment, the researchers gathered comments made in 2024 from the r\/movies subreddit and at least one of five smaller communities: r\/horror, r\/MovieSuggestions, r\/Letterboxd, r\/TrueFilm, and r\/MovieDetails. The results showed that the more movies a candidate discussed, the easier it was to identify them. On average, 3.1 percent of users sharing one movie could be identified at 90 percent precision, and 1.2 percent at 99 percent precision. With five to nine shared movies, those shares rose to 8.4 percent and 2.5 percent of users, respectively. With more than 10 shared movies, they rose to 48.1 percent and 17 percent.<\/p>\n<p>            <a class=\"cursor-zoom-in\" data-pswp-width=\"1354\" data-pswp-height=\"954\" data-cropped=\"false\" href=\"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2026\/03\/Recall-at-precision-thresholds.png\" target=\"_blank\" data-pswp- rel=\"nofollow noopener\"><br \/>\n              <img width=\"640\" height=\"451\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2026\/03\/Recall-at-precision-thresholds-640x451.png\" class=\"center medium\" alt=\"\" decoding=\"async\" loading=\"lazy\"  \/><br \/>\n            <\/a><\/p>\n<p>\n              Recall at various precision thresholds.\n                          <\/p>\n<p>In a third experiment, the researchers took a set of 5,000 Reddit users and added 5,000 \u201cdistraction\u201d identities of Reddit users to the candidate pool. They compared their method to the older Netflix Prize attack. 
They then added 5,000 query distractors to the list of 10,000 candidate profiles; these distractors are users who appear only in a query set, with no true match in the candidate pool.<\/p>\n","protected":false},"excerpt":{"rendered":"\u201cWhat we found is that these AI agents can do something that was previously very difficult: starting from&hellip;\n","protected":false},"author":2,"featured_media":502283,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[45],"tags":[182,181,507,74],"class_list":{"0":"post-502282","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/502282","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/comments?post=502282"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/502282\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media\/502283"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media?parent=502282"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/categories?post=502282"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/tags?post=502282"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}