{"id":132899,"date":"2025-09-04T18:49:10","date_gmt":"2025-09-04T18:49:10","guid":{"rendered":"https:\/\/www.newsbeep.com\/us\/132899\/"},"modified":"2025-09-04T18:49:10","modified_gmt":"2025-09-04T18:49:10","slug":"mcs-chicago-student-and-ican-alumnus-preston-firestone-authors-llms-conference-paper-siebel-school-of-computing-and-data-science","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us\/132899\/","title":{"rendered":"MCS Chicago student and iCAN alumnus Preston Firestone authors LLMs conference paper | Siebel School of Computing and Data Science"},"content":{"rendered":"<p>&#8220;It was impressive that Preston led this work as an\u00a0MCS student with a non-traditional background in computing.\u201d<\/p>\n<p>That\u2019s what <a href=\"https:\/\/siebelschool.illinois.edu\/about\/people\/all-faculty\/misailo\" rel=\"nofollow noopener\" target=\"_blank\">Sasa Misailovic<\/a>, an associate professor in the Siebel School of Computing and Data Science in The Grainger College of Engineering at the University of Illinois Urbana-Champaign, said when asked about Preston Firestone and the acceptance of the paper <a href=\"https:\/\/icml.cc\/virtual\/2025\/47766\" rel=\"nofollow noopener\" target=\"_blank\">UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8<\/a> at the upcoming <a href=\"https:\/\/colmweb.org\/index.html\" rel=\"nofollow noopener\" target=\"_blank\">2025 Conference on Language Modeling (COLM)<\/a>, which takes place from October 7 to 10, 2025, in Montreal, Canada.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/09\/viewphoto.aspx\" alt=\"Four headshots\" width=\"800\" data-fancy-caption=\"&lt;p&gt;(L to R) Preston Firestone, Sasa Misailovic, Gagandeep Singh, Shubham Ugare&lt;\/p&gt;\" loading=\"lazy\"\/>(L to R) Preston Firestone, Sasa Misailovic, Gagandeep Singh, Shubham Ugare<\/p>\n<p>Firestone is a <a 
href=\"https:\/\/siebelschool.illinois.edu\/academics\/graduate\/professional-mcs\" rel=\"nofollow noopener\" target=\"_blank\">Master of Computer Science<\/a> student in the <a href=\"https:\/\/siebelschool.illinois.edu\/academics\/graduate\/professional-mcs\/chicago-master-computer-science\" rel=\"nofollow noopener\" target=\"_blank\">MCS Chicago program<\/a>. He admits that,\u00a0\u201chonestly, I felt like I had gotten away with something.\u201d<\/p>\n<p>Misailovic says, \u201cPreston attended my CS 591 seminar on programming and AI safety in Chicago back in Spring 2024. He was looking to get research experience, and we figured out a possible idea where he could help my (then) PhD student, <a href=\"https:\/\/shubhamugare.github.io\/\" rel=\"nofollow noopener\" target=\"_blank\">Shubham Ugare<\/a>.\u201d<\/p>\n<p>Firestone is lead author of the paper, alongside Misailovic,\u00a0Meta research scientist Ugare (&#8217;25 Ph.D. Computer Science) and CS assistant professor <a href=\"https:\/\/siebelschool.illinois.edu\/about\/people\/all-faculty\/ggnds\" rel=\"nofollow noopener\" target=\"_blank\">Gagandeep Singh<\/a>.<\/p>\n<p>Firestone says of the research team, \u201cThey are allowing me to contribute to science as a whole, or at least to my professional development. My primary goal for the research work, however, was to see whether I&#8217;d enjoy it, and to learn what exactly the work consisted of: I&#8217;d like to have more information about what it would entail before committing to a doctoral program. 
And I discovered that I do, in fact, enjoy it!\u201d\u00a0<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/09\/1757011748_248_viewphoto.aspx\" alt=\"Blue and green squares with text surrounding the text Large Language Model (LLM).\" class=\"image align-right\" data-fancy-caption=\"&lt;p&gt;Large Language Models (LLMs)&lt;\/p&gt;&lt;p&gt;Photo Credit: &lt;em&gt;Arnab Dey \/ Adobe Stock&lt;\/em&gt;&lt;\/p&gt;\" width=\"400\" loading=\"lazy\"\/>\u201cWe have been working on a novel approach for constrained generation as part of our <a href=\"https:\/\/structuredllm.com\/\" rel=\"nofollow noopener\" target=\"_blank\">Structured LLM<\/a> initiative, which controls large language models (LLMs) to generate text that conforms to user-defined rules,\u201d Misailovic explains. \u201cIn each step, an LLM generates tokens, which are typically words or several letters \u2014 parts of words; however, for non-Latin alphabets and some math notation, tokens can even be just a part of individual symbols. And as Preston was implementing the system we thought of, he encountered an unexpected problem.\u201d\u00a0<\/p>\n<p>\u201cPreston and Shubham identified that the problem appears when generating tokens for specific math formulas. Then they identified that the same issues also appear when generating human languages written in non-Latin scripts, including those written in Devanagari (for Indian texts), Cyrillic (for Slavic languages), and others.\u201d\u00a0<\/p>\n<p>He continues, \u201cPreston then decided to study and develop a new theoretical framework to explain that the current abstractions LLM developers use when processing text with LLMs are not sufficient to shelter us from problems with character encoding. In this work, he connects many threads from machine learning, linguistics, programming languages and theoretical computer science communities. 
He also studied the existing empirical techniques for fixing the problem and devised a way to fix the issue in our system.\u201d<\/p>\n<p>Firestone obtained his undergraduate degree from the University of St Andrews in Scotland, studying philosophy and theology. He says, \u201cAt the end of my bachelor&#8217;s, I decided to switch topics from philosophy and theology to computer science.\u201d<\/p>\n<p class=\"text-left\">Learning that the Siebel School of Computing and Data Science in The Grainger College of Engineering at the University of Illinois Urbana-Champaign offers the <a href=\"https:\/\/siebelschool.illinois.edu\/academics\/graduate\/ican\" rel=\"nofollow noopener\" target=\"_blank\">Illinois Computing Accelerator for Non-Specialists<\/a> (iCAN) program, he decided to enroll. \u201cMy plan always was to complete a master&#8217;s, and after iCAN, only the MCS is possible without the special assistance of a sponsoring professor. I chose Chicago because I was already living there.\u201d<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/09\/1757011749_234_viewphoto.aspx\" alt=\"White text: Illinois Computing Accelerator for Non-Specialists (iCAN) with a photo of two women looking at a computer with an orange background.\" class=\"image align-center\" data-fancy-caption=\"&lt;p&gt;Illinois Computing Accelerator for Non-Specialists (iCAN) &lt;\/p&gt;\" width=\"500\" loading=\"lazy\"\/><\/p>\n<p>Firestone met Misailovic in Chicago. Firestone notes that \u201cthe special characteristic of the <a href=\"https:\/\/siebelschool.illinois.edu\/academics\/graduate\/professional-mcs\/chicago-master-computer-science\" rel=\"nofollow noopener\" target=\"_blank\">Chicago MCS program<\/a> is that the courses are quite small and one thereby has direct and personal access to the professors, while being supported in the background by a large research institution. 
This combination of liberal-arts-style intimacy with R1-level resources gives excellent opportunities to those who aggressively profit from the access afforded to the professors. Had Sasa not caught me in the office one Thursday after I&#8217;d emailed him asking for a recommendation, I wouldn&#8217;t be here now. And if it weren&#8217;t for the intimate scale of the Chicago program, Sasa might not have known who I was, much less taken a personal interest in me to offer me the position on the project. And had I not made the effort to be in the office to socialize with and encounter professors and students, I wouldn&#8217;t have been known to Sasa by face and name.\u201d<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/09\/1757011750_269_viewphoto.aspx\" alt=\"\" class=\"image align-center\" data-fancy-caption=\"\" width=\"500\" loading=\"lazy\"\/><\/p>\n<p>Upon hearing of Firestone\u2019s interest in pursuing a PhD, Misailovic connected Firestone with Ugare. \u201cI had always planned to continue in academia after the MCS,\u201d Firestone recalls, \u201cso I began asking professors for recommendations for doctoral programs during my last semester in the MCS. Sasa asked me what was on my resume and, realizing it was insufficient to qualify me for a PhD program, assigned me to Shubham&#8217;s SynCode project as a software developer. Luckily, I had already taken CS421, so I was prepared to wrangle LR parsers.\u201d<\/p>\n<p>Now, Firestone has a research paper under his belt. Misailovic concludes, \u201cWhat is impressive about Preston&#8217;s work is that it started as a side project when solving a practical systems problem with structured LLM tools and evolved into a general statement about many current frameworks for running LLMs. 
This work brings to attention the need to think systematically about how to improve abstractions when developing new LLM constrained generation frameworks.\u201d<\/p>\n<p>Speaking of the MCS program, Misailovic says, \u201cOur Chicago program is growing, and it is bringing together students of diverse backgrounds, some from different disciplines (like Preston, who studied philosophy in the past) and many with professional\/industry experience. I met other ambitious and creative students like Preston, who are willing to step outside of their comfort zone and make something new and exciting. The Chicago MCS program is helping those students discover new opportunities and skills &#8212; maybe even those they didn&#8217;t know they were capable of.\u201d<\/p>\n<p>Grainger Engineering Affiliations<\/p>\n<p><a href=\"https:\/\/siebelschool.illinois.edu\/about\/people\/all-faculty\/misailo\" rel=\"nofollow noopener\" target=\"_blank\">Sasa Misailovic<\/a> is an Illinois Grainger Engineering associate professor of computer science.<\/p>\n<p><a href=\"https:\/\/siebelschool.illinois.edu\/about\/people\/all-faculty\/ggnds\" rel=\"nofollow noopener\" target=\"_blank\">Gagandeep Singh<\/a> is an Illinois Grainger Engineering assistant professor of computer science.<\/p>\n","protected":false},"excerpt":{"rendered":"&#8220;It was impressive that Preston led this work as an\u00a0MCS student with a non-traditional background in computing.\u201d 
That\u2019s&hellip;\n","protected":false},"author":2,"featured_media":132900,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[46],"tags":[191,74],"class_list":{"0":"post-132899","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-computing","8":"tag-computing","9":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/132899","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/comments?post=132899"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/132899\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media\/132900"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media?parent=132899"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/categories?post=132899"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/tags?post=132899"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}