Policy agendas of the American state legislatures

There are challenges to validating these data, as there is no “ground truth” dataset of state legislative bills coded under the Gray and Lowery codebook. This presents two issues. First, translating between disparate codebooks introduces measurement error: bills could be “correctly” assigned according to each codebook yet not translate to each other. For example, the Gray and Lowery codebook has a category for electric utilities (“Utilities”) that is separate from the PPDP’s “Energy” topic, so any bill assigned to “Utilities” could be labeled as incorrect. Second, assigning bills to a single policy area is a demanding task on its own. Many state politics researchers have deemed the classification task complex enough to require a domain expert to inspect documents by hand. For example, Reingold et al. (2021) [25] first apply a dictionary method to identify legislation concerning abortion by searching for the “abortion” keyword; their second stage of analysis then requires hand-coding bills to measure their abortion “sentiment.” The Congressional Bills Project [1] and the codebook to which it adheres do away with keywords entirely, instead training hand-coders for several weeks to identify the leading policy area of each bill. This approach has its own issues. First, it is not fully reliable: Hillard et al. (2008, p. 40) [15] report that coders are trained until they can code congressional bills into the 21 major topic codes at 90 percent reliability and the 200+ minor topic codes at 80 percent reliability.
Second, using hand-coders comes at the expense of transparency; unless a project clearly annotates the “why” behind each hand-coder classification decision, it is difficult to diagnose disagreements between hand-coders and downstream researchers.

To address these concerns, we take the following steps to demonstrate the validity of these estimates. First, we compare the machine learning model’s estimates to the legacy dictionary method. This shows that the machine learning model offers similar levels of precision, which provides confidence that the model is finding the policy content it claims to, but far better recall, that is, the degree to which the model finds all possible bills in each policy area. Second, we compare the model to estimates generated by hand-coders [16] to show that the model is reliable.

These exercises demonstrate the external validity of these estimates: the machine learning estimates track the hand-coded estimates of the Pennsylvania legislature closely over time. In particular, the machine learning model greatly outperforms the legacy dictionary method in its recall of potential bill codings.

Model Evaluation Against Legacy Dictionary Method

We first evaluate the machine learning estimates against the legacy dictionary method. This is not a traditional validation, as the dictionary method is less a “ground truth” than an alternative approach.
But by showing which policy areas these two models agree and disagree on, we can better understand how the machine learning model operates. In Table 3, a “false positive” denotes an instance where the model predicts the presence of a given topic despite the absence of all of its keywords. A “false negative” denotes an instance where the model predicts the absence of a given topic despite the presence of one of its keywords. “True positives” and “true negatives” are cases where the model and the dictionary method agree to assign or not assign the topic to the bill, respectively.

Table 3 Model performance evaluated against the legacy dictionary method.

For almost all topics, false positives outnumber true positives, owing to the fact that the legacy dictionary method generates codes for a much smaller share of bills. A higher topic precision denotes a larger share of true positives among all positives; as precision increases, the model finds fewer instances where it predicts the presence of a topic despite the absence of keywords, meaning the keywords are more “necessary” for identifying the topic. For example, the most precisely defined topic, by far, is “Tax Policy,” with Precision = 74%.
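The Table 3 bookkeeping can be sketched in a few lines; the bill flags below are hypothetical, with “positive” meaning the model (or the dictionary’s keywords) assigns the topic:

```python
def precision_vs_dictionary(model_pos, dict_pos):
    """Precision of model codes against the keyword baseline: a 'true
    positive' is a bill both the model and the dictionary assign to the
    topic; a 'false positive' is one the model assigns with no keywords."""
    tp = sum(1 for m, d in zip(model_pos, dict_pos) if m and d)
    fp = sum(1 for m, d in zip(model_pos, dict_pos) if m and not d)
    return tp / (tp + fp) if (tp + fp) else float("nan")

# Hypothetical toy data: model and dictionary flags for six bills.
model_flags = [1, 1, 1, 0, 1, 0]
dict_flags  = [1, 0, 1, 0, 0, 1]
p = precision_vs_dictionary(model_flags, dict_flags)  # 2 TPs / 4 positives = 0.5
```

The sixth bill, flagged by keywords but not by the model, would count as a “false negative” in the same table.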
One may be hard-pressed to think of a bill pertaining to “Tax Policy” that lacks all mentions of “tax,” “taxation,” “taxable,” and so on (or at least, such instances may be harder to find than their counterparts in other topics), but there are “Tax Policy” bills that instead mention a “levy,” the “Department of Revenue,” “economic development,” or “bonds.”

The lowest-precision topics are “International Affairs and Foreign Aid” (5%), “Law” (6%), and “Civil Rights” (13%), meaning the model frequently predicts these topics’ presence without keywords being present. To shed light on whether these overrides of the dictionary method’s keyword rules are valuable, we inspected the false positives. Bills flagged as “International Affairs and Foreign Aid” (abbreviated in the table as “Intl. Affairs”) often include “exchange programs” (for students), explicitly mention other countries, or include other nouns plausibly associated with international affairs. For example, the dictionary defined in Table 1 includes the keywords “diplomat” and “embassy” but not “ambassador,” and yet “ambassador” takes on a similar semantic meaning with respect to the machine learning model’s classification task.
Using the keywords associated with the “Civil Rights” topic as its conceptual “seeds,” the model has grown the topic to include a large number of resolutions, especially those which “celebrate the life” of an individual or mention the “slave trade,” “equal opportunity,” or “human rights.” The multi-label nature of the data proves especially useful in flagging, for example, a bill mentioning “Female Veteran’s Day” as pertaining to both “Civil Rights” and “Military,” and a bill mentioning the “Equal Opportunity Scholarship Act” as pertaining to both “Civil Rights” and “Education.” False positives for “Law” often discuss “liability,” “unenforceable,” “covenant,” “contract,” “Juvenile Justice,” and “allegations.”

To provide a high-level perspective on the internal consistency of the model, Fig. 6 presents a heatmap of topic co-occurrences for bills for which no keywords are present. Using non-keyword-coded bills illustrates how the model performs without the strongest clues regarding policy areas. In the figure, a cell value denotes the percentage of bills coded as Topic = Row that were jointly coded as Topic = Column. For example, the “Civil Rights” topic most frequently co-occurs with “Religion,” “Law,” and “Labor and Employment,” even when the bills contain zero keywords.
The other top co-occurrences are “Environment” with “Public Lands and Water Management” and “Sports” (often with references to fishing and hunting), “Health” with “Insurance,” “Foreign Trade” with “Manufacturing,” “Construction” with “Utilities,” and “Natural Resources” with “Environment.” The “Utilities” topic most often co-occurs with “Communication” (e.g., telecoms) and “Local Government.”

Fig. 6 Correlation between topics based on model predictions, no keywords. Notes: Cells denote the frequency with which bills predicted as pertaining to Topic = Row are also coded as Topic = Column by the model, observing only bills for which there are no keywords present from either Topic = Row or Topic = Column. For example, 12.8% of bills which were predicted as pertaining to “Religion” (while lacking any “Religion” keywords) were also coded as pertaining to “Civil Rights” (while lacking any “Civil Rights” keywords).

Internal Validation

To demonstrate the face validity of the data, we drill down into the estimates with an issue that has emerged in the years since the original dictionary codebooks were published: fracking.
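The Fig. 6 cell values can be sketched from a bill-by-topic indicator matrix; the matrix below is a hypothetical toy example (in the paper, only bills lacking the relevant keywords enter the calculation):

```python
# Hypothetical indicator matrix: rows are bills, columns are topics,
# 1 = the model assigns the topic to the bill.
TOPICS = ["Civil Rights", "Religion", "Law"]
labels = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
]

def cooccurrence(labels, row, col):
    """Share of bills coded as Topic=row that are also coded Topic=col."""
    row_bills = [b for b in labels if b[row] == 1]
    return sum(b[col] for b in row_bills) / len(row_bills)

share = cooccurrence(labels, 0, 1)  # Civil Rights bills also coded Religion
```

Note the matrix is asymmetric by construction: the row topic’s bill count, not the column’s, is the denominator.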
Fracking developed as a political issue in the US after the emergence of “hydraulic fracturing” or “hydrofracking” technology in the mid-2000s, which allowed energy companies to take advantage of shale deposits underneath many, but not all, American states to produce natural gas and oil. While this was a revolutionary technique for energy extraction, it also had concerning environmental consequences, as the chemicals used for fracking, and its byproducts, could be toxic to groundwater and other environmental resources.

To test whether our model accounts for fracking, we look to see whether bills containing the many synonyms for this new technology are coded in the categories we would expect. We identify fracking bills using the following (case-insensitive) search terms in bill titles and descriptions: fracking, hydro fracturing, hydro-fracturing, hydrofracturing, hydro fracking, hydro-fracking, hydrofracking, hydraulic fracturing, hydraulic-fracturing, shale gas, shale oil, horizontal drill, horizontal gas, horizontal stimulation, horizontal well, fracturing fluid, fracturing wastewater, fracturing water, and fracturing chemical. At least one of these search terms appeared in 736 bills from 2009-2020. We find that the model assigns them to logical categories. Figure 7 illustrates that the fracking-related search terms are mostly concentrated in the “Energy,” “Natural Resources,” and “Environment” topics, with some prevalence in “Public Lands and Water Management” as well. We take this exercise as further evidence of the model’s internal reliability.
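A minimal sketch of that keyword search, collapsing the space/hyphen/closed-up spelling variants into regular expressions (the bill records are hypothetical; in the paper, titles and descriptions come from the Legiscan archive):

```python
import re

# Hypothetical bill records for illustration.
bills = [
    {"title": "An act regulating hydraulic fracturing wastewater disposal"},
    {"title": "An act renaming a state highway"},
    {"title": "Shale gas severance tax"},
]

# Patterns covering the search terms listed in the text, so that
# "hydro fracking", "hydro-fracking", and "hydrofracking" all match.
patterns = [
    r"frack",
    r"hydro[\s-]?frac",
    r"hydraulic[\s-]?fractur",
    r"shale (gas|oil)",
    r"horizontal (drill|gas|stimulation|well)",
    r"fracturing (fluid|wastewater|water|chemical)",
]
regex = re.compile("|".join(patterns), flags=re.IGNORECASE)

matches = [b for b in bills if regex.search(b["title"])]
```

The `re.IGNORECASE` flag implements the case-insensitive matching described above.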
Fig. 7 Topic hit rate for example search terms related to “Fracking.” Notes: A cell denotes the percentage of bills predicted to be assigned to Topic = Row that include the (case-insensitive) search term associated with its column.

A human coder also evaluated a sample of 1,000 randomly selected bills drawn from a separate database (OpenStates) to 1) evaluate the completeness of the sample drawn from Legiscan, 2) allow for comparison with a multi-class system, and 3) provide feedback on the uncertainty of the process. This exercise produced a number of results. First, it showed that Legiscan contained 100 percent of the bills in the OpenStates archives, which reaffirms our decision to take the Legiscan data at face value. Second, the human coder revealed the difficulty of these coding decisions. The human coder was provided the same bill title from Legiscan that the model was fed and was asked whether this was sufficient to assign the bill or whether it was necessary to look up the full text of the bill via a link to its PDF. On 25.4% of bills, they required the full text; this was not data that the model had access to. Third, they were instructed to place the bill into a single category if feasible, or, if the bill was too complex, to assign a second category.
They found that 44% of bills required a second category.

We evaluate the accuracy of the model in two ways designed for multi-label prediction models [26]. First, we estimate the model’s “Ranking Loss,” which calculates the average number of incorrectly ordered label pairs. For example, the hand-coder assigned Alabama’s “Property Insurance and Energy Reduction Act” (SB 220 from 2015, titled: “Energy efficiency projects, financing by local governments authorized, non ad valorem tax assessments, liens, bonds authorized, Property Insurance and Energy Reduction Act of Alabama”) to “Natural Resources” and “Local Government,” but the model’s first two estimates were “Utilities” (τ = 0.98) and “Tax Policies” (τ = 0.42). Since “Local Government” (τ = 0.16) was only the third-highest estimate, the ranking loss for this bill was 2. A perfect classifier would score zero, and chance would be 0.5. The average ranking loss for this sample of bills was 0.043, an impressive score.

Another way to conceive of a multi-label classifier is through Top-K agreement, which indicates the share of bills for which a hand-coded label appears among the model’s top K estimates. The previous example would be a bill that does not find agreement at K=1 or K=2, but does find agreement at K=3.
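Both measures can be sketched directly. The sketch below uses the pair-based ranking loss normalized by the number of (true, false) label pairs, as in common implementations; under that normalization the SB 220 example, where both hand-coded labels are out-ranked by both incorrect ones, scores the maximum of 1.0 rather than the raw misordered-pair count. The fourth τ value is an assumed filler, since the paper reports only the top three:

```python
def ranking_loss(y_true, y_score):
    """Average fraction of (true, false) label pairs where the false
    label scores at least as high as the true label."""
    losses = []
    for t, s in zip(y_true, y_score):
        true_idx = [i for i, v in enumerate(t) if v == 1]
        false_idx = [i for i, v in enumerate(t) if v == 0]
        bad = sum(1 for i in true_idx for j in false_idx if s[i] <= s[j])
        losses.append(bad / (len(true_idx) * len(false_idx)))
    return sum(losses) / len(losses)

def top_k_agreement(y_true, y_score, k):
    """Share of bills where at least one hand-coded label appears
    among the model's k highest-scoring topics."""
    hits = 0
    for t, s in zip(y_true, y_score):
        top = sorted(range(len(s)), key=lambda i: -s[i])[:k]
        hits += any(t[i] == 1 for i in top)
    return hits / len(y_true)

# SB 220 example: τ for "Utilities", "Tax Policies", "Local Government",
# "Natural Resources" (last value assumed for illustration).
y_true = [[0, 0, 1, 1]]   # hand-coder: Local Government + Natural Resources
y_score = [[0.98, 0.42, 0.16, 0.05]]
loss = ranking_loss(y_true, y_score)
```

Consistent with the text, this bill finds no Top-K agreement at K=1 or K=2 but does at K=3.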
Table 4 shows that if only the top model estimate is considered, the human coder and the model agree on 57 percent of bills. If the first three model estimates are considered, there is agreement on 80 percent of bills. This is similar to what the Policy Agendas Project considers suitable human-coder performance on minor topic codes.

Table 4 Top-K agreement of model estimates and human-coded sample (n=1,000).

The accuracy and precision of the model can also be calibrated using different τ cutoff rates. Table 5 shows the range, and the default level (τ = 0.5). A higher τ would provide more confident estimates, at the expense of total coverage. The lower F1 scores can be considered an artifact of the multi-label to multi-label comparison.

Table 5 Evaluation of machine learning estimates (at different levels of τ) against human-coded sample (n=1,000).

External Validation

The PPDP provides an opportunity for an external validation of our estimates. The PPDP was hand-coded by researchers using a version of the Comparative Policy Agendas codebook, adjusted for the context of state politics. The PPDP coded the universe of Pennsylvania legislative bills from 1979-2016, and this manual process was considered the gold standard in the field before the implementation of automated methods. However, there are two barriers to comparing these two sets of estimates of the same bills before the Pennsylvania legislature.
First, as shown in Table 6, there is misalignment on some topics. The Comparative Policy Agendas codebook was designed to account for the issues dealt with by national leaders of western democracies, including topics like “Macroeconomics” or “International Affairs and Foreign Aid” that are not relevant for a codebook optimized to study American state politics [10],[27]. Therefore, we use the 17 policy areas that overlap across the two datasets: civil rights, health, agriculture, labor, education, environment, energy, transportation, legal, welfare, construction, military, communications, public lands, and local government.
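In code, such an alignment reduces to a crosswalk table; the entries below are an illustrative subset (the full alignment is given in Table 6), and the mapping shape is a sketch rather than the paper’s actual implementation:

```python
# Illustrative crosswalk between PPDP major topics and Gray and Lowery
# topics; only a few overlapping areas are shown.
PPDP_TO_GRAY_LOWERY = {
    "Civil Rights": "Civil Rights",
    "Health": "Health",
    "Education": "Education",
    "Energy": "Energy",  # note: Gray and Lowery keep "Utilities" separate
}

def translate(ppdp_topic):
    """Return the Gray and Lowery topic for a PPDP code, or None when
    the topic (e.g. "Macroeconomics") has no state-level counterpart."""
    return PPDP_TO_GRAY_LOWERY.get(ppdp_topic)
```

Topics returning `None` are simply dropped from the comparison rather than forced into a nearest-fit category.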
There are some difficult decisions to be made in aligning these coding schemes, and we err on the side of caution: we do not assume that the PPDP’s “Domestic Commerce” code encompasses the several codes that could plausibly fit there, including “insurance,” “manufacturing,” “bank,” and “small business.”

Table 6 Alignment between PPDP and machine learning codebook.

The second barrier to comparison is that the PPDP puts bills into only one major topic area, while our model estimates a number of policy labels for each bill. This puts a ceiling on the potential precision of our measure. For example, our model codes 2009’s House Bill 890, “Establishing a nursing and nursing educator loan forgiveness and scholarship program,” as both “Health” and “Education,” while the PPDP considers it only an “Education” bill. It therefore counts as a “false positive,” even though we consider this bill, aimed at reducing the nursing shortage, to be a health care bill. Interestingly, if we only use a dictionary model based on the presence of keywords, that bill is also labeled only as “Education.” As above, we compare both the keyword-only estimates and the modeled estimates in Table 7.

Table 7 External validation of modeled estimates using the Penn.
Policy Database Project: 2007-2016.

With those caveats in mind, Fig. 8 shows that the machine learning estimates more closely approximate the PPDP coding of the state’s legislative agenda. A “perfect” relationship would align with the dark black line denoting a 1:1 relationship. To get into the specifics, Table 7 shows how the models compare. The micro-averaged F1 score is 0.50, a major improvement upon the legacy dictionary’s micro-averaged F1 score of 0.29. This difference is driven by a dramatic increase in recall: the machine learning estimates retrieve about 57 percent of the PPDP bill codes, while the legacy keyword model recovers only a quarter of them. This recall distinction has practical use for researchers. The relatively poor recall of the dictionary model led to the original recommendation to use those estimates only for over-time comparisons within a state/policy area, which minimizes the blindspots of the keyword approach [2].
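The micro-averaged F1 comparison can be sketched as follows; the per-bill topic sets are hypothetical, mimicking the setting where the model predicts multiple topics per bill while the PPDP assigns exactly one:

```python
def micro_f1(pred_sets, true_sets):
    """Micro-averaged F1: pool true positives, false positives, and
    false negatives across all bills, then take the harmonic mean of
    the pooled precision and recall."""
    tp = fp = fn = 0
    for pred, true in zip(pred_sets, true_sets):
        tp += len(pred & true)
        fp += len(pred - true)
        fn += len(true - pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical bills: multi-label model codes vs. single PPDP codes.
model = [{"Health", "Education"}, {"Transportation"}, {"Energy", "Environment"}]
ppdp  = [{"Education"}, {"Transportation"}, {"Environment"}]
score = micro_f1(model, ppdp)
```

Because every extra model label lands in the pooled false-positive count, the single-label PPDP baseline caps the achievable precision, which is the ceiling described above.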
However, the results in Fig. 8 allow researchers to make claims across the agenda, such as “there are more bills introduced about transportation in Pennsylvania over the period 2009-2016 than energy,” because researchers can have confidence that the relative sizes of the policy agendas are being measured.

Fig. 8 There is a stronger correlation between the machine learning estimates and the Pennsylvania Policy Database Project than the legacy keyword model. Note: Estimates are labeled with the major topic number (see column 1 of Table 7) and the last two digits of the year.

In terms of reliability, methods designed to evaluate multi-label classification schemes [26] show that the machine learning estimates perform well. The average ranking loss is 0.11 (0.00 would be perfect, 0.50 would be near chance), which is slightly more ranking loss than in the human-coder exercise, but that is to be expected given the measurement error introduced by aligning the codebooks. The top-K agreement between the model estimates and the PPDP in Table 8 shows 70 percent agreement at K = 3. So, by considering three guesses per bill, the machine learning method produces a reasonable facsimile of what a human coder can be expected to produce. There also appear to be diminishing returns, with Top-K agreement plateauing near 90 percent.
This reveals just how difficult the task is: even with many guesses, it may not be reasonable to expect complete agreement.

Table 8 Top-K agreement of model estimates and Pennsylvania codes (n=1,000).