{"id":79803,"date":"2025-08-19T11:38:10","date_gmt":"2025-08-19T11:38:10","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/79803\/"},"modified":"2025-08-19T11:38:10","modified_gmt":"2025-08-19T11:38:10","slug":"ai-is-unlocking-a-treasure-trove-of-data-held-in-herbarium-collections","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/79803\/","title":{"rendered":"AI is unlocking a treasure trove of data held in herbarium collections"},"content":{"rendered":"<p>In 1770, after <a href=\"https:\/\/www.library.gov.au\/learn\/digital-classroom\/indigenous-responses-cook-and-his-voyage\/james-cooks-endeavour-journal\" rel=\"nofollow noopener\" target=\"_blank\">Captain Cook\u2019s Endeavour struck the Great Barrier Reef<\/a> and was held up for repairs, botanists Joseph Banks and Daniel Solander collected hundreds of plants. <\/p>\n<p><a href=\"https:\/\/bie.ala.org.au\/species\/https:\/\/id.biodiversity.org.au\/node\/apni\/7066434\" rel=\"nofollow noopener\" target=\"_blank\">One of those<\/a> pressed plants is among 170,000 specimens in the herbarium at the University of Melbourne. <\/p>\n<p>Worldwide, more than 395 million specimens are housed in herbaria. Together they comprise an unparalleled record of Earth\u2019s plant and fungal life over time. <\/p>\n<p>We wanted to find a better, faster way to tap into this wealth of information. <a href=\"https:\/\/doi.org\/10.1093\/biosci\/biaf042\" rel=\"nofollow noopener\" target=\"_blank\">Our new research<\/a> describes the development and testing of a new AI-driven tool <a href=\"https:\/\/github.com\/rbturnbull\/hespi\/\" rel=\"nofollow noopener\" target=\"_blank\">Hespi<\/a> (short for \u201cherbarium specimen sheet pipeline\u201d). It has the potential to revolutionise access to biodiversity data and open up new avenues for research. 
<\/p>\n<p>            <a href=\"https:\/\/images.theconversation.com\/files\/684444\/original\/file-20250807-56-gzkl8d.jpg?ixlib=rb-4.1.0&amp;q=45&amp;auto=format&amp;w=1000&amp;fit=clip\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" alt=\"A composite image showing a pressed plant specimen collected by Joseph Banks and Daniel Solander in 1770 together with a scale and colour chart, alongside a close-up of the handwritten label\" class=\"lazyload\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/08\/file-20250807-56-gzkl8d.jpg\"  \/><\/a><\/p>\n<p>              The specimen sheet for spreading nut-heads (Epaltes australis), collected by Joseph Banks and Daniel Solander in 1770. (The collection date was incorrectly recorded on the specimen label as 1776.)<br \/>\n              <a class=\"source\" href=\"https:\/\/online.herbarium.unimelb.edu.au\/collectionobject\/MELUD018723a\" rel=\"nofollow noopener\" target=\"_blank\">University of Melbourne Herbarium Collection<\/a><\/p>\n<p>The digitisation challenge<\/p>\n<p>To unlock the full potential of herbaria, institutions worldwide are striving to digitise them. This means photographing each specimen at high resolution and converting the information on its label into searchable digital data.<\/p>\n<p>Once digitised, specimen records can be made available to the public through online databases such as <a href=\"https:\/\/online.herbarium.unimelb.edu.au\/\" rel=\"nofollow noopener\" target=\"_blank\">the University of Melbourne Herbarium Collection Online<\/a>. 
They are also fed into large biodiversity portals such as the <a href=\"https:\/\/avh.chah.org.au\/\" rel=\"nofollow noopener\" target=\"_blank\">Australasian Virtual Herbarium<\/a>, the <a href=\"https:\/\/www.ala.org.au\/\" rel=\"nofollow noopener\" target=\"_blank\">Atlas of Living Australia<\/a>, or the <a href=\"https:\/\/www.gbif.org\/\" rel=\"nofollow noopener\" target=\"_blank\">Global Biodiversity Information Facility<\/a>. These platforms make centuries of botanical knowledge accessible to researchers everywhere.<\/p>\n<p>But digitisation is a monumental task. Large herbaria, such as the <a href=\"https:\/\/www.botanicgardens.org.au\/our-science\/science-facilities\/national-herbarium-new-south-wales\" rel=\"nofollow noopener\" target=\"_blank\">National Herbarium of New South Wales<\/a> and the <a href=\"https:\/\/www.csiro.au\/en\/news\/all\/articles\/2022\/june\/digitising-the-australian-national-herbarium\" rel=\"nofollow noopener\" target=\"_blank\">Australian National Herbarium<\/a>, have used high-capacity conveyor belt systems to rapidly image millions of specimens. Even with this level of automation, <a href=\"https:\/\/www.botanicgardens.org.au\/our-science\/our-collections\/herbarium-collection\/herbarium-digitisation-project\" rel=\"nofollow noopener\" target=\"_blank\">digitising the 1.15 million specimens<\/a> at the National Herbarium of NSW took more than three years.<\/p>\n<p>For smaller institutions without industrial-scale setups, the process is far slower. Staff, volunteers and citizen scientists photograph specimens and painstakingly transcribe their labels by hand.<\/p>\n<p>At the current pace, many collections won\u2019t be fully digitised for decades. This delay keeps vast amounts of biodiversity data locked away. 
Researchers in ecology, evolution, <a href=\"https:\/\/bioone.org\/journals\/bioscience\/volume-61\/issue-2\/bio.2011.61.2.10\/Climate-Change-and-Biosphere-Response-Unlocking-the-Collections-Vault\/10.1525\/bio.2011.61.2.10.short\" rel=\"nofollow noopener\" target=\"_blank\">climate science<\/a> and <a href=\"https:\/\/www.nature.com\/articles\/s41467-024-51899-1#Sec8\" rel=\"nofollow noopener\" target=\"_blank\">conservation<\/a> urgently need access to large-scale, accurate biodiversity datasets. A faster approach is essential.<\/p>\n<p>            <a href=\"https:\/\/images.theconversation.com\/files\/663951\/original\/file-20250425-62-nph826.jpg?ixlib=rb-4.1.0&amp;q=45&amp;auto=format&amp;w=1000&amp;fit=clip\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" alt=\"A composite image showing a photo of a yam daisy, an image of the specimen in the collection and a map showing specimen collection locations across Australia.\" class=\"lazyload\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/08\/file-20250425-62-nph826.jpg\"  \/><\/a><\/p>\n<p>              Map of specimen collection locations for yam daisy (Microseris lanceolata) from records in the Australasian Virtual Herbarium.<br \/>\n              <a class=\"source\" href=\"https:\/\/vicflora.rbg.vic.gov.au\/flora\/taxon\/86db3654-6285-401a-aaac-ffbdada19569\" rel=\"nofollow noopener\" target=\"_blank\">Neville Walsh, VicFlora<\/a><\/p>\n<p>How AI is speeding things up<\/p>\n<p>To address this challenge, we created <a href=\"https:\/\/github.com\/rbturnbull\/hespi\/\" rel=\"nofollow noopener\" target=\"_blank\">Hespi<\/a>, open-source software that automatically extracts information from herbarium specimens. <\/p>\n<p>Hespi combines advanced computer vision techniques with AI tools such as object detection, image classification and large language models. <\/p>\n<p>First, it takes an image of the specimen sheet, which shows both the pressed plant and its identifying text. Next, it uses object detection to locate the main specimen label and the individual text fields on it, and image classification to determine whether the text is printed, typed or handwritten. 
Then it recognises and extracts text, using a combination of optical character recognition and handwritten text recognition.<\/p>\n<p>Because deciphering handwriting is challenging for people and computers alike, Hespi then passes the extracted text through OpenAI\u2019s GPT-4o large language model to correct errors. This substantially improves the results.<\/p>\n<p>The whole process takes seconds: Hespi locates the main specimen label on a herbarium sheet and reads the information it contains, including taxonomic names, collector details, location, latitude and longitude, and collection dates. It then converts this information into a digital format, ready for use in research.<\/p>\n<p>For example, Hespi correctly detected and extracted all relevant components from the herbarium sheet below. This large brown alga <a href=\"https:\/\/online.herbarium.unimelb.edu.au\/collectionobject\/MELUA002557a\" rel=\"nofollow noopener\" target=\"_blank\">specimen<\/a> was collected at St Kilda in 1883. <\/p>\n<p>            <a href=\"https:\/\/images.theconversation.com\/files\/684852\/original\/file-20250811-56-xh37hd.jpg?ixlib=rb-4.1.0&amp;q=45&amp;auto=format&amp;w=1000&amp;fit=clip\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" alt=\"An image showing how Hespi reads the plant specimen sheet and tags information such as the genus, species, locality and year of collection.\" class=\"lazyload\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/08\/file-20250811-56-xh37hd.jpg\"  \/><\/a><\/p>\n<p>              Results from Hespi on a large brown alga specimen (MELUA002557a) from the University of Melbourne, identifying key details such as the genus, species, locality and year of collection.<br \/>\n              <a class=\"source\" href=\"https:\/\/online.herbarium.unimelb.edu.au\/collectionobject\/MELUA002557a\" rel=\"nofollow noopener\" target=\"_blank\">University of Melbourne Herbarium<\/a><\/p>\n<p>We tested Hespi on thousands of specimen 
images from the University of Melbourne Herbarium and other collections worldwide. We built test datasets for the different stages of the pipeline and evaluated each component.<\/p>\n<p>Hespi achieved a <a href=\"https:\/\/doi.org\/10.1093\/biosci\/biaf042\" rel=\"nofollow noopener\" target=\"_blank\">high degree of accuracy<\/a>, so it could save a great deal of time compared with manual data extraction.<\/p>\n<p>We are also developing a graphical user interface so herbarium curators can manually check and correct the results.<\/p>\n<p>Just the beginning<\/p>\n<p>Herbaria already <a href=\"https:\/\/repository.si.edu\/items\/6f3ebe79-e4fe-4f3e-aa7a-6f51753a3aa8\" rel=\"nofollow noopener\" target=\"_blank\">contribute to society in many ways<\/a>: from species identification and taxonomy to ecological monitoring, conservation, education, and even forensic investigations.<\/p>\n<p>By mobilising large volumes of specimen-associated data, AI systems such as Hespi are enabling <a href=\"https:\/\/www.cell.com\/trends\/ecology-evolution\/fulltext\/S0169-5347(22)00295-6\" rel=\"nofollow noopener\" target=\"_blank\">new and innovative applications<\/a> at a scale never before possible.<\/p>\n<p>AI has been used to automatically extract detailed <a href=\"https:\/\/bsapubs.onlinelibrary.wiley.com\/doi\/10.1002\/aps3.11367\" rel=\"nofollow noopener\" target=\"_blank\">leaf measurements and other traits<\/a> from digitised specimens, unlocking centuries of historical collections for rapid research into plant evolution and ecology.<\/p>\n<p>And this is just the beginning: computer vision and AI could soon be applied in many other ways, further accelerating and expanding botanical research <a href=\"https:\/\/nph.onlinelibrary.wiley.com\/doi\/10.1111\/nph.70312\" rel=\"nofollow noopener\" target=\"_blank\">in the years ahead<\/a>.<\/p>\n<p>            <a 
href=\"https:\/\/images.theconversation.com\/files\/685902\/original\/file-20250818-56-ml84v8.jpg?ixlib=rb-4.1.0&amp;q=45&amp;auto=format&amp;w=1000&amp;fit=clip\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" alt=\"Photo of a well-lit pressed plant specimen sheet on a black table with a camera mounted above, looking down.\" class=\"lazyload\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/08\/file-20250818-56-ml84v8.jpg\"  \/><\/a><\/p>\n<p>              The digitisation pipeline at the University of Melbourne Herbarium begins with capturing a high-resolution image of the specimen.<br \/>\n              University of Melbourne Herbarium<\/p>\n<p>Beyond herbaria<\/p>\n<p>AI pipelines such as Hespi could extract text from labels in any museum or archival collection that has high-quality digital images. <\/p>\n<p>Our next step is a collaboration with Museums Victoria to adapt Hespi into an AI digitisation pipeline suited to museum collections. The pipeline will mobilise biodiversity data for about 12,500 specimens in the museum\u2019s globally significant fossil graptolite collection. 
<\/p>\n<p>            <a href=\"https:\/\/images.theconversation.com\/files\/685919\/original\/file-20250818-66-rcponp.jpg?ixlib=rb-4.1.0&amp;q=45&amp;auto=format&amp;w=1000&amp;fit=clip\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" alt=\"An image showing a dark grey fossil graptolite specimen with numbers attached alongside handwritten labels with annotations from Hespi.\" class=\"lazyload\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/08\/file-20250818-66-rcponp.jpg\"  \/><\/a><\/p>\n<p>              A fossil graptolite specimen from Museums Victoria annotated by Hespi during digitisation.<br \/>\n              Museums Victoria<\/p>\n<p>We are also starting a new project with the <a href=\"https:\/\/ardc.edu.au\/\" rel=\"nofollow noopener\" target=\"_blank\">Australian Research Data Commons (ARDC)<\/a> to make the software more flexible. This will allow curators in museums and other institutions to customise Hespi to extract data from all kinds of collections, not just plant specimens.<\/p>\n<p>Transformational technology<\/p>\n<p>Just as AI is reshaping many aspects of daily life, these technologies can transform access to biodiversity data. <a href=\"https:\/\/academic.oup.com\/bioscience\/article\/75\/6\/457\/8099133\" rel=\"nofollow noopener\" target=\"_blank\">Human-AI collaborations<\/a> could help overcome one of the biggest bottlenecks in collection digitisation: the slow, manual transcription of label data. 
<\/p>\n<p>Mobilising the information already locked in herbaria, museums, and archives worldwide is essential to make it available for the cross-disciplinary research needed to understand and address the biodiversity crisis.<\/p>\n<p>We wish to acknowledge our colleagues at the <a href=\"https:\/\/www.unimelb.edu.au\/mdap\" rel=\"nofollow noopener\" target=\"_blank\">Melbourne Data Analytics Platform<\/a>, including <a href=\"https:\/\/findanexpert.unimelb.edu.au\/profile\/866064-karen-thompson\" rel=\"nofollow noopener\" target=\"_blank\">Karen Thompson<\/a> and <a href=\"https:\/\/findanexpert.unimelb.edu.au\/profile\/196181-emily-fitzgerald\" rel=\"nofollow noopener\" target=\"_blank\">Emily Fitzgerald<\/a>, who contributed to this research.<\/p>\n","protected":false},"excerpt":{"rendered":"In 1770, after Captain Cook\u2019s Endeavour struck the Great Barrier Reef and was held up for repairs, botanists&hellip;\n","protected":false},"author":2,"featured_media":79804,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-79803","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/79803","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=79803"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbee
p.com\/au\/wp-json\/wp\/v2\/posts\/79803\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/79804"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=79803"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=79803"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=79803"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}