{"id":108871,"date":"2025-10-30T13:05:11","date_gmt":"2025-10-30T13:05:11","guid":{"rendered":"https:\/\/www.newsbeep.com\/nz\/108871\/"},"modified":"2025-10-30T13:05:11","modified_gmt":"2025-10-30T13:05:11","slug":"environmental-noise-dataset-for-sound-event-classification-and-detection","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/nz\/108871\/","title":{"rendered":"Environmental Noise Dataset for Sound Event Classification and Detection"},"content":{"rendered":"<p>The automation of acoustic analysis represents a rapidly evolving interdisciplinary domain of research, combining acoustics, signal processing, and machine learning. Two particularly important tasks in audio analysis are Sound Event Classification (SEC), which involves the classification and assignment of categorical labels to audio segments with the aim of identifying sound sources, and Sound Event Detection (SED), which determines the occurrence of sound events within the time component of a recording and includes the precise timestamp of an event\u2019s onset and offset. The two aforementioned tasks address the two-fold problem of classifying acoustic information and specifying the respective temporal characteristics across various auditory scenes. Sound Event Detection (SED) has gained significant research attention over the past several years due to its widespread applications. It is central to many applications including urban acoustic planning<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 1\" title=\"Bello, J. P. et al. SONYC: A system for monitoring, analyzing, and mitigating urban noise pollution. Commun. ACM. 62(2), 68&#x2013;77 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05991-w#ref-CR1\" id=\"ref-link-section-d142610135e436\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a> healthcare<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Drugman, T. et al. Audio and contact microphones for cough detection. arXiv preprint (2020).\" href=\"#ref-CR2\" id=\"ref-link-section-d142610135e440\">2<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"H&#xFC;wel, A., Adilo&#x11F;lu, K. &amp; Bach, J. H. Hearing aid research data set for acoustic environment recognition. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 706&#x2013;710 (2020).\" href=\"#ref-CR3\" id=\"ref-link-section-d142610135e440_1\">3<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\" title=\"Messner, E. et al. Multi-channel lung sound classification with convolutional recurrent neural networks. Comput. Biol. Med. 122, 103831 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05991-w#ref-CR4\" id=\"ref-link-section-d142610135e443\" rel=\"nofollow noopener\" target=\"_blank\">4<\/a>, bioacoustics monitoring<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Cramer, J., Lostanlen, V., Farnsworth, A., Salamon, J. &amp; Bello, J. P. Chirping up the right tree: Incorporating biological taxonomies into deep bioacoustic classifiers. Proc. IEEE Int. Conf. Acoust., Speech Signal Process. 
901&#x2013;905 (2020).\" href=\"#ref-CR5\" id=\"ref-link-section-d142610135e447\">5<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Lostanlen, V., Salamon, J., Farnsworth, A., Kelling, S. &amp; Bello, J. P. Robust sound event detection in bioacoustic sensor networks. PLoS One. 14(10), e0214168 (2019).\" href=\"#ref-CR6\" id=\"ref-link-section-d142610135e447_1\">6<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Xu, K., Cai, H., Liu, X., Gao, Z. &amp; Zhang, B. North Atlantic Right Whale call detection with very deep convolutional neural networks. J. Acoust. Soc. Amer. 141(5), 3944&#x2013;3945 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05991-w#ref-CR7\" id=\"ref-link-section-d142610135e450\" rel=\"nofollow noopener\" target=\"_blank\">7<\/a>, surveillance in security<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 8\" title=\"Crocco, M., Cristani, M., Trucco, A. &amp; Murino, V. Audio surveillance: A systematic review. ACM Comput. Surv. 48(4), 1&#x2013;46 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05991-w#ref-CR8\" id=\"ref-link-section-d142610135e454\" rel=\"nofollow noopener\" target=\"_blank\">8<\/a>, multimedia event detection<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 9\" title=\"Wang, Y., Neves, L. &amp; Metze, F. Audio-based multimedia event detection using deep recurrent neural networks. Proc. IEEE Int. Conf. Acoust., Speech Signal Process. 2742&#x2013;2746 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05991-w#ref-CR9\" id=\"ref-link-section-d142610135e458\" rel=\"nofollow noopener\" target=\"_blank\">9<\/a>, event analysis in a large scale<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 10\" title=\"Jansen, A. et al. Large-scale audio event discovery in one million YouTube videos. Proc. IEEE Int. Conf. Acoust., Speech Signal Process. 786&#x2013;790 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05991-w#ref-CR10\" id=\"ref-link-section-d142610135e463\" rel=\"nofollow noopener\" target=\"_blank\">10<\/a>, industrial noise monitoring<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 11\" title=\"Morrison, M. &amp; Pardo, B. OtoMechanic: Auditory automobile diagnostics via query-by-example. Preprint at &#010;                  https:\/\/doi.org\/10.48550\/arXiv.1911.02073&#010;                  &#010;                 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05991-w#ref-CR11\" id=\"ref-link-section-d142610135e467\" rel=\"nofollow noopener\" target=\"_blank\">11<\/a>, smart-home technology<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 12\" title=\"Shabbir, A. et al. Enhancing smart home environments: a novel pattern recognition approach to ambient acoustic event detection and localization. 
Frontiers in Big Data 7, 1419562 (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05991-w#ref-CR12\" id=\"ref-link-section-d142610135e471\" rel=\"nofollow noopener\" target=\"_blank\">12<\/a> and wildlife monitoring<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 13\" title=\"Kath, H., Serafini, P. P., Campos, I. B., Gouv&#xEA;a, T. S. &amp; Sonntag, D. Leveraging transfer learning and active learning for sound event detection in passive acoustic monitoring of wildlife. 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE-2024), (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05991-w#ref-CR13\" id=\"ref-link-section-d142610135e475\" rel=\"nofollow noopener\" target=\"_blank\">13<\/a>.<\/p>\n<p>Although deep learning has transformed image recognition and many other fields, progress in recognising everyday sounds has not been as rapid, partly because there are no standardised, even large, audio datasets<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Piczak, K. J. ESC: Dataset for Environmental Sound Classification. Proceedings of the 23rd ACM International Conference on Multimedia (2015).\" href=\"#ref-CR14\" id=\"ref-link-section-d142610135e482\">14<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Stowell, D., Wood, M., Pamu&#x142;a, H., Stylianou, Y. &amp; Glotin, H. Automatic acoustic detection of birds through deep learning: The first Bird Audio Detection challenge. Methods in Ecology and Evolution. 10(3), 368&#x2013;380 (2019).\" href=\"#ref-CR15\" id=\"ref-link-section-d142610135e482_1\">15<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"Mkrtchian, G. &amp; Furletov, Y. Classification of environmental sounds using neural networks. 2022 Systems of Signal Synchronization, Generating and Processing in Telecommunications (SYNCHROINFO), 1&#x2013;4 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05991-w#ref-CR16\" id=\"ref-link-section-d142610135e485\" rel=\"nofollow noopener\" target=\"_blank\">16<\/a>. In fact, there are so many applications that new neural networks for SEC and SED are being developed every day, and older ones need to be updated. Both processes continuously require detailed and tuned datasets, which then become extremely valuable as they should be as specific as possible. However, the currently available datasets do not take into account the complexity of realistic environments, which include multiple sounds or mixtures of sounds with a constantly present background noise<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Fonseca, E., Favory, X., Pons, J., Duchateau, J. &amp; Serra, X. FSD50K: An Open Dataset of Human-Labeled Sound Events. IEEE\/ACM Transactions on Audio, Speech, and Language Processing. 
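To make the distinction concrete, the following minimal Python sketch contrasts the two annotation styles: a single clip-level label for SEC versus a list of timestamped events for SED. The file names, class names and timings are hypothetical, purely for illustration.

```python
from dataclasses import dataclass

# SEC: one categorical label for a whole audio segment.
@dataclass
class ClipLabel:
    filename: str
    label: str

# SED: each event inside a recording carries its own onset/offset timestamps.
@dataclass
class SoundEvent:
    onset_s: float   # event start, in seconds from the beginning of the file
    offset_s: float  # event end, in seconds
    label: str

# Hypothetical annotations (names and times are illustrative only):
sec_annotation = ClipLabel("clip_0001.wav", "dog_bark")
sed_annotation = [
    SoundEvent(2.4, 3.1, "dog_bark"),
    SoundEvent(5.0, 12.7, "road_traffic"),  # may overlap others in polyphonic SED
]
```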
Although deep learning has transformed image recognition and many other fields, progress in recognising everyday sounds has not been as rapid, partly because standardised, large-scale audio datasets are scarce [14,15,16]. The range of applications is so broad that new neural networks for SEC and SED are developed continuously and older ones must be updated; both processes require detailed, well-curated datasets, which become all the more valuable the more specific they are. However, the currently available datasets do not reflect the complexity of realistic environments, which contain multiple sounds, or mixtures of sounds, over a constantly present background noise [17,18]. This is particularly true for outdoor applications, where the lack of high-quality, large and diverse datasets is recognised as a significant challenge for SED [19,20].

In recent years, several datasets have been developed to support SEC and SED tasks, including UrbanSound [21] and UrbanSound8K [21], ESC-50 [14], AudioSet [22], and FSD50K [17]. Many of these were partially constructed from audio clips hosted on collaborative platforms such as Freesound.org [23], which has become a widely used resource for real-world soundscape data. These datasets have significantly advanced the field and are commonly used for benchmarking and training, but they also have limitations that reduce their applicability to real-world environmental noise analysis: a limited number of classes [14], the use of synthetic or pre-processed content [18], class imbalance [17], constrained clip durations [19], limited contextual diversity [17,18], and lack of access to the original recordings [18]. Such limitations are particularly relevant for outdoor environments, where overlapping sounds, fluctuating background noise, and high acoustic variability make sound detection and classification especially challenging [17,18,19,20]. While synthetic datasets have been proposed to address some of these issues, they often fail to reflect the acoustic complexity and unpredictability of real-world soundscapes [24,25,26,27,28].
The creation of an environmental sound database is a complex task that requires collaboration across several disciplines to be manageable, and considerable effort is being devoted to building benchmark databases for environmental acoustics.

To address these issues, the present work introduces DataSEC [29] and DataSED [30], two datasets of .wav files, one created specifically for SEC and the other for SED of outdoor environmental noise. Both consist of real-world measurements covering a wide variety of sounds that can be heard outdoors in urban and rural environments.

A principal methodological task in applying machine learning to environmental noise analysis is the proper definition of sound event categories (classes), and this is a salient distinction of these datasets from others. The recordings are organised into 22 macro classes of events, some of which are further divided into sub-categories that share characteristics or can be treated as equivalent for a given purpose. This subdivision of the dataset into subfolders makes it easy to adapt to the specific objectives of other users.
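As a sketch of how such a subfolder organisation can be exploited, the snippet below walks a dataset root whose first-level directories are taken to be the macro classes and whose optional second-level directories are sub-categories. The directory layout and names are assumptions for illustration, not the published structure.

```python
from pathlib import Path

def index_dataset(root: str) -> list[tuple[Path, str, str | None]]:
    """Map every .wav file to (path, macro_class, sub_category).

    Assumes a layout like root/<macro_class>/[<sub_category>/]*.wav,
    mirroring the macro-class/sub-category folders described above
    (actual folder names in DataSEC/DataSED may differ).
    """
    items = []
    for wav in Path(root).rglob("*.wav"):
        rel = wav.relative_to(root).parts
        macro = rel[0]                          # first folder = macro class
        sub = rel[1] if len(rel) > 2 else None  # optional sub-category folder
        items.append((wav, macro, sub))
    return items

# Collapsing sub-categories into their macro class yields a 22-class task:
# labels = [macro for _, macro, _ in index_dataset("DataSEC")]
```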
The main purpose of these datasets is to train, validate and test neural networks capable of recognising sound events, whether to search for specific sounds in the environment or to identify sounds to be removed from a longer track. Either use avoids manual identification, which is often costly and impractical. The main audience is environmental acousticians who make long noise measurements and need to label specific sounds or remove unwanted ones. The datasets may also interest, for example, ecologists studying bird and animal populations, who collect thousands of hours of field recordings whose measurements are inevitably contaminated by unwanted sounds that must be removed [15].
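One way this workflow could look in practice is sketched below: a long measurement is scanned with a fixed window and the segments a clip-level classifier assigns to an unwanted class are flagged for removal. The predict_class function is a placeholder for any model trained on DataSEC; the window length and class name are arbitrary, hypothetical choices.

```python
def find_unwanted_segments(signal, sr, predict_class, unwanted="speech",
                           win_s=1.0):
    """Return (start_s, end_s) spans where the classifier hears `unwanted`.

    `signal` is a sequence of samples at rate `sr`;
    `predict_class(window, sr) -> str` stands in for a trained network.
    Everything here is a hypothetical sketch, not the authors' pipeline.
    """
    hop = int(win_s * sr)  # non-overlapping windows of win_s seconds
    spans = []
    for start in range(0, len(signal) - hop + 1, hop):
        window = signal[start:start + hop]
        if predict_class(window, sr) == unwanted:
            spans.append((start / sr, (start + hop) / sr))
    return spans
```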
A notable strength of the work lies in its rigorous methodology: all samples are drawn from both short and long measurements obtained either from online repositories or by the authors themselves using Class I sound level meters. Furthermore, the authors subjected all processed audio files, amounting to around 35 hours of audio (around 18 from DataSEC and 17 from DataSED), to meticulous listening, review and processing, resulting in a more descriptive and refined dataset.

Each track in DataSEC comprises a single event, or multiple events, of a single sound class. DataSED, by contrast, consists of longer recordings containing one or more sounds, from the same class or from different classes, accompanied by background noise. DataSED is released in two versions. The first contains no overlapping events of different classes, so each moment in time is assigned to a single class only; the second incorporates overlapping events from multiple classes, providing a more realistic representation of real-world conditions. The two versions support the development and evaluation of two types of application: monophonic and polyphonic sound event detection [31]. To mitigate the subjectivity inherent in labelling, the authors performed ground-truth annotation of the SED dataset according to the 22-class division, ensuring consistent evaluation. All audio files are in .wav format, while the ground-truth labels are provided in .csv format. The SEC and SED datasets consist of 4292 and 712 .wav files, respectively. The SED dataset comprises 4034 ground-truth labels in its polyphonic version and 4309 in its monophonic version.
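Finally, a minimal sketch of consuming the .csv ground truth: it reads the annotations into per-file event lists and checks whether two events of different classes overlap in time, which is the property that separates the polyphonic version from the monophonic one. The column names (filename, onset, offset, class) are assumptions; the actual header in the DataSED files may differ.

```python
import csv
from collections import defaultdict

def load_ground_truth(csv_path):
    """Group annotations by file; assumed columns: filename, onset, offset, class."""
    events = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            events[row["filename"]].append(
                (float(row["onset"]), float(row["offset"]), row["class"])
            )
    return events

def has_cross_class_overlap(file_events):
    """True if two events of different classes overlap in time (polyphony)."""
    for i, (on_a, off_a, cls_a) in enumerate(file_events):
        for on_b, off_b, cls_b in file_events[i + 1:]:
            if cls_a != cls_b and on_a < off_b and on_b < off_a:
                return True
    return False
```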