{"id":188396,"date":"2025-10-03T22:36:14","date_gmt":"2025-10-03T22:36:14","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/188396\/"},"modified":"2025-10-03T22:36:14","modified_gmt":"2025-10-03T22:36:14","slug":"liquid-ai-released-lfm2-audio-1-5b-an-end-to-end-audio-foundation-model-with-sub-100-ms-response-latency","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/188396\/","title":{"rendered":"Liquid AI Released LFM2-Audio-1.5B: An End-to-End Audio Foundation Model with Sub-100 ms Response Latency"},"content":{"rendered":"<p>Liquid AI has released LFM2-Audio-1.5B, a compact audio\u2013language foundation model that both understands and generates speech and text through a single end-to-end stack. It positions itself for low-latency, real-time assistants on resource-constrained devices, extending the LFM2 family into audio while retaining a small footprint. <\/p>\n<p><img decoding=\"async\" data-attachment-id=\"75010\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/10\/01\/liquid-ai-released-lfm2-audio-1-5b-an-end-to-end-audio-foundation-model-with-sub-100-ms-response-latency\/screenshot-2025-10-01-at-9-51-55-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-01-at-9.51.55-AM-1.png\" data-orig-size=\"1586,982\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Screenshot 2025-10-01 at 9.51.55\u202fAM\" data-image-description=\"\" data-image-caption=\"\" 
data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-01-at-9.51.55-AM-1-300x186.png\" data-large-file=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-01-at-9.51.55-AM-1-1024x634.png\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-01-at-9.51.55-AM-1-1024x634.png\" alt=\"\" class=\"wp-image-75010 lazyload\" style=\"width:744px;height:auto\"\/>https:\/\/www.liquid.ai\/blog\/lfm2-audio-an-end-to-end-audio-foundation-model<\/p>\n<p>But what\u2019s actually new? A unified backbone with disentangled audio I\/O<\/p>\n<p>LFM2-Audio extends the 1.2B-parameter LFM2 language backbone to treat audio and text as first-class sequence tokens. Crucially, the model disentangles audio representations: inputs are continuous embeddings projected directly from raw waveform chunks (~80 ms), while outputs are discrete audio codes. This avoids discretization artifacts on the input path while keeping training and generation autoregressive for both modalities on the output path.<\/p>\n<p>On the implementation side, the released checkpoint uses:<\/p>\n<p>Backbone: LFM2 (hybrid conv + attention), 1.2B params (LM only)<\/p>\n<p>Audio encoder: FastConformer (~115M, canary-180m-flash)<\/p>\n<p>Audio decoder: RQ-Transformer predicting discrete Mimi codec tokens (8 codebooks)<\/p>\n<p>Context: 32,768 tokens; vocab: 65,536 (text) \/ 2049\u00d78 (audio)<\/p>\n<p>Precision: bfloat16; license: LFM Open License v1.0; languages: 
English<\/p>\n<p>Two generation modes for real-time agents<\/p>\n<p>Interleaved generation for live, speech-to-speech chat, where the model alternates text and audio tokens to minimize perceived latency.<\/p>\n<p>Sequential generation for ASR\/TTS (switching modalities turn-by-turn).<\/p>\n<p>Liquid AI provides a Python package (liquid-audio) and a Gradio demo to reproduce these behaviors.<\/p>\n<p>Latency: &lt;100 ms to first audio<\/p>\n<p>The Liquid AI team reports end-to-end latency below 100 ms from a 4-second audio query to the first audible response\u2014a proxy for perceived responsiveness in interactive use\u2014stating it is faster than models smaller than 1.5B parameters under their setup.<\/p>\n<p>Benchmarks: VoiceBench and ASR results<\/p>\n<p>On VoiceBench\u2014a suite of nine audio-assistant evaluations\u2014Liquid reports an overall score of 56.78 for LFM2-Audio-1.5B, with per-task numbers disclosed in the blog\u2019s chart (e.g., AlpacaEval 3.71, CommonEval 3.49, WildVoice 3.17). The Liquid AI team contrasts this result with larger models such as Qwen2.5-Omni-3B and Moshi-7B in the same table. (VoiceBench is an external benchmark introduced in late 2024 for LLM-based voice assistants.)<\/p>\n<p>The model card on Hugging Face provides an additional VoiceBench table (with closely related\u2014but not identical\u2014per-task values) and includes classic ASR WERs, where LFM2-Audio matches or improves on Whisper-large-v3-turbo on some datasets despite being a generalist speech\u2013text model. For example (lower is better): AMI 15.36 vs. 16.13 (Whisper-large-v3-turbo), LibriSpeech-clean 2.03 vs. 2.10. 
<\/p>\n<p><img decoding=\"async\" data-attachment-id=\"75008\" data-permalink=\"https:\/\/www.marktechpost.com\/2025\/10\/01\/liquid-ai-released-lfm2-audio-1-5b-an-end-to-end-audio-foundation-model-with-sub-100-ms-response-latency\/screenshot-2025-10-01-at-9-45-44-am-2\/\" data-orig-file=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-01-at-9.45.44-AM-1.png\" data-orig-size=\"1312,828\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Screenshot 2025-10-01 at 9.45.44\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-01-at-9.45.44-AM-1-300x189.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-01-at-9.45.44-AM-1-1024x646.png\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/10\/Screenshot-2025-10-01-at-9.45.44-AM-1.png\" alt=\"\" class=\"wp-image-75008 lazyload\" style=\"width:658px;height:auto\"\/>https:\/\/huggingface.co\/LiquidAI\/LFM2-Audio-1.5B<\/p>\n<p>Alright, but why does this matter for voice AI?<\/p>\n<p>Most \u201comni\u201d stacks couple ASR \u2192 LLM \u2192 TTS, which adds latency and brittle interfaces. LFM2-Audio\u2019s single-backbone design with continuous input embeddings and discrete output codes reduces glue logic and allows interleaved decoding for early audio emission. 
For developers, this translates to simpler pipelines and faster perceived response times, while still supporting ASR, TTS, classification, and conversational agents from one model. Liquid AI provides code, demo entry points, and distribution via Hugging Face.<\/p>\n<p>Check out the\u00a0<a href=\"https:\/\/github.com\/Liquid4All\/liquid-audio\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">GitHub Page<\/a>,\u00a0<a href=\"https:\/\/huggingface.co\/LiquidAI\/LFM2-Audio-1.5B\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Hugging Face Model Card<\/a> and <a href=\"https:\/\/www.liquid.ai\/blog\/lfm2-audio-an-end-to-end-audio-foundation-model\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Technical details<\/a>.<\/p>\n<p>Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. 
His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform draws over 2 million monthly views.<\/p>\n","protected":false},"excerpt":{"rendered":"Liquid AI has released LFM2-Audio-1.5B, a compact audio\u2013language foundation model that both understands and generates speech and text&hellip;\n","protected":false},"author":2,"featured_media":188397,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-188396","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/188396","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=188396"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/188396\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/
v2\/media\/188397"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=188396"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=188396"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=188396"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}