{"id":381815,"date":"2026-04-04T15:21:07","date_gmt":"2026-04-04T15:21:07","guid":{"rendered":"https:\/\/www.newsbeep.com\/ie\/381815\/"},"modified":"2026-04-04T15:21:07","modified_gmt":"2026-04-04T15:21:07","slug":"prismml-debuts-1-bit-llm-in-bid-to-free-ai-from-the-cloud-the-register","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ie\/381815\/","title":{"rendered":"PrismML debuts 1-bit LLM in bid to free AI from the cloud \u2022 The Register"},"content":{"rendered":"<p>PrismML, an AI venture out of Caltech, has released a 1-bit large language model that outperforms weightier models, with the expectation that it will improve AI efficiency and viability on mobile devices, among other applications.<\/p>\n<p>The model, dubbed <a target=\"_blank\" href=\"https:\/\/prismml.com\/news\/bonsai-8b\" rel=\"nofollow noopener\">Bonsai 8B<\/a>, manages to be small and fast, with modest power demands and benchmark performance characteristics that rival much larger models.<\/p>\n<p>&#8220;Our first proof point is 1-bit Bonsai 8B, a 1-bit model that fits into 1.15 GB of memory and delivers over 10x the intelligence density of its full-precision counterparts,&#8221; the company said in a social media <a target=\"_blank\" href=\"https:\/\/www.linkedin.com\/posts\/prismml_today-prismml-is-emerging-from-stealth-activity-7444810910734647296-BhXD\" rel=\"nofollow noopener\">post<\/a>. &#8220;It is 14x smaller, 8x faster, and 5x more energy efficient on edge hardware while remaining competitive with other models in its parameter-class.&#8221;<\/p>\n<p>AI models based on the Transformer architecture involve neural networks with millions or billions of <a target=\"_blank\" href=\"https:\/\/ml-cheatsheet.readthedocs.io\/en\/latest\/nn_concepts.html\" rel=\"nofollow noopener\">weights<\/a>, which control the strength of connections between neurons and influence how the model performs tasks. 
They&#8217;re set during training, and the memory they occupy depends on the precision used to represent them.<\/p>\n<p>A model stored at GGUF FP16 (16 bits) will take up much more space than one quantized at GGUF Q8_0 (8 bits), Q4_0 (4 bits), or Q2_K (2 bits). That&#8217;s excluding metadata and overhead that might increase the actual storage space required. But\u00a0given the same basic architecture, 16-bit models generally perform better than models quantized at lower levels.<\/p>\n<p>PrismML&#8217;s Bonsai model family is based on an architecture where, instead of a 16-bit or 32-bit floating point number, &#8220;each weight is represented only by its sign, {\u22121, +1}, while a shared scale factor is stored for each group of weights,&#8221; as explained in the company&#8217;s <a target=\"_blank\" rel=\"nofollow noopener\" href=\"https:\/\/github.com\/PrismML-Eng\/Bonsai-demo\/blob\/main\/1-bit-bonsai-8b-whitepaper.pdf\">white paper<\/a> [PDF]. Researchers have been working on improved approaches to quantization for many years, described in papers like &#8220;<a target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/1708.04788\" rel=\"nofollow noopener\">BitNet: Bit-Regularized Deep Neural Networks<\/a>&#8221; (2017) and &#8220;<a target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2402.17764\" rel=\"nofollow noopener\">The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits<\/a>&#8221; (2024).<\/p>\n<p>PrismML&#8217;s approach is based on work done by Caltech electrical engineering professor Babak Hassibi and colleagues. 
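The sign-plus-scale scheme quoted from the white paper can be sketched in a few lines of Python. This is a toy illustration, not PrismML's actual method: the choice of group size and of the mean absolute value as the shared scale are assumptions for the sake of the example.

```python
def quantize_group(weights):
    """Toy 1-bit quantization of one group of weights:
    keep only the sign of each weight, plus a single shared
    scale factor for the whole group (here: mean magnitude)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]  # each weight -> {-1, +1}
    return signs, scale

def dequantize_group(signs, scale):
    # Reconstruct approximate weights as sign * shared scale
    return [s * scale for s in signs]

# One group of four full-precision weights...
w = [0.31, -0.12, 0.08, -0.27]
signs, scale = quantize_group(w)
approx = dequantize_group(signs, scale)
# ...now stored as 1 bit per weight plus one float per group,
# versus 16 or 32 bits per weight at full precision.
```

The storage win is the point: per weight, a sign bit replaces a 16- or 32-bit float, with the per-group scale amortized across the group.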
The company claims that its 1-bit architecture avoids the tradeoffs that historically have accompanied low-bit quantization, specifically poor instruction following, errant multi-step reasoning, and unreliable tool use.<\/p>\n<p>&#8220;We spent years developing the mathematical theory required to compress a neural network without losing its reasoning capabilities,&#8221; said Babak Hassibi, CEO and founder of PrismML, in a statement. &#8220;We see 1-bit not as an endpoint, but as a starting point.&#8221;<\/p>\n<p>Hassibi argues that the company&#8217;s 1-bit architecture establishes a new paradigm for AI that&#8217;s focused on intelligence per unit of compute and energy.<\/p>\n<p>To encourage others to think along these lines \u2013 remember when performance-per-watt became a thing? \u2013 PrismML proposes the measurement of intelligence density, a metric that shows its models in a good light.<\/p>\n<p>&#8220;We define intelligence density as the negative of the log of the model&#8217;s average error rate (across the same benchmark suite) divided by the model size,&#8221; the company explains.<\/p>\n<p>Assessed this way, Qwen3 8B, which comes out a bit ahead of Bonsai 8B in various benchmarks (MMLU Redux, MuSR, GSM8K, etc.), scores just 0.10\/GB for intelligence density, far short of Bonsai 8B at 1.06\/GB.<\/p>\n<p>Metrics may matter for marketing, but the more meaningful yardstick for PrismML&#8217;s models is their potential to move AI out of cloud datacenters. The company foresees its models powering on-device agents, real-time robotics, secure enterprise systems, and other projects where memory bandwidth, power, or compliance constraints can hinder deployment.<\/p>\n<p>&#8220;1-bit Bonsai 8B runs natively on Apple devices (Mac, iPhone, iPad) via MLX, on Nvidia GPUs via llama.cpp CUDA,&#8221; the company says. 
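Taking the intelligence-density definition at face value, the arithmetic can be reproduced numerically. The error rates below are hypothetical values chosen to roughly land on the article's figures, and the natural-log base is an assumption, since the company's definition doesn't specify one.

```python
import math

def intelligence_density(avg_error_rate, size_gb):
    """Intelligence density per PrismML's stated definition:
    -log(average benchmark error rate) / model size.
    Log base assumed natural; the article does not say."""
    return -math.log(avg_error_rate) / size_gb

# Hypothetical inputs chosen to roughly reproduce the quoted scores:
# ~1.06/GB for Bonsai 8B (1.15 GB), ~0.10/GB for Qwen3 8B (~16 GB at FP16)
bonsai = intelligence_density(avg_error_rate=0.30, size_gb=1.15)
qwen = intelligence_density(avg_error_rate=0.20, size_gb=16.0)
```

The metric's shape explains the result: dividing by size means a model a fraction of the footprint can tolerate a somewhat higher error rate and still score an order of magnitude higher in density.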
&#8220;Model weights are <a target=\"_blank\" href=\"https:\/\/github.com\/PrismML-Eng\/Bonsai-demo\/\" rel=\"nofollow noopener\">available<\/a> <a target=\"_blank\" href=\"https:\/\/huggingface.co\/collections\/prism-ml\/bonsai\" rel=\"nofollow noopener\">today<\/a> under the Apache 2.0 License.&#8221;<\/p>\n<p>Two smaller models are also available: 1-bit Bonsai 4B and 1-bit Bonsai 1.7B. \u00ae<\/p>\n","protected":false},"excerpt":{"rendered":"PrismML, an AI venture out of Caltech, has released a 1-bit large language model that outperforms weightier models,&hellip;\n","protected":false},"author":2,"featured_media":381816,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[220,218,219,61,60,80],"class_list":{"0":"post-381815","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-ie","12":"tag-ireland","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/381815","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/comments?post=381815"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/381815\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media\/381816"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media?parent=381815"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com
\/ie\/wp-json\/wp\/v2\/categories?post=381815"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/tags?post=381815"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}