{"id":115110,"date":"2025-09-04T23:27:22","date_gmt":"2025-09-04T23:27:22","guid":{"rendered":"https:\/\/www.newsbeep.com\/uk\/115110\/"},"modified":"2025-09-04T23:27:22","modified_gmt":"2025-09-04T23:27:22","slug":"edge-ai-powers-tiny-models-for-smart-devices","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/uk\/115110\/","title":{"rendered":"Edge AI Powers Tiny Models for Smart Devices"},"content":{"rendered":"<p><a href=\"https:\/\/spectrum.ieee.org\/tag\/large-language-models\" rel=\"nofollow noopener\" target=\"_blank\">Large language models<\/a> are powerful, but generally they require vast computing resources, which means they typically have to run on stacks of high-end <a href=\"https:\/\/spectrum.ieee.org\/tag\/gpus\" rel=\"nofollow noopener\" target=\"_blank\">GPUs<\/a> in <a href=\"https:\/\/spectrum.ieee.org\/tag\/data-centers\" rel=\"nofollow noopener\" target=\"_blank\">data centers<\/a>. Now, startup <a href=\"https:\/\/multiversecomputing.com\/\" rel=\"noopener noreferrer nofollow\" target=\"_blank\">Multiverse Computing<\/a> has created models it says are comparable in size to the brains of chickens and flies\u2014allowing the company to shrink powerful LLMs so that they can run on home <a href=\"https:\/\/spectrum.ieee.org\/tag\/appliances\" rel=\"nofollow noopener\" target=\"_blank\">appliances<\/a>, <a href=\"https:\/\/spectrum.ieee.org\/tag\/smartphones\" rel=\"nofollow noopener\" target=\"_blank\">smartphones<\/a>, or cars.<\/p>\n<p>Multiverse, based in Donostia, <a href=\"https:\/\/spectrum.ieee.org\/tag\/spain\" rel=\"nofollow noopener\" target=\"_blank\">Spain<\/a>, is working at the intersection of two of technology\u2019s most in-vogue fields\u2014AI and <a data-linked-post=\"2657574494\" href=\"https:\/\/spectrum.ieee.org\/quantum-computing-for-dummies\" target=\"_blank\" rel=\"nofollow noopener\">quantum computing<\/a>. 
The company\u2019s flagship product is a software platform called <a href=\"https:\/\/spectrum.ieee.org\/tag\/singularity\" rel=\"nofollow noopener\" target=\"_blank\">Singularity<\/a>, designed to allow nonexperts to work with quantum <a href=\"https:\/\/spectrum.ieee.org\/tag\/algorithms\" rel=\"nofollow noopener\" target=\"_blank\">algorithms<\/a>, but it has also developed <a data-linked-post=\"2659761734\" href=\"https:\/\/spectrum.ieee.org\/remembering-jacob-ziv\" target=\"_blank\" rel=\"nofollow noopener\">compression<\/a> technology called CompactifAI for shrinking <a href=\"https:\/\/spectrum.ieee.org\/tag\/neural-networks\" rel=\"nofollow noopener\" target=\"_blank\">neural networks<\/a>.<\/p>\n<p> The software relies on tensor networks\u2014mathematical tools originally developed to simulate quantum systems on classical hardware. But their ability to distill complex multidimensional systems into something more compact and easier to work with also makes them a promising avenue for compressing large <a data-linked-post=\"2668550118\" href=\"https:\/\/spectrum.ieee.org\/small-language-models-apple-microsoft\" target=\"_blank\" rel=\"nofollow noopener\">AI models<\/a>.<\/p>\n<p>Multiverse\u2019s Nano Models Shrink AI<\/p>\n<p> Multiverse has now used CompactifAI to create a new family of \u201cnano models\u201d that it calls Model Zoo, with each one named after the animal whose brain (theoretically) has a comparable amount of processing power. 
The first two releases are a compressed version of Meta\u2019s <a href=\"https:\/\/spectrum.ieee.org\/tag\/llama\" rel=\"nofollow noopener\" target=\"_blank\">Llama<\/a> 3.1 model dubbed ChickenBrain, which can bring reasoning capabilities to a <a href=\"https:\/\/spectrum.ieee.org\/tag\/raspberry-pi\" rel=\"nofollow noopener\" target=\"_blank\">Raspberry Pi<\/a>, and a version of the open-source model <a href=\"https:\/\/huggingface.co\/HuggingFaceTB\/SmolLM2-135M\" target=\"_blank\" rel=\"nofollow noopener\">SmolLM2 135M<\/a> small enough to run on a smartphone, dubbed SuperFly.<\/p>\n<p> \u201cSuperFly is a 94-million-parameter model, which is tiny. It\u2019s definitely one of the smallest LLMs out there,\u201d says <a href=\"https:\/\/www.linkedin.com\/in\/sam-mugel\/?originalSubdomain=ca\" target=\"_blank\" rel=\"nofollow noopener\">Sam Mugel<\/a>, Multiverse\u2019s chief technology officer. \u201cAny device that\u2019s expensive enough that you could justify putting a Raspberry Pi in would be able to host an LLM like SuperFly.\u201d That means expensive electronics like a washing machine or a fridge could now have AI capabilities they would otherwise not be able to incorporate.<\/p>\n<p> The company says this could bring AI capabilities to a wide range of appliances, and in particular the ability to control devices using natural language. Being able to run LLMs locally rather than via the cloud has a host of benefits, says Mugel, including significantly reduced latency and fewer security and privacy risks due to data being processed on-device.<\/p>\n<p> They could be particularly useful for applications where <a href=\"https:\/\/spectrum.ieee.org\/tag\/internet\" rel=\"nofollow noopener\" target=\"_blank\">Internet<\/a> connections may be unreliable, Mugel says. 
SuperFly is small enough to be directly embedded in a car\u2019s dashboard, which could allow uninterrupted natural-language control even while driving through tunnels or in areas with poor network coverage.<\/p>\n<p> Compressing models is standard practice these days, thanks to growing concerns around the <a data-linked-post=\"2650279667\" href=\"https:\/\/spectrum.ieee.org\/energy-efficient-green-ai-strategies\" target=\"_blank\" rel=\"nofollow noopener\">energy and hardware footprints of the largest models<\/a>. Neural networks are surprisingly inefficient learners and contain a lot of redundant information, says Mugel, which leaves a lot of room for optimization.  <\/p>\n<p> This is typically done using techniques like quantization, which involves using fewer bits to represent a model\u2019s weights, or pruning out connections in the neural network that aren\u2019t contributing much to performance. But Mugel says Multiverse\u2019s quantum-inspired tensor networks approach can go further than either of these more conventional approaches, and can also be combined with quantization to push compression even further.<\/p>\n<p> The first step in the process involves scanning a model to see which layers are most suitable for compression. These layers are then reorganized into tensor networks, which retain the most important patterns in the layer\u2019s weights while discarding redundant information that isn\u2019t contributing much to overall performance. Finally, the compressed model goes through a \u201chealing\u201d step where it is briefly retrained on the task at hand.<\/p>\n<p>\u201cWe\u2019ve reorganized the neural network a little bit, and we\u2019ve done a procedure that might take it out of the optimal points of the training,\u201d says Mugel. 
\u201cThe healing is analogous to how people, after a really bad accident, may need a little bit of <a href=\"https:\/\/spectrum.ieee.org\/tag\/rehabilitation\" rel=\"nofollow noopener\" target=\"_blank\">rehabilitation<\/a>. That doesn\u2019t mean relearning a task from scratch, it just means getting familiar with it again.\u201d<\/p>\n<p>Efficient AI for Smartphones<\/p>\n<p> The company used this process to create its SuperFly model, which is roughly 30 percent smaller than the model it was compressed from. At just 94 million parameters, it is comparable in size to two fruit-fly brains, says Mugel, which have roughly 50 million neural connections. When the company\u2019s researchers installed it on an <a href=\"https:\/\/spectrum.ieee.org\/tag\/iphone\" rel=\"nofollow noopener\" target=\"_blank\">iPhone<\/a> 14 Pro it took up only 191 megabytes of disk space and could process a respectable 115 tokens per second.<\/p>\n<p> ChickenBrain is considerably larger, at 3.2 billion parameters, which Mugel admits is similar in size to other smaller language models. But this represents a 60 percent reduction from the 8-billion-parameter Llama model it was compressed from. 
And the team was also able to add reasoning skills to the model despite the significantly reduced footprint, though Multiverse declined to explain how these new capabilities were achieved.<\/p>\n<p> This means that ChickenBrain actually outperforms the model it was compressed from on a range of benchmarks when running on similar hardware, including the language-focused <a href=\"https:\/\/arxiv.org\/abs\/2406.01574\" target=\"_blank\" rel=\"nofollow noopener\">MMLU-Pro<\/a>, math-focused <a href=\"https:\/\/www.kaggle.com\/benchmarks\/open-benchmarks\/math-500\" target=\"_blank\" rel=\"nofollow noopener\">Math-500<\/a> and <a href=\"https:\/\/huggingface.co\/datasets\/openai\/gsm8k\" target=\"_blank\" rel=\"nofollow noopener\">GSM8K<\/a>, and general knowledge-focused <a href=\"https:\/\/huggingface.co\/datasets\/fingertap\/GPQA-Diamond\" target=\"_blank\" rel=\"nofollow noopener\">GPQA-Diamond<\/a>.<\/p>\n<p>\u201cWhat we\u2019re demonstrating is that we can modify Llama 3.1 8B to make it more powerful with a fraction of the size,\u201d says Mugel. \u201cIt\u2019s an important step for making AI leaner and more efficient, as well as opening up new domains for <a href=\"https:\/\/spectrum.ieee.org\/tag\/ai-models\" rel=\"nofollow noopener\" target=\"_blank\">AI models<\/a> at the edge.\u201d<\/p>\n<p><a href=\"https:\/\/faculty.fudan.edu.cn\/xuzenglin\/en\/jsxx\/1155986\/jsxx\/jsxx.htm\" target=\"_blank\" rel=\"nofollow noopener\">Zenglin Xu<\/a>, a professor at the <a href=\"https:\/\/spectrum.ieee.org\/topic\/artificial-intelligence\/\" rel=\"nofollow noopener\" target=\"_blank\">Artificial Intelligence<\/a> Innovation and Incubation Institute at Fudan University, in <a href=\"https:\/\/spectrum.ieee.org\/tag\/shanghai\" rel=\"nofollow noopener\" target=\"_blank\">Shanghai<\/a>, says that tensor networks are a promising tool for compression and often provide better results than similar techniques that attempt to simplify layers of a neural network. 
However, it remains unclear how well models compressed in this way can deal with more-complicated reasoning tasks. \u201cEspecially for problems with longer inference chains, the performance could be suboptimal compared with other techniques,\u201d adds Xu.<\/p>\n<p>And despite the compression achieved so far, Mugel admits that there\u2019s still a long way to go before today\u2019s frontier models can be squeezed onto edge devices. But he says there\u2019s plenty of scope to improve Multiverse\u2019s compression techniques, and at the same time more-efficient architectures are bringing cutting-edge capabilities to ever-smaller models.<\/p>\n<p> \u201cHow much more can we squeeze out of 3 billion parameters?\u201d he says. \u201cThat\u2019s really hard to say, but I do believe we\u2019re going to see way better performance in the very near future.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"Large language models are powerful, but generally they require vast computing resources, which means they typically have 
to&hellip;\n","protected":false},"author":2,"featured_media":115111,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[733,4323,4567,4568,5289,55365,86,56,54,55],"class_list":{"0":"post-115110","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-computing","8":"tag-artificial-intelligence","9":"tag-computing","10":"tag-edge-ai","11":"tag-edge-computing","12":"tag-iot","13":"tag-iot-devices","14":"tag-technology","15":"tag-uk","16":"tag-united-kingdom","17":"tag-unitedkingdom"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/115110","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/comments?post=115110"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/115110\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media\/115111"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media?parent=115110"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/categories?post=115110"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/tags?post=115110"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}