Compression can achieve drastic reductions in energy consumption with minimal impact on accuracy, says Román Orus of Multiverse Computing

How to tame LLMs with quantum-inspired compression

A new generation of quantum-inspired compression techniques helps reduce the model size and energy requirements of LLMs while also addressing data privacy.

Given the increasing energy consumption of AI, innovative approaches are needed. One of these approaches is quantum-inspired compression technology. This technology revolutionises the use of large language models (LLMs) by drastically reducing model size and energy consumption while maintaining performance.

This development enables the efficient and secure use of AI, particularly in energy- and data-sensitive areas such as manufacturing, healthcare, and edge computing. This, in turn, is beneficial for the protection of sensitive data. Román Orus, co-founder and chief scientific officer at the quantum AI company Multiverse Computing, answers four questions about this technology.

In Europe, AI has become a key strategic pillar across industries – from the automotive and defence industries to healthcare and advanced manufacturing. However, the rapid expansion of AI, particularly through LLMs, raises concerns about energy consumption and dependence on infrastructure, as well as control over sensitive data. The International Energy Agency (IEA) estimates that by 2030, data processing for AI will consume more electricity in the US alone than the production of steel, cement, chemicals, and all other energy-intensive goods combined.

In Germany, the Energy Efficiency Act sets targets for reducing energy consumption by 2030. Datacentres must recover or reuse at least 10% of their energy starting in 2026, and this share is to rise to 20% by 2028. These savings are only possible with new technological approaches, such as a new generation of quantum-inspired compression techniques.

These methods significantly reduce model size and energy requirements while maintaining performance, enabling AI to be deployed locally, securely, and efficiently.

Orus, who is also the developer of CompactifAI, an AI model compressor, answers four questions about the scientific principles behind compressed AI:

The next generation of compression techniques for LLMs achieves drastic size reductions with minimal impact on accuracy. What are the scientific principles that enable this breakthrough?

The breakthrough lies in quantum-inspired tensor networks, which restructure neural networks at their core by breaking large matrices into smaller, interconnected components. Combined with quantisation techniques, this approach reduces the number of parameters while preserving essential correlations within the data.

Compared with traditional pruning or quantisation (reducing the numerical precision of weights), which can degrade accuracy in sensitive applications, compression using tensor networks preserves the model’s full operational capacity. In practice, this results in a reduction in model size of up to 95%, improved inference speed, and compatibility with a wide range of hardware, from high-performance computers to edge devices.
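The two ingredients described above can be illustrated with a minimal sketch. This is not CompactifAI’s actual method (Multiverse’s tensor networks are more general structures than a single matrix split); it uses truncated SVD, the simplest tensor-network-style factorisation, to break one large matrix into two smaller interconnected factors, then applies int8 quantisation to those factors. All sizes, the rank, and the per-tensor quantisation scheme are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1024x1024 "weight matrix" with low-rank structure, standing in for
# the observation that correlations in trained networks concentrate in
# a few dominant modes.
rank_true = 32
W = rng.standard_normal((1024, rank_true)) @ rng.standard_normal((rank_true, 1024))

# 1) Factorisation: keep only the top-r singular components, splitting
#    the large matrix into two smaller, interconnected factors A and B.
r = 32
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]          # 1024 x r
B = Vt[:r, :]                 # r x 1024

# 2) Quantisation: store each factor as int8 with a per-tensor scale.
def quantise(x):
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

A_q, a_scale = quantise(A)
B_q, b_scale = quantise(B)

# Byte accounting: float32 dense matrix vs int8 factors (scales ignored,
# they are negligible).
original_bytes = W.size * 4
compressed_bytes = A_q.size + B_q.size
print(f"compression ratio: {original_bytes / compressed_bytes:.1f}x")

# The dominant correlations survive in the retained components, so the
# reconstruction error stays small.
W_hat = (A_q * a_scale) @ (B_q * b_scale)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In this contrived low-rank case the factors are 64x smaller than the dense matrix; real LLM weight matrices are only approximately low-rank, so the achievable ratio depends on how much accuracy loss is acceptable.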

Image: Román Orus, Multiverse Computing

What deployment scenarios are already possible with the new LLM architectures, for example, on edge AI devices or in constrained environments?

Compression enables a shift from cloud-based AI to edge computing and opens up a wide range of applications. Compressed AI models offer a key advantage: they can be deployed locally on hardware with limited computing capacity, such as drones or embedded systems. By reducing model size and hardware requirements, AI can run entirely at the network edge, delivering instant, real-time intelligence without relying on external infrastructure and without excessive energy consumption.

For example, a reconnaissance drone patrolling a border area with an embedded AI system trained to recognise terrain features or foreign equipment can carry out its mission autonomously, even in the absence of radio communication. In cybersecurity or electronic warfare, local AI deployment also ensures that sensitive data remains confined to the operational area, increasing both security and tactical reliability.

In production, compressed models enable real-time monitoring and predictive maintenance on-site without sending sensitive data externally. A successful project at a European manufacturing facility demonstrated the benefits of quantum-inspired compression of an AI model for automotive component manufacturing. The application of advanced tensor network methods resulted in a significant reduction in size, doubling the model’s response speed and reducing its energy consumption by approximately 50%.


In healthcare, hospitals can perform diagnostics locally – on tablets, workstations, or in private datacentres – protecting patient privacy and meeting regulatory requirements. This method also enables smaller institutions with limited infrastructure to benefit from AI capabilities. The common denominator is compact, efficient and GPU-independent AI that can operate reliably in diverse environments.

As AI models continue to grow in size, the need for efficiency is greater than ever. Can these advanced compression techniques provide a scalable response to the environmental and energy challenges associated with generative AI?

Yes. As we’ve seen in manufacturing, compressed models can reduce energy consumption by up to 50% while delivering twice the response speed. This leads to significant cost savings and a more sustainable AI ecosystem. By reducing the number of operations per inference, these techniques reduce both the carbon footprint and the total cost of ownership without compromising performance.
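The claim that fewer operations per inference drives the savings can be made concrete with a back-of-envelope count. The layer sizes and rank below are hypothetical, not figures from Multiverse: for one dense layer, a rank-r factorisation replaces an n×m matrix-vector product with two smaller ones.

```python
# Multiply-accumulate (MAC) count per token for one dense layer,
# full matrix vs. a rank-r factorisation (hypothetical sizes).
n, m, r = 4096, 4096, 256

full_macs = n * m                 # y = W @ x
factored_macs = r * (n + m)       # y = A @ (B @ x), two thin matmuls

print(f"MACs per token: {full_macs:,} -> {factored_macs:,}")
print(f"reduction: {full_macs / factored_macs:.1f}x")
```

Since energy per inference scales roughly with the arithmetic performed, cutting the operation count in this way is what translates directly into a lower carbon footprint and lower total cost of ownership.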

Europe has strong academic roots in quantum and AI research, yet funding gaps persist. How can deeptech startups in these fields translate scientific leadership into industrial leadership? What role should public institutions play?

Scientific excellence alone is not sufficient to ensure industrial competitiveness. One of the biggest challenges for deeptech startups is securing stable, recurring revenue that demonstrates market viability. Public programmes often support early-stage research and development, but overlook the commercialisation that must follow.

This article was first published in German on Computing’s sister site Computing Deutschland