However, more than one-quarter of respondents (27%) plan to stay on the cloud, even if it costs more. Some organizations may be skeptical that AI workloads will scale enough in the long term to justify a move away from the cloud, but they may be missing the bigger picture on cost. Beyond data storage and computing cost (typically measured in GPUs and central processing units [CPUs]), organizations should also account for the cost of model usage and the added cost of inferencing,4 typically measured in AI tokens.
As organizations learn more and models evolve, leaders are finding that understanding cost dynamics benefits from a thoughtful approach to financial operations across the entire data and model life cycle. This goes beyond cloud cost to include the cost of data hosting, networking, inferencing, and latency for HPC across hybrid infrastructures. As solutions become more advanced, leaders should stay nimble, making decisions that preserve flexibility in a fast-moving environment where new options are emerging for data hosting, computing, networking, and inferencing.
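To make these cost dynamics concrete, a simple back-of-the-envelope model can break monthly spend into the categories named above: compute, token-based model usage, data hosting, and networking. The sketch below is illustrative only; every rate, parameter name, and figure in it is an assumption, not a benchmark or vendor price.

```python
# A minimal sketch of an AI workload cost model; all prices and categories
# here are illustrative assumptions, not benchmarks or vendor quotes.

def monthly_ai_cost(
    gpu_hours: float, gpu_rate: float,        # assumed $/GPU-hour
    cpu_hours: float, cpu_rate: float,        # assumed $/CPU-hour
    tokens_in: float, tokens_out: float,      # monthly token volumes
    rate_in_per_m: float, rate_out_per_m: float,  # assumed $ per million tokens
    storage_gb: float, storage_rate: float,   # assumed $/GB-month
    egress_gb: float, egress_rate: float,     # assumed $/GB transferred
) -> dict:
    """Break total monthly cost into compute, model usage (tokens),
    data hosting, and networking."""
    costs = {
        "compute": gpu_hours * gpu_rate + cpu_hours * cpu_rate,
        "inference_tokens": (tokens_in * rate_in_per_m
                             + tokens_out * rate_out_per_m) / 1_000_000,
        "data_hosting": storage_gb * storage_rate,
        "networking": egress_gb * egress_rate,
    }
    costs["total"] = sum(costs.values())
    return costs

# Example with hypothetical figures: at scale, token spend can rival
# or exceed raw compute cost.
print(monthly_ai_cost(
    gpu_hours=500, gpu_rate=2.50, cpu_hours=2_000, cpu_rate=0.05,
    tokens_in=2e9, tokens_out=5e8, rate_in_per_m=0.50, rate_out_per_m=1.50,
    storage_gb=10_000, storage_rate=0.02, egress_gb=3_000, egress_rate=0.09,
))
```

Even a rough model like this can show when token-based model usage, rather than raw compute, dominates the bill.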
Across the life cycle of building, operating, and running AI infrastructure, technology resiliency can involve:
- Modular design: Composable components that isolate failures and enable rapid recovery or replacement
- Robust versioning: Strategies to prevent data loss during development and training
- Continuous monitoring: Ongoing health checks and anomaly detection across hardware, software, and data pipelines
- Proactive security: Real-time threat detection, incident response, and regular patching to address evolving cyber risks
- Dynamic scalability: Ability to scale resources seamlessly to meet unexpected spikes in demand or workload changes
- Latency and performance monitoring: Solutions that track the speed and throughput of compute so that latency requirements are met (see the sketch after this list)
- Financial operations transformation: A rework of cloud financial operations to incorporate increases in data calls and model usage specific to cloud machine learning
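As an illustration of the latency and performance monitoring item above, the following sketch tracks request latencies against a rolling baseline and an assumed service-level objective. The class name, thresholds, and window size are hypothetical choices, not a prescribed design.

```python
# A minimal sketch of latency monitoring: flag SLO breaches immediately and
# flag anomalies against a rolling baseline. Thresholds are assumptions.
from collections import deque
from statistics import mean, stdev

class LatencyMonitor:
    def __init__(self, window: int = 100, sigma: float = 3.0,
                 slo_ms: float = 250.0):
        self.samples = deque(maxlen=window)  # rolling window of latencies
        self.sigma = sigma                   # anomaly threshold in std devs
        self.slo_ms = slo_ms                 # assumed latency objective

    def record(self, latency_ms: float) -> list[str]:
        alerts = []
        if latency_ms > self.slo_ms:
            alerts.append(f"SLO breach: {latency_ms:.0f} ms > {self.slo_ms:.0f} ms")
        if len(self.samples) >= 30:  # need a baseline before anomaly checks
            mu, sd = mean(self.samples), stdev(self.samples)
            if sd > 0 and (latency_ms - mu) / sd > self.sigma:
                alerts.append(f"Anomaly: {latency_ms:.0f} ms vs baseline {mu:.0f} ms")
        self.samples.append(latency_ms)
        return alerts

monitor = LatencyMonitor()
for ms in [120, 130, 125, 118, 900]:   # the spike triggers an SLO alert
    for alert in monitor.record(ms):
        print(alert)
```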
Technology resilience depends on systems that can operate amid both expected shifts and sudden disruptions. It involves more than business continuity; it also requires proactive planning for infrastructure change.
AI magnifies that challenge for some organizations. These systems don't just run on tech; they're embedded in complex digital ecosystems. As their role in the organization grows, so does the need for infrastructure that can keep pace.
Leaders can strengthen hybrid cloud machine learning agility by taking deliberate steps to prepare. Consider the following.
Decide where the organization wants its data gravity to be. As the organization scales its AI use, deciding where data "lives" (securely, accessibly, and with high performance) becomes an important choice. The location of your most queried data effectively becomes the center of gravity for AI applications and agents, regardless of where inferencing happens. While an AI application may do a good job pulling data from multiple internal and external sources, data sprawl can add complexity and cost. Be intentional about what stays on public and private clouds, what moves onto them, and what may need to be moved off to recalibrate your data center of gravity. Factor in privacy, latency, workflow, and sovereignty requirements to determine whether data should be centralized in a private cloud, traditional on-premises infrastructure, or private AI infrastructure. Also factor in connectivity, interoperability, and latency needs that may call for more open solutions, such as a public cloud.

Design for interoperable networks that build on- and off-ramps for models to be hosted where they are now and where they may be needed in the future. Organizations working to build resilient hybrid infrastructure that adapts to future needs may be better served by designing for interoperability from the start. It was initially believed that running production AI models for inferencing would not require the same level of data input, networking, and processing power as training foundation models, but scaled AI applications have shown that they, too, require a thoughtful approach to networking so that data storage does not drain the system and high performance can be maintained across many users.5
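To illustrate how those placement factors can be weighed, the sketch below filters candidate hosting locations against hard requirements such as sovereignty, privacy, and latency. The location names, properties, and rules are hypothetical simplifications of the considerations described above, not a prescribed rubric.

```python
# A minimal sketch of data-placement reasoning; locations, properties,
# and filtering rules are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class Workload:
    sovereignty_bound: bool   # data must stay on controlled infrastructure
    privacy_sensitive: bool
    latency_budget_ms: float
    needs_open_interop: bool  # broad connectivity to external services

LOCATIONS = {
    # assumed typical properties per hosting option
    "private_cloud": {"in_house": True,  "latency_ms": 20, "interop": "medium"},
    "on_prem":       {"in_house": True,  "latency_ms": 5,  "interop": "low"},
    "public_cloud":  {"in_house": False, "latency_ms": 60, "interop": "high"},
}

def candidate_locations(w: Workload) -> list[str]:
    fits = []
    for name, props in LOCATIONS.items():
        if (w.sovereignty_bound or w.privacy_sensitive) and not props["in_house"]:
            continue  # hard constraint: data cannot leave controlled infrastructure
        if props["latency_ms"] > w.latency_budget_ms:
            continue  # hard constraint: placement must meet the latency budget
        if w.needs_open_interop and props["interop"] == "low":
            continue  # interoperability need treated as a filter here
        fits.append(name)
    return fits

# A privacy-sensitive, latency-critical workload lands on controlled infrastructure.
print(candidate_locations(Workload(True, True, 30.0, False)))
# -> ['private_cloud', 'on_prem']
```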
Some companies are building on-prem GPU farms for both training and inference needs. Others are using models that are application programming interface–enabled and require no on-prem infrastructure. The answer could be to use a combination—for instance, an open-source or open-weight large language model on-prem that could be fine-tuned on a private cluster.
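One way such a combination can work in practice is a simple router that keeps sensitive or routine traffic on the fine-tuned on-prem model and overflows to an API-enabled hosted model. The sketch below is a hypothetical illustration; the client functions, queue threshold, and routing rules are assumptions, not a specific vendor pattern.

```python
# A minimal sketch of hybrid model routing; the client functions and
# thresholds below are hypothetical placeholders.

def call_local_llm(prompt: str) -> str:
    # Placeholder for a fine-tuned, open-weight model on a private cluster (assumption).
    return f"[local model] {prompt[:40]}"

def call_hosted_api(prompt: str) -> str:
    # Placeholder for an API-enabled hosted model with no on-prem footprint (assumption).
    return f"[hosted API] {prompt[:40]}"

def route_request(contains_private_data: bool, local_queue_depth: int,
                  max_local_queue: int = 32) -> str:
    if contains_private_data:
        return "local"  # sensitive data never leaves the private cluster
    if local_queue_depth < max_local_queue:
        return "local"  # prefer the on-prem model while it has headroom
    return "api"        # overflow non-sensitive traffic to the hosted model

def handle(prompt: str, contains_private_data: bool, local_queue_depth: int) -> str:
    target = route_request(contains_private_data, local_queue_depth)
    return call_local_llm(prompt) if target == "local" else call_hosted_api(prompt)

print(handle("Summarize this internal contract", True, 40))   # stays local despite load
print(handle("Answer a public FAQ question", False, 40))      # overflows to the API
```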
Rather than locking into one setup, infrastructure should allow models to be hosted and workloads to move based on context (across systems, users, and even multi-agent operations) while maintaining the right guardrails and access controls. Consider investing in dedicated hardware, such as large clusters, as needs become clear. A pragmatic, incremental approach that leverages both cloud and alternatives can deliver the agility, cost-efficiency, and control, grounded in real-world demands, that high-performance AI workloads typically require.