linux-foundation · Published Apr 27, 2026

KubeCon + CloudNativeCon NA 2025

Cloud-native is in the spotlight during theCUBE’s coverage of KubeCon + CloudNativeCon NA 2025. Discover how Kubernetes drives AI/ML at scale and how platform engineering simplifies complexity. Hear C

3 pillars · 19 citations · 19/19 verified (100%) against source transcripts · Source event on theCUBE
QA PASS · Editorial QA Gate · rubric
  • Citation verification rate: 100.0% (≥ 95%)
  • Fabricated quote count: 0 (= 0)
  • Verified citation density: 19 (≥ 8)
  • Named operators cited: 12 (≥ 4)
  • Tracked-ticker linkage: 2 (≥ 2)
  • All three pillars present: developer + deepTech + cSuite (required: developer + deepTech + cSuite)

Developer

6 citations

For practitioners shipping against this infrastructure

Developer Infrastructure Evolution at Scale

Kubernetes infrastructure is hitting unprecedented scale milestones that fundamentally change how developers architect AI workloads. Gari Singh from Google revealed that GKE now supports 130,000 nodes in a single cluster, doubling from 65,000 just 12 months ago. "You take 130,000 and you multiply that times 1k, 3K, whatever, you've now got this massive string of records that you have to store," Singh explained, noting they've replaced etcd with Spanner internally to handle the verbosity of YAML/JSON at this scale.
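Singh's multiplication can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the quote does not say what the "1k, 3K" multiplier counts, so the code treats it as a hypothetical number of stored records per node, and the 2 KiB record size is an assumption rather than a figure from the interview.

```python
NODES = 130_000  # single-cluster node count cited by Singh


def total_records(nodes: int, records_per_node: int) -> int:
    """Records the control plane must store, given a per-node multiplier."""
    return nodes * records_per_node


def total_gib(records: int, bytes_per_record: int) -> float:
    """Raw storage in GiB for the given record count and record size."""
    return records * bytes_per_record / 2**30


# Hypothetical values: Singh's "1k, 3K" read as records per node,
# and an assumed 2 KiB of serialized YAML/JSON per record.
ASSUMED_BYTES_PER_RECORD = 2 * 1024

for per_node in (1_000, 3_000):
    records = total_records(NODES, per_node)
    size = total_gib(records, ASSUMED_BYTES_PER_RECORD)
    print(f"{per_node:>5} records/node -> {records:,} records, ~{size:,.0f} GiB")
```

Under these assumptions the record count lands between 130 million and 390 million, which gives a rough sense of why the size of each serialized record matters so much at this node count.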

The shift toward inference-first AI architecture is reshaping Kubernetes primitives. Eddie Villalba from Google detailed their inference gateway approach: "We looked at the customer journey. What is that customer journey for this new kind of serving workload." The team launched inference API primitives in Kubernetes open source, working backwards from customer needs. Akshay Ram highlighted the core challenge: "With LLMs, you can have a request saying just like FAQ, how's the weather kind of request, which is pretty predictable. Or you can say, 'Summarize this document with a ginormous document.' So the load on the backend varies a lot."
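Ram's point about uneven request cost can be illustrated with a toy router. This is a minimal sketch, not the logic of Google's inference gateway or of any Kubernetes API: the `Backend` class, the word-count cost estimate, and the "least pending work" rule are all hypothetical stand-ins chosen to show why request-count balancing breaks down when one request can be thousands of times heavier than another.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical model server tracked by the amount of queued work."""
    name: str
    pending_tokens: int = 0


def estimate_cost(prompt: str) -> int:
    """Crude cost proxy: whitespace-separated word count (assumption, not a real tokenizer)."""
    return max(1, len(prompt.split()))


def route(prompt: str, backends: list[Backend]) -> Backend:
    """Send the request to the backend with the least pending work."""
    target = min(backends, key=lambda b: b.pending_tokens)
    target.pending_tokens += estimate_cost(prompt)
    return target


if __name__ == "__main__":
    pool = [Backend("pod-a"), Backend("pod-b")]
    requests = [
        "Summarize this document: " + "word " * 5000,  # heavy request
        "How's the weather?",                          # light request
        "What time is it?",                            # light request
    ]
    for prompt in requests:
        chosen = route(prompt, pool)
        print(f"{estimate_cost(prompt):>5} est. tokens -> {chosen.name}")
```

A round-robin balancer would send the third request back to the pod still working through the long document; routing on estimated pending work keeps short requests away from it.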

Zero-touch operations are becoming the new standard for production Kubernetes deployments. Brian Cook from DTCC described their approach: "No changes can be directly made to any environment that's not a lab... We write everything as code, right? So everything is declarative." Their platform automatically reverts manual changes within 15 minutes using GitOps drift detection. Cook's team has also implemented what he calls "permanent chaos": continuous randomized failure injection across performance clusters to validate recovery capabilities.
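The revert behaviour Cook describes follows the general GitOps reconciliation pattern: compare the declared state with the live state and overwrite anything that differs. The sketch below assumes that pattern and is not DTCC's tooling; the dictionaries stand in for manifests, and `find_drift`, `reconcile`, and the interval constant are hypothetical names.

```python
# Hypothetical interval matching the 15-minute revert window Cook cites.
RECONCILE_INTERVAL_SECONDS = 15 * 60

_MISSING = object()  # marks a key that is absent on one side


def find_drift(desired: dict, live: dict) -> dict:
    """Return {key: (desired_value, live_value)} for every mismatched key."""
    drift = {}
    for key in desired.keys() | live.keys():
        want = desired.get(key, _MISSING)
        have = live.get(key, _MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


def reconcile(desired: dict, live: dict) -> dict:
    """Force live state back to the declared state; return what was reverted."""
    drift = find_drift(desired, live)
    live.clear()
    live.update(desired)
    return drift


if __name__ == "__main__":
    golden = {"replicas": 3, "image": "app:1.4"}                # declared in Git
    cluster = {"replicas": 5, "image": "app:1.4", "debug": True}  # after a manual change
    reverted = reconcile(golden, cluster)
    print("reverted keys:", sorted(reverted))
    print("live state now matches golden source:", cluster == golden)
```

A real controller would run this comparison on a timer or on change events; the constant above only records the interval from the interview and is not used to schedule anything here.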

Infrastructure-as-code tooling is consolidating around GitOps patterns for both traditional and AI workloads. Dimitri Vlachos from Spacelift noted the evolution: "ClickOps is great for the basic beginnings, but if you're really going to build a program, that's where IaC and GitOps come together." The shift from manual console operations to declarative infrastructure management is accelerating as organizations scale their AI inference deployments.

Agent runtime infrastructure is emerging as a distinct category. Idit Levine from Solo.io announced their K-Agent project: "We announced K-Agent, which is basically taking this Kubernetes, and make it a runtime that you can run actually those agent, no matter which framework you're bringing." This represents a fundamental shift from treating agents as applications to treating them as infrastructure primitives that require specialized orchestration capabilities within Kubernetes clusters.

Deep Tech

8 citations

For analysts, investors, and infrastructure architects

Deep Tech: The Infrastructure Reality Behind AI's Production Bottleneck

The MIT statistic haunting every AI conversation—that only 5% of AI projects reach production—isn't a failure of ambition but a brutal collision with infrastructure physics. After three days of operator interviews at KubeCon, the pattern is clear: enterprises are drowning in the complexity gap between AI experimentation and production-grade deployment at scale.

The Kubernetes-AI Infrastructure Stack Is Fragmenting Under Load

The most revealing conversation came from Brian Monroe at NVIDIA, whose team runs Kubernetes infrastructure for chip design and R&D: "We're definitely a container-first environment. So in every aspect of our R&D group, we're trying to get out of fixed infrastructure models." But Monroe's reality check cuts deeper: NVIDIA, the company literally powering the AI revolution, faces the same infrastructure complexity as everyone else. Asked about persistent storage for the massive Kubernetes clusters behind those chip design workflows, Monroe was candid: "Well, we're a Portworx customer."

This isn't just about storage. The infrastructure requirements for AI workloads are fundamentally different from traditional web services. As Akshay Ram from Google explained: "With LLMs, you can have a request saying just like FAQ, how's the weather kind of request, which is pretty predictable. Or you can say, 'Summarize this document with a ginormous document.' So the load on the backend varies a lot." The unpredictable resource consumption patterns are breaking traditional Kubernetes scheduling and resource management assumptions.

Scale Numbers That Should Terrify Infrastructure Teams

Google's Gari Singh dropped the most sobering infrastructure metric of the conference: GKE now supports 130,000 nodes in a single cluster, doubling from 65,000 just 12 months ago. "It's usually in the massive training jobs, massive AI jobs that need a lot of compute," Singh explained. "There's typically a match of a node to a GPU. So you'll end up saying, 'I need 130,000 GPUs to train whatever these massive models.'" But the real challenge isn't just provisioning—it's the control plane complexity. "YAML or JSON is very verbose," Singh noted. "So you take 130,000 and you multiply that times 1k, 3K, whatever, you've now got this massive string of records that you have to store."

The storage and networking implications are staggering. At Backblaze, CEO Gleb Budman is seeing the data movement patterns firsthand: "Customers building models, customers doing inferencing and needing just a place to put a lot of data and be able to move that data quickly to wherever they need it to go." The traditional cloud storage economics don't work when you're moving petabytes for training runs.

The Platform Engineering Complexity Crisis

The most honest assessment came from Amit Govrin at Kubiya, who's watching platform engineers buckle under AI infrastructure demands: "Platform engineers oftentimes like to over-engineer and over-complex because everybody wants their own flavor... Everybody wants to host their own models. Everybody wants to run their own inference, everybody wants to run their own frameworks, and that gets everybody in trouble." The result? "You move the bottleneck upstream, so you never actually solve the engineering velocity problem."

At DTCC, Brian Cook's team has taken a radical approach—implementing a "zero-touch policy" where "no changes can be directly made to any environment that's not a lab." Cook's insight cuts to the core issue: "Manual, not my timeframe. And again, having a security background to say we are going to increase our velocity, give the customers what they need as we talk to them." But even with full automation, Cook admits the AI complexity is different: "We haven't gotten to tabletops yet... We're talking actual get somebody and break it."

The Economics Don't Add Up Yet

The infrastructure costs for AI workloads are fundamentally misaligned with traditional cloud economics. At Vultr, Kevin Cochrane highlighted their AMD partnership delivering "82% better performance per dollar than any leading compute plan alternative" specifically for AI workloads. But Aleks Shargorodskiy from AMD was more direct about the market reality: "12 months ago when I first started working in this GPU space, specifically on the neocloud side, mostly we were selling into neoclouds... But now, we've really expanded the way that we work together on working, not just selling in, but selling through and selling out."

The shift from selling hardware to cloud providers to selling through them signals a fundamental change in AI infrastructure economics. As Alois Reitbauer from Dynatrace observed: "AI is expensive if you run it, but also capacity, where can you actually run it? This is also still a scarce resource." The scarcity isn't just GPUs—it's the entire infrastructure stack capable of handling AI workloads at production scale.

Implications: Infrastructure as the AI Bottleneck

The 95% failure rate isn't about AI algorithms or model quality—it's about infrastructure teams hitting the wall of complexity, cost, and operational overhead. The companies succeeding are those with dedicated platform engineering teams, massive infrastructure budgets, and the operational discipline to treat AI workloads as fundamentally different from traditional applications. For everyone else, the infrastructure gap between AI experimentation and production deployment remains the primary constraint on AI adoption at enterprise scale.

C-Suite

5 citations

For executives making bet-the-company calls

C-Suite Imperative: Kubernetes Has Won the AI Infrastructure War

The verdict is in from KubeCon 2025: Kubernetes has emerged as the definitive platform for AI at scale, with enterprise adoption accelerating beyond traditional workloads. While only 5% of AI projects reach production today, those that do are overwhelmingly choosing Kubernetes as their runtime foundation. This isn't just about containers anymore—it's about building the operational backbone for the next decade of AI-driven business transformation.

Scale is the new competitive moat: Google's GKE now supports 130,000 nodes (doubling from 65,000 in just 12 months), primarily driven by massive AI training jobs. If your AI strategy can't scale to enterprise demands, you're building demos, not products.

Zero-touch operations are table stakes: Leading enterprises like DTCC are implementing "zero-touch" policies where no manual changes can be made to production environments. Everything runs as declarative code through GitOps, with automatic drift detection and remediation within 15 minutes.

AI inference is where the money flows: The focus has shifted from training to inference—the operational layer that makes AI real for end users. Companies are consolidating around Kubernetes-native inference platforms rather than managing fragmented AI toolchains.

Platform engineering owns AI's production future: The Python engineers building AI models can't operationalize them at enterprise scale. Platform teams are stepping up to bridge this gap, making Kubernetes the runtime for AI agents and workflows.

Decision Framework: Ask yourself three questions: Can your current infrastructure handle 10x AI workload growth? Do you have zero-touch deployment capabilities? Is your platform team equipped to productionize AI at scale? If any answer is no, your AI strategy needs immediate infrastructure investment.

The strategic implication is clear: Kubernetes isn't just winning the container orchestration game—it's becoming the foundational layer for AI-driven enterprises. Companies that treat this as a tactical technology decision rather than a strategic platform investment will find themselves unable to scale AI beyond proof-of-concepts.

Primary-source citations

Gari Singh · Product Manager, Google Cloud @ Google [GOOGL] · ✓ Verified

"You take 130,000 and you multiply that times 1k, 3K, whatever, you've now got this massive string of records that you have to store."

Eddie Villalba · Outbound Product Manager @ Google [GOOGL] · ✓ Verified

"We looked at the customer journey. What is that customer journey for this new kind of serving workload."

Akshay Ram · Group Product Manager @ Google [GOOGL] · ✓ Verified

"With LLMs, you can have a request saying just like FAQ, how's the weather kind of request, which is pretty predictable. Or you can say, 'Summarize this document with a ginormous document.' So the load on the backend varies a lot."

Brian Cook @ DTCC · ✓ Verified

"No changes can be directly made to any environment that's not a lab... We write everything as code, right? So everything is declarative."

Dimitri Vlachos · Chief Marketing Officer @ Spacelift · ✓ Verified

"ClickOps is great for the basic beginnings, but if you're really going to build a program, that's where IaC and GitOps come together."

Idit Levine · Founder & CEO @ solo.io · ✓ Verified

"We announced K-Agent, which is basically taking this Kubernetes, and make it a runtime that you can run actually those agent, no matter which framework you're bringing."

Brian Monroe · Senior Software Engineer @ NVIDIA [NVDA] · ✓ Verified

"We're definitely a container-first environment. So in every aspect of our R&D group, we're trying to get out of fixed infrastructure models."

Akshay Ram · Group Product Manager @ Google [GOOGL] · ✓ Verified

"With LLMs, you can have a request saying just like FAQ, how's the weather kind of request, which is pretty predictable. Or you can say, 'Summarize this document with a ginormous document.' So the load on the backend varies a lot."

Gari Singh · Product Manager, Google Cloud @ Google [GOOGL] · ✓ Verified

"It's usually in the massive training jobs, massive AI jobs that need a lot of compute. There's typically a match of a node to a GPU. So you'll end up saying, 'I need 130,000 GPUs to train whatever these massive models.'"

Gleb Budman · CEO & Co-Founder @ Backblaze · ✓ Verified

"Customers building models, customers doing inferencing and needing just a place to put a lot of data and be able to move that data quickly to wherever they need it to go."

Amit Govrin · CEO @ Kubiya · ✓ Verified

"Platform engineers oftentimes like to over-engineer and over-complex because everybody wants their own flavor... Everybody wants to host their own models. Everybody wants to run their own inference, everybody wants to run their own frameworks, and that gets everybody in trouble."

Brian Cook · Director @ DTCC · ✓ Verified

"No changes can be directly made to any environment that's not a lab. Manual, not my timeframe. And again, having a security background to say we are going to increase our velocity, give the customers what they need as we talk to them."

Kevin Cochrane · CMO @ Vultr · ✓ Verified

"82% better performance per dollar than any leading compute plan alternative"

Alois Reitbauer · Chief Technology Strategist @ Dynatrace · ✓ Verified

"AI is expensive if you run it, but also capacity, where can you actually run it? This is also still a scarce resource."

Gari Singh · Product Manager, Google Cloud @ Google [GOOGL] · ✓ Verified

"130,000. It's usually in the massive training jobs, massive AI jobs that need a lot of compute. We don't want to take two years to start them up."

Brian Cook @ DTCC · ✓ Verified

"No changes can be directly made to any environment that's not a lab. The golden source of truth. The benefit we get is somebody makes a manual change, it doesn't match the golden source, it's automatically reverted within 15 minutes."

Shubhika Taneja · Sr. Product Marketing Manager, AI/ML @ Google [GOOGL] · ✓ Verified

"Inference is the next big thing, right? How do you basically serve billions of tokens at lightning speed, but not breaking the bank of your organization?"

Idit Levine · Founder & CEO @ solo.io · ✓ Verified

"The Python engineers who's amazing, and writing this amazing AI stuff that I probably cannot write, do not know how to bring stuff to production, they don't know how to do security. We need to own that piece."

Alois Reitbauer · Chief Technology Strategist @ Dynatrace · ✓ Verified

"A lot of them are not in production today or don't even make it to that level of maturity. The main shift in the conversation is can you tell me that this actually has value, that this actually works for people?"