Podcast: Data Center 2.0—Untangling the software stack

Software is what makes a modern data center tick. But sorting through what goes where in terms of IT, OT and beyond can be tricky. We untangle this complex web and shine a spotlight on how AI is driving change in the systems that keep compute running smoothly. 

This is Data Center 2.0, The Five Nine miniseries digging into what's changing in the infrastructure behind modern businesses. 

This podcast is written and hosted by Diana Goovaerts. It is edited by Diana Goovaerts and Matt Rickman. Liz Coyne is our executive producer. Special thanks to guests Steven Dickens, Steven Carlini and Jamie Thomas.


Diana Goovaerts, Fierce Network: When you stream a movie, post on social media, use an AI chatbot or load a website, you are tapping into a massive, invisible machine. But that machine isn't just hardware. It is a carefully layered stack of software, all working together in concert.

Steven Carlini, Schneider Electric: Software is kind of the intelligence backbone of data centers and AI factories. So, software is key to really keeping up with the fast-paced change happening in accelerated AI compute.

Steven Dickens, HyperFRAME Research: But we are throwing the stack up in the air. For me, this is as much of a tectonic shift as the internet was in '99, or mobile was in 2007, you know, or the cloud has been over the last 20 years. This is one of those big tectonic shifts.

Diana Goovaerts: Welcome to The Five Nine miniseries Data Center 2.0, the show where we break down what's changing in the infrastructure that powers modern business. I'm Diana Goovaerts, and today we are unpacking what the data center software stack really is and why AI workloads are forcing a rewrite, from power and cooling all the way up to applications and services.

Software is what makes a modern data center tick. It designs it, monitors it, secures it and increasingly it has to optimize it in real time as AI pushes rack density and power levels to extremes. But the phrase "data center software" can be a little bit confusing because it blends two worlds: facility software, meaning power, cooling and building systems, and IT software, meaning servers, networks, virtualization, containers, observability and security.

Steven Dickens: So what makes up the software stack is really a lot of things. It's those element managers, as I call them. There's gonna be a power system in there. There's gonna be water systems. There's gonna be physical things, electricity, HVAC moving air through the data center. So that's kind of where the Schneider Electrics, the Honeywells, you know, that kind of layer. 

Then once you get up past that kind of infrastructure element level, then you start to get up into things that we would recognize from IT. Vendors like Cisco, like HPE, like Dell, like Lenovo, like Supermicro, that is where you're kind of managing servers. You know, a classic example would be from somebody like HPE with their iLO and Compute Ops Management. This is pretty low level: Is the box on? Is it working? Is it connected? Is the link working? You know, that kind of level.

Then you start to move up into looking at the workload itself. So that might be at the container level, it might be at the virtual machine level. It's looking at traces, logs and metrics, and that's kind of where the observability vendors start to come in: the Splunks, the Dynatraces, the Datadogs, the New Relics of this world.

So as you go up through the stack, you're trying to monitor and observe what's going on, from the real physical infrastructure in the data center right up to the workload, and then you start to get to the application level. Application performance monitoring asks: Is the supply chain up and running? You can't answer that question unless all the parts below it are being monitored and observed.

Diana Goovaerts: On the facilities side, Carlini uses more OT-flavored terms and emphasizes separate silos, namely power monitoring, cooling and the building management system (BMS), plus the growing need to unify them.

Steven Carlini: You know, the data center software stack is really complicated. When data centers were smaller, you really just had an IT room that you were monitoring, and there's software that's dedicated to that, which is usually decent software. On the power side, you have power operations, power monitoring software that monitors all the way from the utility into the IT room. And then you have the cooling software, which is usually part of the building management software.

You have kind of these silos, you know, with software that is operationally very, very good within its own domain. And that's where the big challenge is, because all of these domains are interdependent. We wanna start moving the software to be more unified across all of the domains.

Diana Goovaerts: So to recap, here is the practical model: facilities, plus IT infrastructure, plus the workload platform and observability, plus security and application outcomes. Got it? The hard part is that each layer usually comes with its own tools, and integration tends to break at the seams.

Steven Dickens: It's a bunch of screens, it's a bunch of interfaces to manage this stuff. There isn't one sort of ring to rule them all. You are gonna be in Schneider's tools. You're gonna be in Honeywell's tools. You're gonna be in this vendor's tools. You're gonna be in Cisco's tools. You're gonna be in HPE's tools. We are still a long way away from a single pane of glass where you can manage at a data center level.

Steven Carlini: We're trying to get all of the data from the different domains, and we're encouraging companies and giving them the tools to put all the data in central repositories or a single location, because when the data from the different domains sits in different places, it's really hard, you know, to run analysis on that data.

So the first step is to try to get it into a single location. Things have really stepped up, I think, in the last few months as far as advancement. And a lot of this is being driven by the fact that the densities, especially in the AI factories with accelerated compute, are almost doubling every year with the latest generations of Nvidia GPUs.

The higher the density, the less room for failure, because things will heat up very, very fast. If you have a hiccup in your CDU [coolant distribution unit], or if some water pipes have a leak or something, you're gonna overheat in a very, very short period of time. So management in these types of environments has never been more critical.

Diana Goovaerts: And when the physical side gets tighter, the software side has to get smarter, more autonomous, more security aware, and better at understanding what's normal and what's not. The future stack isn't just about more dashboards, it's about more automation, more prediction and more control loops, because humans can't manually steer this ship anymore.

Steven Dickens: The way that AI's impacting the data center is pretty foundational. We were on a really good track with PUE [power usage effectiveness], sort of a measure around data center performance. So if you go back three or four years, we were starting to crack that problem. We were starting to get really efficient with how data centers moved their hot and cold aisles. Then somebody puts GPUs in every server and we're back to pretty foundational challenges.

Again, these consume a lot of power and push out a lot of heat.

So that whole kind of trend we were on has been knocked back, and we're now scrambling. You know, so much so that at Dell Technologies World last year, Michael Dell announced a rear-door heat exchanger on stage as part of his keynote. That's pretty nerdy stuff, but it was so important that it made it into the keynote.

This is front of mind, and what's happening is all the smart people in the industry are having to look at all the layers in this stack to rearchitect it because we've thrown a new component in there that's fundamentally changed the dynamic.

Jamie Thomas, IBM: I would say that the software has to become a lot more intelligent. It has to become intelligent in terms of being a bit more autonomous. Certainly in terms of dealing with one of the key constraints of using AI at scale, which is security.

Diana Goovaerts: But beyond the IT/OT stack we've been talking about, there is the software services stack, and that's where neoclouds are looking to differentiate themselves from hyperscalers.

Neoclouds originally pitched themselves as bare-metal providers, experts at running the very systems we just discussed and the GPUs, CPUs and storage systems we talked about in previous episodes, but that may not be enough anymore.

Steven Dickens: Just being a pool of GPU capacity and renting GPU capacity, that has got a shelf life. And I'd argue that shelf life expires at the end of this year. It doesn't cut off at the end of December, but it's largely a 2026 thing. 

I think the race for the Nebiuses and the CoreWeaves and the Lambda Labs and those guys is they've gotta race this year and put their foot flat to the floor to get away from just being seen as renting capacity.

How do they do that? It's being able to use the GPU as an anchor point for those new AI-driven applications. They've gotta make the pivot from training to inference capacity. If the neoclouds can become the place where you run inference, there's a huge opportunity.

Diana Goovaerts: Jamie Thomas argues that the ceiling for neoclouds is often economic. If a provider can offer a compelling cost equation and real value, well then the market will let 'em keep climbing.

Jamie Thomas: It will all be around economics. So if they're able to offer an economic equation to the client that is compelling, I don't know that there's any limitation in where they can go.

The main cloud vendors are achieving a lot of scale, which allows them to do things at the right cost dimension, and I think that will be a reality of the competition. But if someone comes up with a different, innovative approach, and it's cost-effective and it brings value, I think there's opportunity there.

Diana Goovaerts: Before we wrap, there are a few misconceptions worth retiring. First is the idea that the stack is somehow solved. Steven from HyperFRAME says the opposite is actually true: We are in the middle of a tectonic shift, and anyone claiming they've got it all figured out is selling you comfort.

Steven Carlini: The IT and the OT kind of worlds have their own management platforms, and they'll never merge. And I think we're starting to see that the industry needs that software to evolve so that it can not only do its core function, but extend from just power to power and cooling, or from just cooling to cooling and power and then the IT room, and then, as you talk about, moving up the stack.

So I think the myth 'you need dedicated software for each domain,' I think that myth is gonna start to be dispelled as more and more of the software starts to be more interoperable.

Steven Dickens: We are still so, so, so early. People think we've been going at this now for three years and that we've got it all figured out. We are in the first inning of a seven-game World Series. You know, if you think about the maturity of the public cloud 20 years after it was launched, that is probably a good rubric to think about where we are on the maturity curve of AI. So, three years into a 20-year journey. That's about right, I think.

Diana Goovaerts: Second, this shift isn't just technical; it's operational, and it's human. And Jamie's hot take is that the workforce has to get proactive and AI-literate to help run what's coming.

Jamie Thomas: In terms of where things are headed from a data center software [perspective], one of my real hot buttons is the workforce. So in other words, we've created a workforce, in many cases in the IT industry, that tends to be a bit more reactive. And now we need to create the workforce of the future.

The workforce of the future needs to be a much more proactive workforce, and to do that, they need to be AI literate. So I'm a really strong proponent that we need to create an AI-literate IT workforce so that we can then support the evolution of the technology that we're all supporting.

Diana Goovaerts: In a nutshell, the data center software stack is becoming less like a set of tools and more like a coordinated system spanning design, operations and outcomes. And AI is accelerating the timeline for getting that coordination right.

That is all for today. Make sure you like and subscribe wherever you listen or watch to catch the latest episodes in our series. And until then, we will see you next time.