Akamai's AI orchestrator puts inference where latency matters

By Mitch Wagner Apr 6, 2026 7:00am

Akamai's network operations center (NOC). The company's AI orchestration, based on Nvidia AI Grid, delivers reduced latency for inference applications such as speech recognition, e-commerce personalization and video intelligence. (Akamai)

Akamai released a global implementation of Nvidia's AI Grid reference design
The operator's orchestrator routes inference requests by intent, latency, cost and data sovereignty, with semantic caching
The technology helps preserve premium GPU cycles

Real-time AI has a latency problem. Applications such as speech recognition, e-commerce personalization and video intelligence require fast responses that data center-centric AI struggles to deliver.

"We're really at this inflection point where inference is becoming the dominant focus for AI," Jon Alexander, Akamai SVP of cloud products, told Fierce at the Nvidia GTC conference in mid-March. "Models have been trained. Now what we're seeing is our customers are adopting those models, bringing them into their experiences — and as you use those models, that's inference."

Speech recognition that takes five seconds to respond is too slow to feel natural. Personalization engines that need a round trip to a distant data center can't keep pace with a customer mid-session. And video intelligence processed in the cloud rather than near the camera misses moments that can't be recovered. These are the core use cases driving the next wave of enterprise AI adoption, and they all demand inference close to the user.

Meet Akamai's AI Grid orchestrator

To help solve the inference problem, Akamai unveiled its implementation of Nvidia's AI Grid reference design: an intelligent orchestrator that acts as a real-time broker for AI requests, optimizing cost per token, time-to-first-token and throughput. (Akamai calls this "tokenomics." Cute, right?) The AI Grid is an important part of Nvidia's pivot to inference. Other companies have their own implementations of AI Grid, but Akamai claims its differentiator is that its implementation is global.

Notably, Cisco is partnering with Comcast to use AI Grid for advertising, gaming and an enterprise concierge to handle phone calls, appointments and questions. Cisco is also partnering with AT&T to implement AI Grid to provide video security at the Dallas Discovery District as well as with TanMar Companies, an industrial services firm in Louisiana. And Spectrum is deploying AI Grid to enable movie animation.

The AI Grid orchestrator routes by intent, Alexander explained. A complex reasoning task gets sent to a large model with the compute depth to handle it. A speech-to-text request goes to the nearest GPU running the right model, with latency, cost, data sovereignty requirements and model availability all factored in.
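The routing Alexander describes can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not Akamai's implementation: the intent labels, model names, node names, regions, latencies and prices are all hypothetical, and the ranking rule (nearest eligible node first, cost as a tie-breaker) is one plausible reading of "latency, cost, data sovereignty requirements and model availability all factored in."

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical intent-to-model mapping; the article names no specific models.
MODEL_FOR_INTENT = {
    "complex_reasoning": "large-llm",
    "speech_to_text": "asr-small",
}


@dataclass(frozen=True)
class Node:
    name: str
    region: str                # used for the data sovereignty check
    models: frozenset          # models this node can serve
    latency_ms: float          # measured latency from the requesting user
    cost_per_1k_tokens: float  # illustrative price, not a real figure


def route(intent, allowed_regions, max_latency_ms, nodes) -> Optional[Node]:
    """Pick a node for a request, or None if no node satisfies the constraints."""
    model = MODEL_FOR_INTENT[intent]
    # Hard constraints: model availability, data sovereignty, latency budget.
    candidates = [
        n for n in nodes
        if model in n.models
        and n.region in allowed_regions
        and n.latency_ms <= max_latency_ms
    ]
    # Soft preference: nearest node first, cheaper node on a latency tie.
    return min(
        candidates,
        key=lambda n: (n.latency_ms, n.cost_per_1k_tokens),
        default=None,
    )


# Made-up fleet for illustration only.
NODES = [
    Node("edge-fra", "EU", frozenset({"asr-small"}), 12.0, 0.4),
    Node("edge-nyc", "US", frozenset({"asr-small"}), 8.0, 0.3),
    Node("core-ams", "EU", frozenset({"large-llm", "asr-small"}), 45.0, 2.0),
]
```

In this toy fleet, a speech-to-text request restricted to the EU with a 20 ms budget lands on `edge-fra`, while a complex reasoning request restricted to the EU goes to `core-ams`, the only EU node hosting the larger model.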

Model affinity provides efficiency: once a model is loaded into GPU memory, routing subsequent requests to the same node amortizes the load penalty across many sessions, improving both latency and utilization, Alexander said. Semantic caching is applied as well: a request that closely matches an earlier one can be answered from cache, reserving premium GPU cycles only for workloads that demand them.
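A rough Python sketch shows how model affinity and semantic caching might fit together. Everything here is an assumption for illustration: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the 0.8 similarity threshold is arbitrary, and the node and model names are hypothetical. It does not describe how Akamai's orchestrator is built.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Returns a stored response when a new prompt is similar enough to an old one."""

    def __init__(self, threshold=0.8):  # threshold is an arbitrary assumption
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, prompt):
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))


def pick_node(model, loaded, nodes):
    """Model affinity: prefer a node that already has `model` in GPU memory."""
    warm = [n for n in nodes if model in loaded.get(n, set())]
    return warm[0] if warm else nodes[0]


def serve(prompt, model, cache, loaded, nodes, run_model):
    """Answer from cache if possible; otherwise run on a warm node if one exists."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached, "cache"  # no GPU cycles spent
    node = pick_node(model, loaded, nodes)
    loaded.setdefault(node, set()).add(model)  # model is now resident on this node
    response = run_model(node, model, prompt)
    cache.put(prompt, response)
    return response, node
```

With `run_model` replaced by a stub, a near-duplicate prompt is served from the cache without touching a GPU, and a request for a model already resident on one node is routed back to that node instead of paying the load cost elsewhere.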

Akamai puts its foot (print) down

Akamai faces a tough competitive environment, with hyperscalers, alt-scalers, neoclouds and telcos all focused on inference at the edge. However, Akamai's existing infrastructure is a strength. The company currently has GPUs deployed across 20 data center locations and plans to reach roughly 100 over the next 12 months — enough, Alexander said, to deliver 10-20 milliseconds of latency for the real-time workloads it is targeting.

Akamai has the colocation relationships to get there. "We have 4,400 locations globally," Alexander said. "We have very deep relationships. We have space to grow into." GPU supply and data center space are both severely constrained right now; Akamai's two decades of infrastructure partnerships are a practical advantage, Alexander said.

Akamai is also deliberate about scale, focusing on the midmarket. "We don't believe that every GPU should be deployed in gigawatt data centers," Alexander said. A previously reported $200 million, four-year contract with an unnamed U.S. tech company was anchored by a 16-megawatt facility, sized for inference, not training.

Who's buying?

As Fierce reported when Akamai launched its Inference Cloud last fall, the company is chasing an edge AI market expected to reach $157 billion by 2030. Akamai sees inference as a third leg for its business, alongside its traditional content delivery network (CDN) and security lines.
