What a post-AI bubble world would look like for the server GPU market

Sam Altman has been very good for Nvidia.

In 2022, server-class GPUs were a $10 billion business, according to Aaron Rakers, an analyst at Wells Fargo. Not bad, but still a small category. This year, revenues are expected to hit the $50 billion mark, and it’s all thanks to the generative AI craze spawned by ChatGPT.

Rakers estimates that if these trends persist, the server GPU market could be worth more than $100 billion by 2028, roughly the combined GDPs of Iceland and Lithuania. Nvidia, with a 95 percent share of that market and a growing focus on accelerated computing as a whole, will be the prime beneficiary of this spending spree.

But that’s a big “if.” Sure, OpenAI may be the hottest name in tech, recently closing a $6.6 billion funding round at a $157 billion valuation. Yes, hyperscalers like Microsoft and Google are spending big, hoping to provide the computing capacity that generative AI demands. And I won’t deny that generative AI features are creeping into more and more products.

But that doesn’t mean that generative AI will be a generational, industry-defining technology. What, then, happens to the server GPU market if the generative AI bubble pops?

If the market for generative AI turns out to be a transient fad, it would undoubtedly be bad news for the hyperscale cloud providers that have spent big on new data centers, servers, and GPUs. Nvidia would suffer, too.

But I’m convinced that any pain would be, if not transient, then limited. Tech companies are resourceful creatures, and I believe they’ll pivot. There are countless applications for server-class GPUs beyond generative AI, and many of these have yet to be fully realized.

Parallel Potentials

Until fairly recently, GPUs were mainly used by gamers to increase frame rates and improve picture quality. This was the GPU market. Companies like Nvidia and AMD (or, going further back, 3dfx, ATI, and PowerVR) made their money by helping gamers play the latest titles, or by selling the chipsets that powered the latest consoles of the era.

GPUs became essential to gaming because they were better at parallel processing than CPUs. They could perform more simultaneous calculations than even the most expensive Intel silicon—which is incredibly useful when you’re simulating real-world physics, or trying to render thousands of pixels every millisecond.

To illustrate that point: the most capable Intel Xeon CPU has 128 cores. An Nvidia H100 GPU—the kind used to power generative AI applications—has nearly 17,000.

Think of cores like workers. A CPU might have 64 really fast workers, but for tasks that lend themselves well to parallelization (those where you can divide the work across multiple “workers”) the GPU wins. Even if its workers are slower, the fact that there are so many more of them means it gets the job done quicker.
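To put rough numbers on the worker analogy, here is a minimal sketch in Python. It assumes the work splits into independent, equal-sized tasks, and every figure in it is hypothetical: the task count and per-task timings are invented for illustration, the worker counts are the round numbers used in the text, and `batch_time` is a made-up helper rather than a model of any real chip.

```python
import math

def batch_time(num_tasks: int, workers: int, seconds_per_task: float) -> float:
    """Time to finish num_tasks independent, equal-sized tasks.

    Workers process tasks in rounds, so the total is the number of
    rounds multiplied by the time one task takes on one worker.
    """
    rounds = math.ceil(num_tasks / workers)
    return rounds * seconds_per_task

# Hypothetical figures for illustration only, not benchmarks.
NUM_TASKS = 1_000_000
cpu_time = batch_time(NUM_TASKS, workers=64, seconds_per_task=1.0)      # few fast workers
gpu_time = batch_time(NUM_TASKS, workers=17_000, seconds_per_task=4.0)  # many slower workers

print(f"CPU-like: {cpu_time:,.0f} s")
print(f"GPU-like: {gpu_time:,.0f} s")
print(f"Speedup:  {cpu_time / gpu_time:.1f}x")
```

The same toy model also shows the flip side: if you call `batch_time` with a single task, the 64 fast workers finish first, which is why CPUs remain the better fit for work that can’t be divided up.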

Over the past 15 years, AMD and Nvidia (the two major GPU manufacturers, although Nvidia dominates the server market) have built the foundations that allow developers to use these cores, not for rendering the latest Call of Duty title, but for running general-purpose applications. Nvidia had one particular leg up: in 2007 it released CUDA, an API that allows software developers to create GPU-accelerated applications.

And that matters because CPUs are different beasts to GPUs. You can’t just write normal code and expect it to run on a GPU with full acceleration. It needs to be structured and tuned for the underlying hardware. CUDA dramatically simplified this process, making it accessible to far more developers.
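A rough way to see why “normal code” doesn’t automatically benefit is to compare two small routines. The sketch below is plain Python, not CUDA. The `launch` function is a hypothetical stand-in I’ve written to mimic how a GPU runs one small function per element in no guaranteed order, and `saxpy_kernel` and `running_total` are illustrative names of my own.

```python
import random

def saxpy_kernel(i, a, x, y, out):
    """Work done by one hypothetical GPU thread: a single element."""
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args):
    """Stand-in for a GPU launch: run the kernel once per index.

    Indices are visited in random order to mimic the fact that a GPU
    makes no promise about which thread runs first.
    """
    indices = list(range(n))
    random.shuffle(indices)
    for i in indices:
        kernel(i, *args)

def running_total(values):
    """Written serially: each step needs the previous step's result."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)               # same answer whatever order the indices ran in
print(running_total(x))  # depends on strict left-to-right order
```

The first routine maps naturally onto thousands of cores because no element depends on any other. The second, as written, does not. It would have to be restructured (for example, as a parallel scan) before a GPU could help, and that restructuring is the kind of effort the paragraph above describes.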

As a result, Nvidia’s transformation is especially stark. It’s no longer just a gaming company, or even just a hardware company (its actual manufacturing is, in any case, primarily contracted out to other vendors). Nvidia is an accelerated systems company, having constructed a mature software ecosystem that allows companies to run the most demanding computational tasks on its GPUs.

Training and running AI models is computationally intensive and involves processing vast amounts of data, so it makes sense that server-class GPUs are a hot commodity. But, if you think a little further, it’s not hard to identify other potential use-cases.

And, when you consider the scale of these use-cases, it’s enough to give you hope for the future of the segment.

Looking Forward by Looking Back

Many of these use-cases are well established. The most obvious example is, of course, data analytics.

It’s hard to fathom petabyte-scale datasets. But if you’re a large government entity, running services for tens (or hundreds) of millions of people, or a large company serving a global market, it’s your reality. These organizations face a distinct challenge: more data requires more computational power to process it.

Consider a company like Amazon. It has hundreds of millions of customers around the world. With every interaction, each customer generates new data to be analyzed. I’m not just talking about individual purchases, but also the various telemetry generated by analytics tools, and countless other systems I’m not privy to.

Brands retain this data because it has value. It’s what provides the insights behind product recommendations, which, in turn, drive sales. The sooner you generate these insights, the greater the likelihood your recommendations will be relevant to the customer. And so, you need a GPU.

Or, rather, lots of GPUs.

Another obvious example is high-performance computing. Throughout computing history, governments and research institutions have constructed supercomputers—typically using clusters of powerful CPUs—to perform critical calculations. These machines have been used to predict the weather, find cures for diseases, map the human genome, and develop new weapons systems.

One good example is HPE’s Frontier supercomputer, the first system to pass the exascale barrier and, at its debut, one of the world’s most power-efficient supercomputers. This machine, which uses thousands of AMD GPUs, isn’t powering ChatGPT, but rather scientific research.

While high-performance and scientific computing sounds like a small niche, it isn’t. This is evident from Nvidia’s own shipment figures. In 2017, the company claimed that half of its server-class GPUs went to organizations that weren’t hyperscale cloud providers. While it’s plausible that smaller cloud providers accounted for much of this portion, it’s likely that many also went to individual institutions, a safe bet considering Nvidia’s long-standing relationship with the scientific computing community.

Finally, even if generative AI is a passing phenomenon (or is confined to just a few specific areas, rather than becoming central to the knowledge economy), AI itself isn’t going away. The same AI systems that power computer security applications, spam filters, and our social media timelines will still exist, and they’ll need computing power to operate.

A Post Generative AI Future

Should the generative AI bubble pop (which, to be clear, isn’t a certainty, and is something Nvidia is actively working to avoid), it would undoubtedly be painful for Nvidia, as well as the hyperscale cloud providers that have invested billions in new infrastructure. But it would also present a valuable opportunity for these companies, and one that might mitigate any harm.

I’ve spent much of my career helping companies use GPUs to accelerate their workloads. From my own experience, the biggest barrier is the perception that writing code for GPUs is hard, or that the payoff doesn’t justify the effort.

To its credit, Nvidia has worked hard to reverse that perception, and its efforts, particularly when it comes to its software ecosystem, have been crucial to popularizing GPU-based computing. In a post-generative AI world, hyperscalers like Microsoft, Amazon, Oracle and Google will undoubtedly feel motivated to help further.

And I believe they can tangibly help. By running GPUs in the cloud, you eliminate any upfront investment—which may entice smaller startups and cash-strapped research institutions. The scalability of a cloud platform—where you can increase or decrease the resources you use, based on your needs—will undoubtedly be another major driving factor.

You could also make an environmental case for GPU computing. While a GPU may draw more power than a CPU, it’s also more capable, especially when given a task that allows for parallelization. In practice, this means companies can perform the same task using fewer servers, which can translate into significant reductions in energy consumption.
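The arithmetic behind that claim is simple enough to sketch. Every number below is hypothetical: the server counts, power draws, and run time are invented to show the shape of the calculation, not taken from any real deployment, and `energy_kwh` is an illustrative helper of my own.

```python
def energy_kwh(servers: int, watts_per_server: float, hours: float) -> float:
    """Total energy used by a fleet running for a given time, in kWh."""
    return servers * watts_per_server * hours / 1000

# Hypothetical figures for illustration only, not measurements.
cpu_fleet = energy_kwh(servers=100, watts_per_server=500, hours=10)
gpu_fleet = energy_kwh(servers=5, watts_per_server=3000, hours=10)

saving = 1 - gpu_fleet / cpu_fleet
print(f"CPU fleet: {cpu_fleet:.0f} kWh")
print(f"GPU fleet: {gpu_fleet:.0f} kWh")
print(f"Saving:    {saving:.0%}")
```

The point is that a higher power draw per server can be outweighed by needing far fewer servers for the same job. Whether it is in practice depends entirely on how well the workload parallelizes.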

Generative AI may prove to be the transformational technology we’ve been promised. However, given the uncertainty, it would be prudent for Nvidia and the cloud giants to start evangelizing the benefits of GPU computing to non-generative AI customers. At worst, it’s an insurance policy. At best, it’s a valuable customer segment.

And they can do this by expanding the software ecosystem, building products that focus on common business challenges where GPUs can help, and creating developer tools that lower the barrier to entry.

These companies could also redirect a portion of their marketing war chest to popularizing GPU computing. This is a messaging problem as much as a technical problem, and we need to make the case that GPU computing is here, it’s accessible, and the benefits are real.

Ultimately, I believe the server GPU market would survive the pop of the generative AI bubble. Server GPUs existed before generative AI, and they’ll continue to exist after it.

How painful that transition will be, and whether we sustain the current pace of innovation, however, is entirely up to Nvidia and the cloud giants.
