Shifting Dynamics in AI Computing Power: From Overheated GPU Resale Market to Strategic Investments in Intelligent Computing Centers | TMTPOST

Chelsea_Sun ・ Nov. 22, 2024

Since OpenAI launched ChatGPT about two years ago, the pace of LLM development seems to have slowed, but this could be the calm before the next wave of growth. With faith in the "Scaling Law," companies like xAI, Meta, and OpenAI are actively planning for computing clusters on the scale of 100,000 GPUs or more.

AsianFin -- When large language models (LLMs) were all the rage in 2023, GPU "scalpers" would post messages in their social circles like "Hurry and buy, 30% deposit, only serious buyers are welcome!" By 2024, the wording had shifted to "Available for immediate purchase, high quality at great prices." Some have even quietly exited the market.

Take the price of the H100 system as an example: its official price is around $300,000, but the gray market price once surged to over 3 million yuan, a profit margin of more than 50% that attracted many buyers. However, the price has now dropped to around 2.3 million yuan, leaving little profit to be made through resale.

The reasons for this change include Nvidia’s chip upgrades, with new products like the Blackwell-based GB200 offering lower cost per unit of computing power, and the inevitable cooling down of the overheated computing power industry. Having GPUs doesn’t automatically translate into the computing power needed for large models, a reality that many are coming to terms with, often at a high cost.

LLMs typically require clusters of 64, 128, or 256 servers for training, each equipped with 8 GPUs (512, 1,024, or 2,048 GPUs in total). For manufacturers focused on foundational large models, clusters of tens of thousands of GPUs have become the entry-level standard. Overseas giants like OpenAI and Musk’s xAI, as well as domestic players, are all planning clusters of tens of thousands of GPUs.

Pressure from demand is recalibrating the AI computing power industry, and at the forefront of this shift are intelligent computing centers. These centers, which integrate computing, storage, and networking, directly reflect the current state of large model computing power, and voices from the field are in agreement: there are too many "intelligent computing centers," but not enough computing power for large models.

While the lack of computing power is a real issue, so is the underutilization of resources.

In an absolute sense, there aren’t "too many" intelligent computing centers, as the gap in computing power needed to train large models remains significant. The construction of large-scale intelligent computing centers will not slow down.

For instance, in July, billionaire entrepreneur Elon Musk announced that a supercluster in Memphis, Tennessee, had begun training with 100,000 Nvidia H100 GPUs, which he called "the world's most powerful AI training cluster." Two months later, Musk revealed that this cluster, named "Colossus," would increase its GPU count by another 100,000, with 50,000 of those being the more advanced Nvidia H200s. Training for Grok 3 is expected to be completed within three to four months, with the goal of launching by December.

At OpenAI, there have even been disagreements with its "close partner," Microsoft, over the delivery of computing power. Previously, Microsoft and OpenAI collaborated on a massive data center project codenamed "Stargate," which is expected to cost over $115 billion and house millions of GPUs. Reports suggest that by the end of 2025, Microsoft will supply OpenAI with about 300,000 of Nvidia's latest GB200 GPUs.

However, Sam Altman seems unsatisfied with Microsoft's speed. After raising $6.6 billion, OpenAI also struck a deal with Oracle, leasing servers in a new data center in Texas that will eventually house hundreds of thousands of Nvidia GPUs.

Chindata Group, an operator of large-scale data center solutions, told AsianFin that they are bullish on intelligent computing, predicting an explosion of demand by 2027, with all inference demands being met by large-scale data centers by 2030.

As of mid-2024, there were over 250 intelligent computing centers in China either completed or under construction. The number of tendering events related to these centers reached 791 in the first half of 2024, a year-over-year increase of 407.1%.

"This shows that the construction of intelligent computing centers has gained wide attention and support across the country. Since 2023, local governments have increased investment in these centers, driving the development of infrastructure," said Bai Runxuan, a senior analyst at CCID Consulting Artificial Intelligence and Big Data Research Center.

Wang Yanpeng, head of Baidu AI Computing Division, noted that from the demand side, the 100,000-GPU cluster is now the scale threshold for large model competition. From a technical perspective, the computing power needed for large models is roughly estimated by multiplying the model size by the required data volume. "GPT-4 has trillions of parameters, trained on a cluster of around 20,000 to 30,000 GPUs. According to the Scaling Law, the cluster for GPT-5 will likely need 100,000 GPUs, probably between 50,000 and 100,000, with parameter levels increasing by about 3 to 5 times."
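The rule of thumb Wang describes, model size multiplied by data volume, can be sketched in a few lines of Python. The constant of 6 FLOPs per parameter per token is a common approximation from the scaling-law literature rather than something stated in this article, and the parameter count, token count, per-GPU throughput, and utilization used below are purely illustrative assumptions, not figures for any real model.

```python
def training_flops(params: float, tokens: float,
                   flops_per_param_token: float = 6.0) -> float:
    """Rough training compute: model size x data volume x a constant.

    The default constant of 6 is an assumed rule of thumb
    (forward plus backward pass), not a figure from the article.
    """
    return flops_per_param_token * params * tokens


def training_days(total_flops: float, gpus: int,
                  peak_flops_per_gpu: float, utilization: float) -> float:
    """Wall-clock days for a cluster running at a given sustained utilization."""
    sustained_flops_per_second = gpus * peak_flops_per_gpu * utilization
    return total_flops / sustained_flops_per_second / 86_400


if __name__ == "__main__":
    # Hypothetical inputs: 1 trillion parameters, 10 trillion tokens,
    # 25,000 GPUs at an assumed 1e15 FLOP/s peak and 40% utilization.
    flops = training_flops(params=1e12, tokens=1e13)
    days = training_days(flops, gpus=25_000,
                         peak_flops_per_gpu=1e15, utilization=0.4)
    print(f"{flops:.1e} FLOPs, about {days:.0f} days")
```

Under these assumed inputs the estimate comes to about 6e25 FLOPs and roughly 69 days. Because the estimate is a product, scaling the parameter count by 3 to 5 times scales the required compute by at least the same factor, which is why the cluster size has to grow with it.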

However, alongside the growing demand for 100,000-GPU clusters, the LLM market remains "quiet."

According to statistics from Economic Observer, as of October 9, 2024, the Cyberspace Administration of China had approved 188 generative AI models for service, but over 30% of these models have not disclosed further progress. Only about 10% of the models are still actively training, while nearly half have shifted focus to AI application development.

These signs suggest that the demand for pre-trained large models is becoming more concentrated.

Meanwhile, the domestic market is more complex than its overseas counterparts. Both markets share growing demand for computing power, but domestically there are challenges like computing power blockages, incomplete ecosystems, and behaviors like hoarding GPUs, leading to a paradoxical situation where computing power is both in short supply and underutilized. This is because "stuffing GPUs into data centers" and "building computing clusters for large model training" are two entirely different things.

There is no unified answer regarding the vacancy or waste rates in intelligent computing centers. Data from AsianFin shows that in the first half of the year, 1.7 million GPUs were deployed across intelligent computing centers in China but only 560,000 were being utilized, resulting in a utilization rate of 32%. Other sources indicate that the average deployment rate for computing infrastructure in the industry is less than 60%.

Concerns over underutilization of computing power have caught the attention of various parties. "Many intelligent computing centers have been built, whether with domestic or Nvidia GPUs, but these clusters are experiencing varying degrees of underutilization. Governments have noticed this issue, and the operators of these centers have also seen losses. With the computing power challenge being difficult to solve in the short term, investment plans need to be carefully controlled," said an industry insider close to the government.

At the national level, over a dozen policies have been introduced to promote the construction of intelligent computing centers, such as "East Data, West Computing" and the "Digital China Construction Overall Layout Plan." However, the insider mentioned that the National Development and Reform Commission (NDRC) has recently made it clear that any new intelligent computing centers purchasing foreign GPUs will not receive energy consumption quotas. If purchasing domestic GPUs, however, there may be support for domestic innovation, and energy quotas can be allocated at key points in the national computing network to 'synergize East and West'.

Currently, the main investment models for intelligent computing centers include government-funded projects, independent investments by enterprises, and investments by universities or research institutions. Some centers have taken out loans from banks to purchase GPUs, with companies like Alibaba, Tencent, and Baidu guaranteeing the repayment.

Cloud providers have begun to negotiate with local governments, hoping to rent out unused computing power in intelligent computing centers. "We didn't know there were so many GPUs in China. In a sense, the scarcity of computing power is a result of resource misallocation," said the insider.

The government has also become aware of the potential for waste and is beginning to take steps to address it. The Ministry of Industry and Information Technology recently issued approvals for pilot projects focused on intelligent cloud services to address the construction issues of local computing centers, particularly the waste caused by small, scattered centers built with government funding.

In recent months, the government has implemented several policies aimed at orderly guidance and the elimination of outdated capacity. For example, the "Data Center Green and Low-Carbon Development Action Plan" has set strict regulations on regional layouts, energy efficiency, water use, and the use of green electricity, while calling for the removal of high energy consumption price discounts for local areas. This policy is widely believed to accelerate the elimination of outdated capacity and improve the industry’s supply structure, fostering healthy industry development.

On August 1, the Regulation on Fair Competition Review came into effect, prohibiting local governments from offering tax incentives to specific businesses without legal or regulatory basis, which effectively halts the long-standing practice of local "tax incentives to attract businesses." This shift is expected to focus enterprises on their operations, moving the industry away from "price wars" toward "innovation wars."

The cloud computing industry has also taken note of the issues surrounding intelligent computing center construction. An Lin, director of Alibaba Cloud Intelligent Technology Research Center, said that when training large models, such as GPT-4 or GPT-5, many GPUs are used inefficiently, as they cannot process all training data simultaneously. An emphasized that future breakthroughs will come from optimizing training schedules, improving network systems, and more efficiently utilizing GPUs.

An further elaborated that the three main challenges for intelligent computing centers include cluster networks, task scheduling, and intelligent operation and maintenance. Wang Yanpeng also pointed out that building a 100,000-GPU cluster in China faces three major issues: cross-region deployment, mixed-core training, and cluster stability, all of which present multiple technical and engineering challenges.

Firstly, the network: LLMs have created a completely new kind of demand on networks, one that never existed before and for which no mature solutions were available. All the current solutions are still being developed while in use. It can be said that network technology directly determines the scale of the cluster that can be built.

"With hundreds of gigabits of bandwidth, the bandwidth is fully occupied for forward model training within every millisecond, and in the next millisecond, it's fully occupied for backward communication. This kind of demand has never been encountered in human history in terms of communication. It involves many software and hardware factors, such as switches, network interface card (NIC) chips, software design, path selection algorithms, and communication protocol acceleration. To accomplish this, NICs, switches, and even optical cables used in between must be custom-designed," said An, who also mentioned that Alibaba Cloud's AI high-performance network architecture, HPN 7.0, was included in the SIGCOMM 2024 proceedings, becoming the first paper on AI computing cluster network architecture in SIGCOMM's history.

Next is task scheduling: A small-scale computing cluster has a simple network, but it lacks competitiveness in terms of efficiency and scale. The key is how to make computational tasks flexible in hardware resource scheduling to achieve higher asset utilization and lower computational costs.

The traditional approach to scheduling is based on hardware resources: first, monitor whether the computing card is idle, and if it is, assign it a task. This is the simplest and least efficient scheduling method. The cloud computing industry has long evolved to schedule based on tasks, where the progress of each task on every card can be monitored and new tasks are allocated based on their progress.
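The difference between the two approaches can be illustrated with a toy Python sketch. Everything here is hypothetical: the `Card` class, the `backlog` measure, and both functions are simplified stand-ins written for this illustration and do not represent any cloud provider's actual scheduler.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Card:
    """A computing card with the remaining work of each task queued on it."""
    name: str
    remaining: list[float] = field(default_factory=list)

    @property
    def idle(self) -> bool:
        return not self.remaining

    @property
    def backlog(self) -> float:
        # Total outstanding work, known only if task progress is monitored.
        return sum(self.remaining)


def schedule_by_hardware(cards: list[Card], work: float) -> Optional[str]:
    """Resource-based: give the task to the first idle card, otherwise wait."""
    for card in cards:
        if card.idle:
            card.remaining.append(work)
            return card.name
    return None  # every card is busy, so the task stays unassigned


def schedule_by_task(cards: list[Card], work: float) -> str:
    """Task-based: use monitored progress to pick the card that frees up soonest."""
    card = min(cards, key=lambda c: c.backlog)
    card.remaining.append(work)
    return card.name


if __name__ == "__main__":
    cards = [Card("gpu0", [5.0]), Card("gpu1", [0.5])]
    print(schedule_by_hardware(cards, 1.0))  # no card is idle
    print(schedule_by_task(cards, 1.0))      # queues behind the nearly finished task
```

In this toy setup the hardware-based scheduler leaves the new task waiting because neither card is idle, while the task-based scheduler queues it on `gpu1`, whose current task is almost done. Real schedulers also have to account for memory, network topology, and task splitting, none of which is modeled here.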

An emphasized, "It's not just about assigning tasks to computing cards; it's about scheduling smaller and more detailed tasks across these cards, which requires significant engineering technical capabilities. This is why AI companies that excel globally are generally cloud computing companies."

Lastly, operations and maintenance: In the past, if a computing card broke down, it could be quickly isolated, and other cards could continue running. Now, large models often experience instantaneous failures, with fluctuations on a millisecond level. Any jitter or packet loss during a communication process can cause GPU utilization to drop by 50%. An mentioned that Alibaba Cloud has upgraded to millisecond-level detection to promptly isolate faulty computing power from the cluster.

In addition, domestic companies face a practical difficulty when constructing computing clusters: chips.

Chinese companies are facing challenges with unstable computing power supply, making it difficult to build a single large-scale training cluster. In practice, companies often have chips from different generations of the same manufacturer or chips from different manufacturers coexisting. How to carry out mixed-chip training while ensuring efficiency is also a major issue.

Moreover, as the integration level of chips continues to increase, their failure rate also rises. Nvidia's H-series chips have a failure rate 3-4 times higher than the A-series. Furthermore, the larger the scale of the computing power cluster, the higher the failure rate. According to the failure rate of H-series chips, a 100,000-card cluster would experience a failure every 20 minutes. A higher failure rate puts greater demands on stable training support.
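The scaling behind the 20-minute figure can be reproduced with simple arithmetic. This is a minimal sketch that assumes cards fail independently at a constant rate, so the expected time between failures anywhere in the cluster shrinks in proportion to the number of cards. That assumption and the function names are ours, not the article's.

```python
def implied_per_card_mtbf_minutes(cluster_interval_minutes: float, cards: int) -> float:
    """Per-card mean time between failures implied by a cluster-wide failure interval."""
    return cluster_interval_minutes * cards


def cluster_failure_interval_minutes(per_card_mtbf_minutes: float, cards: int) -> float:
    """Expected minutes between failures somewhere in a cluster of identical cards."""
    return per_card_mtbf_minutes / cards


if __name__ == "__main__":
    # Figure from the article: a 100,000-card cluster fails about every 20 minutes.
    per_card = implied_per_card_mtbf_minutes(20, 100_000)
    print(f"implied per-card MTBF: {per_card / 60 / 24 / 365:.1f} years")
    for n in (1_000, 10_000, 100_000):
        interval = cluster_failure_interval_minutes(per_card, n)
        print(f"{n:>7} cards: one failure every {interval:,.0f} minutes")
```

Under this assumption, a single card that fails only once in roughly 3.8 years still produces a failure every 20 minutes at 100,000 cards, every 200 minutes at 10,000 cards, and every 2,000 minutes at 1,000 cards.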

Wang said that domestic companies, including Baidu, are addressing these issues. For cross-region deployment, where longer transmission distances cause high latency, the Baige 4.0 system has built a super-large-scale HPN (High-Performance Network) for 100,000 cards. It uses more efficient topologies, better multi-path load balancing strategies, and communication protocols, enabling communication across distances of tens of kilometers. In terms of communication efficiency, optimized congestion control algorithms and collective communication strategies have raised bandwidth utilization to 95%, achieving completely non-blocking communication. Lastly, ultra-high-precision network monitoring with 10ms accuracy ensures network stability.

Regarding whether the construction of intelligent computing centers is overly premature, there are differing opinions. One side believes that domestic intelligent computing centers cannot yet break free from overseas ecosystems and will require a three- to five-year transition period. During this period, large-scale accelerated construction of computing centers will inevitably lead to substantial waste.

On the other hand, some argue that the blockade from abroad will only intensify, and the domestic computing power ecosystem must mature more rapidly. Compared to the strategic competition at the national level, some small issues arising from premature construction are acceptable. Reports indicate that, at the request of the U.S. government, TSMC has been forced to temporarily halt the supply of 7nm and below advanced chips for AI computing power customers in mainland China.

Currently, hoarding Nvidia cards indeed leads to some waste of computing power. As mentioned, many buyers lack the necessary network, scheduling, and maintenance capabilities for intelligent computing centers. A technical expert at an intelligent computing center said frankly, "There has been too much speculation; many people who are not even in this industry thought hoarding goods would make money and stuffed them into a data center. They couldn’t solve the issues of stability, fault tolerance, and various other problems, resulting in significant waste."

Domestic computing power also faces challenges. The expert expressed concerns about waste in domestic AI computing power, commenting, "Huawei's operational capabilities are too strong. While everyone wasn't ready to use domestic chips or Huawei's chips, a lot of effort was spent building computing power fields and intelligent computing centers. Operators built large-scale clusters of tens of thousands of cards. The chips were not yet ready for customer use, and there is still some distance before they can be fully utilized. This issue will amplify as more domestic chips enter the market."

However, the expert remains optimistic about the overall situation of domestic cards, saying, "In the era of large models, the computing power landscape is changing. Earlier models were very scattered, and the CUDA ecosystem was strong because it needed to be compatible with so many models. Now, with large models becoming more consolidated, everyone is using the same mainstream frameworks. At the same time, with Nvidia being so expensive and considering the issue of computing power availability, people will be more willing to try domestic cards."

Recently, Science and Technology Daily also published an article by Zhang Yunquan, a member of the National Committee of the Chinese People's Political Consultative Conference. The article emphasizes that the construction of intelligent computing centers requires significant investment, with uncertain returns.

The article said that because intelligent computing technology evolves quickly, the lifespan of computing centers is generally only five to ten years. Without strong technological reserves and upgrade capabilities, they may end up in a situation where continuous investments fail to keep up with technological progress. Moreover, the operation and management of computing centers require professional technical personnel and efficient management teams; otherwise, the centers may not function as intended, potentially leading to idle equipment and resource waste. Therefore, decisions on whether to build, when to build, and where to build intelligent computing centers must be made with scientific caution and careful planning, rather than rushing to "follow the trend." The general principle should be to build according to local needs, considering sustainable market demand, and with appropriate foresight.

Some regions have also strengthened the operational requirements for intelligent computing centers. For example, the "Dezhi Future" intelligent computing center project in Dezhou, Shandong, with a value of approximately 200 million yuan, clearly specifies in the bidding documents that the construction should adopt an integrated model of design, procurement, construction, and operation, with an operational period of no less than five years, and establishes minimum revenue requirements for computing power after the project is put into operation.

Wang Wei also noted that, from a policy perspective, the government’s requirements for intelligent computing centers are higher than before. Previously, it was sufficient to simply build the centers. Now, at the early stages of construction, it is essential to identify good operators or integrate construction and operation to ensure the utilization rate of computing power.

"Last year, computing power consumption was mainly for training. Currently, it seems that the computing power at computing centers cannot be fully utilized. Many large model manufacturers are hoarding computing power, and some have reduced pre-training and no longer need to rent large amounts of computing power externally. Now, many computing centers are starting to focus on inference scenarios and practical applications, where the usage will be more distributed, and the market will likely become healthier," he said.
