TMTPOST -- Today’s good models are affordable enough for AI-first applications to flourish in 2025, marking a shift from foundation models to applications that will be the biggest event in AI next year, Kai-fu Lee, a 40-year veteran of AI research and the founder of Sinovation Ventures and 01.AI, said on Saturday at the T-EDGE Conference.
Over the past year and a half, model capabilities have improved significantly while costs have plummeted, with inference costs dropping roughly tenfold in a year. However, 01.AI aspires to a 30- to 40-fold annual improvement in speed and cost, three to four times the industry rate, to outpace the industry and rapidly foster innovative applications, Lee said.
A year and a half ago, GPT-4 was the only sufficiently capable large model on the market; GPT-3.5 could support very few application scenarios because it was not strong enough. Had application developers chosen to integrate GPT-4, each call would have cost $1-$2. “Who could afford to develop applications under those circumstances? It would have been easy to go bankrupt,” Lee said.
Lee noted that AI large model companies should focus on serving customers and co-creating value with them to achieve a win-win. Regarding AGI, he pointed out that one day, AI will be able to do more than humans, but it does not necessarily have to do everything that humans can do. He predicted that it may take seven years to achieve AGI.
"We firmly refuse to blindly burn money on unprofitable traffic or engage in loss-making ventures for mere publicity," Lee said. "01.AI aims to build the fastest and most cost-effective models with exceptional performance, igniting the ecosystem of large models for both ToC (consumer) and ToB (business) applications," he added.
While scaling remains a central theme, 01.AI shifts the focus from training-time compute to inference-time compute, achieving a complementary synergy that pushes the boundaries of model capabilities more efficiently, Lee pointed out.
Scaling laws remain relevant, but their efficiency has diminished for two primary reasons: data is limited, as text data is no longer growing as rapidly as before, and adding GPUs yields diminishing returns, since the relationship between GPU quantity and training gains is no longer linear, Lee further explained.
To create faster and cheaper models, the core principle is optimizing memory usage and minimizing GPU reliance—storing when possible and calculating only when necessary. For Chinese AI startups, the focus must be on "effective innovation." Companies should avoid overly ambitious AGI experiments that risk high costs and limited practical application scenarios, Lee suggested.
Looking ahead to next year, global inference compute costs are expected to drop further, potentially driving the explosive growth of AI-first ToC applications. These applications, which require time to build user bases before monetization, could find new growth opportunities, he shared.
China’s AI 2.0 future holds key advantages: the ability to develop cost-effective models with extremely low inference costs, which are foundational for high-DAU (daily active user) applications. Additionally, Chinese teams possess strategies honed during the mobile internet era that can be leveraged to promote and scale AI applications. Together, these factors make Chinese teams well-positioned for success in the ToC space.
However, the industry faces a critical challenge: large-model startups must now prove their ability to achieve sustained revenue growth. As technical competition transitions to commercial competition, startups must evolve from academically driven ventures to entrepreneurially managed enterprises. Failing to make this shift will narrow their path forward.
01.AI positions itself with two key commitments: first, building the fastest and most cost-effective world-class models to ignite ToC and ToB ecosystems; second, refusing to blindly burn money on unprofitable traffic or engage in loss-making ventures for publicity.
The following is the full transcript of the dialogue between TMTPost CEO and T-EDGE Global Committee Chairperson Hejuan Zhao and Lee, translated and edited for clarity and brevity:
Zhao: Dr. Lee, welcome to the 2024 T-EDGE Conference and TMT Annual Economic Meeting. We are thrilled to have you here for this conversation.
Lee: Hello, Hejuan. Thank you for the invitation, and greetings to all the audience members.
On the Development of Reasoning Models: o1 Resembles the Engineering Disciplines and Needs to Coexist with Foundational Models
Zhao: Over the past year, the AI industry has seen significant events globally, in Silicon Valley and the broader United States as well as in China. From your recent observations of major AI trends in Silicon Valley and worldwide, what do you think will be the most notable change in the coming year?
Lee: The most significant change I foresee is that more developers will realize that today's advanced models are becoming affordable, heralding an era of AI-first applications flourishing across various industries. I believe this will be the biggest event of 2025.
Previously, it was difficult to develop such applications. Imagine that a year and a half ago, only GPT-4 was sufficiently powerful. Looking back now, GPT-3.5 was suitable for very few scenarios due to its limited capabilities. At that time, deploying GPT-4 for an application would have cost $1-2 per API call, making it prohibitively expensive and potentially leading developers to financial ruin.
Over the past year and a half, model performance has improved significantly, from GPT-4 to Turbo, to 4o, and now o1. Simultaneously, costs have dropped substantially. For instance, the cost of GPT-4o has fallen to $4.4 per million tokens (based on a 3:1 input-output ratio), a 20-fold reduction compared to GPT-3.5 over a year ago.
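As a rough check on that figure, a blended per-million-token price at a 3:1 input-output ratio is just a weighted average. The sketch below assumes GPT-4o’s published list prices at the time (about $2.50 per million input tokens and $10.00 per million output tokens); with those assumptions it reproduces the roughly $4.4 blended figure cited above.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: float = 3.0, output_parts: float = 1.0) -> float:
    """Weighted-average price per million tokens for a given input:output mix."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts

# Assumed GPT-4o list prices (USD per million tokens), late 2024.
print(blended_price(2.50, 10.00))  # -> 4.375, i.e. the ~$4.4 cited above
```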
Across the industry, inference costs have been dropping by about 10 times annually. At 01.AI, however, we aim for a 30- to 40-fold annual improvement, three to four times faster than the industry average, which would allow us to pioneer exceptional applications ahead of others.
Zhao: Cost reductions will lead to an explosion of applications, which is your major prediction for 2025. You also mentioned that many models have undergone iterations this year. Recently, during a discussion on the o1 large model at our Silicon Valley office, we invited team leaders from xAI, OpenAI, and others. We noticed that all major companies are focusing on reasoning models like o1. Compared to foundational models, o1 represents a significant shift in paradigm and direction. Can we interpret this as a future where competition shifts from foundational models to reasoning models? Has the foundational model competition peaked? Is 01.AI also considering introducing new reasoning models?
Lee: Everyone is working on reasoning models, which is undoubtedly a trend. 01.AI is among the few Chinese companies that have made good initial progress in this area.
From a technical perspective, o1 is both an extension of large models and an inspiration for alternative approaches. While o1 is still scaling, the focus has shifted from training-time compute to inference-time compute. Together, these approaches complement each other in breaking the limits of model capabilities.
It’s akin to human “fast thinking” and “slow thinking,” both of which are extensions of brain functions. When combined, they allow for deeper thought processes. Similarly, in the domain of large models, reasoning and foundational models complement each other. However, earlier, most exploration focused on "fast thinking," resembling quick question-and-answer responses.
Of course, “fast thinking,” like earlier models, can still meet many creative and literary needs. Yet, when faced with complex challenges, intuition-based answers are often inadequate. Humans possess a strong ability for self-reflection and revision, known as "reflection." This critical thinking and iterative process are essential for scientific breakthroughs.
o1’s development aligns with this trajectory and has already demonstrated its value externally. Many in the industry are shifting from massive pre-trained models to reasoning explorations. This dual scaling—making models larger and smarter while enabling deeper reasoning—provides a pathway toward AGI. These two approaches can synergistically amplify each other, achieving results greater than the sum of their parts.
Zhao: Does this mean the Scaling Law for foundational models, based on Transformer architectures, is no longer valid? Have we reached a bottleneck where more compute power won’t help?
Lee: The Scaling Law is still valid, but its efficiency has decreased.
First reason: the world’s total data is limited. Although we can generate data with machines and train on video, multimodal, or embodied-intelligence data, the most concentrated form of intelligence still comes from text. The total amount of text is finite, and the other approaches are not yet mature. As you mentioned, nearly all of humanity’s textual data will be used, and new text is not growing nearly as fast as before.
Second reason: to keep advancing the scaling law of large model pretraining, ever more GPUs and machines must be connected together. When you have only one or two GPUs running deep-learning Transformer calculations, pretraining time is spent almost entirely on computation, with very little on transmission. But with 100,000 or 200,000 GPUs, the cost of data transmission becomes very high, and at scales of 1 million or 10 million GPUs most of the time is spent on transmission. The more GPUs there are, the slower the data transmission, and computing power cannot increase linearly with the number of GPUs.
For example, if you expand from one GPU to two, you might get the performance of 1.95 GPUs. But if you expand from 100,000 to 200,000 GPUs, the performance might be closer to that of 100,000 GPUs rather than 200,000. A key reason for this is the added transmission steps and delays in the process.
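Lee’s numbers trace a familiar pattern from distributed training: communication overhead grows with cluster size, so effective throughput scales sublinearly. The toy model below is purely illustrative (the overhead function and its constants are assumptions, not measurements), but with these constants it roughly reproduces both anchor points: 2 GPUs deliver about 1.95 GPU-equivalents, while 200,000 deliver roughly the throughput of 100,000.

```python
def effective_throughput(n_gpus: int, c: float = 0.02, beta: float = 0.32) -> float:
    """Toy model: per-step communication overhead grows as c * n^beta, so
    effective throughput is n / (1 + c * n^beta). Constants are illustrative
    assumptions, not measured values."""
    return n_gpus / (1.0 + c * n_gpus ** beta)

for n in (2, 100_000, 200_000):
    print(f"{n:>7,} GPUs -> ~{effective_throughput(n):,.2f} GPU-equivalents")
# 2       -> ~1.95
# 100,000 -> ~55,700
# 200,000 -> ~100,300 (doubling the cluster falls well short of doubling throughput)
```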
Therefore, we believe pursuing scaling laws will become increasingly expensive and the marginal benefits will decrease, but they are not ineffective. Continuing to pretrain models will still yield progress.
Zhao: If OpenAI is advancing inference models with the o1 series and might release the o2 series next year, why is OpenAI still heavily investing in developing GPT-5 and GPT-6? Why can't these two paths be merged?
Lee: These two paths are not mutually exclusive. I think both “fast thinking” and “slow thinking” can be pursued without end. For example, if a liberal arts student suddenly discovers new insights in calculus, that doesn’t mean they shouldn’t go back and study Plato again. There is no conflict between the two.
If we are to create a "super brain" in the future, we still hope it can excel in both humanities and sciences.
However, I believe there is a paradox in this process: both scaling approaches make models increasingly slow. The first requires model creators to make models larger, and the larger the model, the slower the inference. The second adds “slow thinking” to large models, further reducing inference speed. If inference time stretches from today’s 5 seconds to 60 seconds, then to 5 minutes or even 60 minutes, the model becomes unsuitable for most scenarios.
Therefore, I believe there is a non-mainstream but crucial insight that 01.AI emphasizes, one whose importance has only grown since the emergence of “slow thinking” models like o1: we must achieve ultra-fast inference.
Imagine that 01.AI optimizes a “fast thinking” model to answer the same question in 0.3 seconds while other models take 3 seconds, and that adding “slow thinking” slows a model down by 20 times. For others, 3 seconds slowed 20-fold becomes 1 minute; for us, 0.3 seconds slowed 20-fold is only 6 seconds. In many scenarios, that is still usable.
So, having an extremely fast inference engine means that even after adding "slow thinking," it won’t be too slow, offering greater value to users. Therefore, 01.AI will persist in developing ultra-fast inference models. Ultra-fast inference speed not only helps during "fast thinking" stages but also ensures the model remains usable with impressive performance when "slow thinking" is introduced.
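The arithmetic behind that claim fits in a few lines; the 20x multiplier is the assumed illustrative figure from the example above, not a measured constant.

```python
SLOWDOWN = 20  # assumed "slow thinking" multiplier from the example above

for fast_latency_s in (3.0, 0.3):
    slow_latency_s = fast_latency_s * SLOWDOWN
    print(f"fast: {fast_latency_s}s -> with slow thinking: {slow_latency_s:.0f}s")
# fast: 3.0s -> with slow thinking: 60s  (a full minute; unusable interactively)
# fast: 0.3s -> with slow thinking: 6s   (still acceptable in many scenarios)
```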
Zhao: Following your logic for the basic “fast thinking” model, the limits in data and computing power are clear. But with new paradigms like o1 reasoning models, many aspects remain unclear. For instance, “slow” in “slow thinking” is relative: if inference can be cut from 5 seconds to 3 seconds, that becomes the critical factor for reasoning models. How can this competitiveness be improved? Is it about the algorithms?
Lee: This is definitely a core competitive advantage and also 01.AI’s most distinctive feature. Our inference speed in the "fast thinking" phase is already extremely fast.
Zhao: If inference becomes faster, thinking will naturally be faster. How is this achieved? We are still unclear how OpenAI’s o1 achieves it, since it’s a black box. If 01.AI can make inference two to three times faster, and o1 can likewise speed up inference by two to three times, how is 01.AI able to create such a fast inference model?
Lee: We’ve done the following:
1. Finding ways to address the slowdown. Large models are slow because GPUs are constantly computing. Can GPUs compute less? Classic computer science suggests trading computation for storage: what can be remembered doesn’t need to be recalculated. Techniques like hash tables reflect this principle. Don’t calculate everything; store what has been calculated, and reuse it directly next time.
2. Memory caching. Anticipating data that might be needed later, we bring it closer in advance for convenient access. This is akin to watching videos online: buffering sometimes occurs because data must travel over the network, so a smart approach is to cache part of the video locally, letting playback continue smoothly from local storage even when the network lags. This is the direction of caching.
Simply put, if the underlying inference model shifts from being a computation-heavy model to one that focuses more on storage, the inference speed can increase significantly—potentially tripling.
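As a minimal sketch of this store-instead-of-recompute principle (an illustration of the hash-table idea Lee names, not 01.AI’s actual inference stack), a cache in front of an expensive function returns stored results for repeated inputs instead of recomputing them:

```python
from functools import lru_cache

@lru_cache(maxsize=4096)  # hash-table cache: repeated inputs skip computation
def expensive_step(x: int) -> int:
    # Stand-in for costly work, e.g. attention states that a KV cache
    # would store and reuse rather than recompute on every request.
    return sum(i * i for i in range(x))

expensive_step(10_000)  # computed once and stored
expensive_step(10_000)  # served from the cache; no recomputation
```

In LLM serving, the same principle shows up as KV caching and prefix caching: previously computed attention keys and values are kept in memory so repeated or shared context never has to be recomputed.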
Moreover, 01.AI doesn’t research models so large that they cannot be slimmed down or deployed economically. From the first day of research, we consider the machines to be used for inference, such as how much HBM, RAM, and SSD is available, and plan model training accordingly.
Zhao: You explained this very well. Faster and cheaper are indeed critical factors. Does this represent the key to the widespread adoption of AI models in the application market? We understand "fast," but how do you achieve "cheap"? Computing power isn’t cheap, and data is becoming increasingly expensive. How do you speed up performance and inference while also keeping costs down?
Lee: We don’t make large models faster by adding more machines. Instead, we use the same number and specification of machines to make them faster. Only by training and deploying models this way can we achieve speed, cost-effectiveness, and competitiveness.
Our hardware is fixed; 01.AI ensures the fastest speed within the same hardware constraints. Once model speed improves and the same cost generates three times the tokens, the company either earns three times the revenue or can cut the model’s external price to a third or even lower.
Zhao: So what do you think is the most crucial factor here?
Lee: The key is to use memory efficiently, rely less on GPU computation, and avoid recalculating what can be stored. These principles guide our approach.
At 01.AI, researchers are required not to conduct overly ambitious AGI experiments but to pursue "practical and effective innovation."
A company’s strategic positioning must align with its circumstances. Back then, IBM’s computers were massive and expensive business machines, while Microsoft and Apple created PCs accessible to everyone. The strategic paths of these companies diverged significantly—some pursued the largest and strongest computers, while others developed the fastest and most user-friendly machines. 01.AI has currently chosen the latter path.
The primary goal for 01.AI in developing large models is speed and affordability. Within this framework, we aim to make the best possible models. Through a vertically integrated approach, we address issues of memory, cost reduction, and efficiency improvements, ultimately designing models that combine performance and cost-effectiveness.
On the Development of AI in China and the U.S.: Both To C and To B Will Explode Next Year, but Open-Source Models Still Face Challenges
Zhao: I can understand the mechanism and principles behind this. Let’s now look at the application side. We often say that China’s advantage is its huge application market and its many entrepreneurs. Although our basic scientific research is not as strong as America’s, just as with internet and mobile internet applications, we should lead globally. But now we see that, especially for general large models, apart from ChatGPT, a consumer (C-end) product, there are very few explosive consumer applications; more are likely on the B-end. In the U.S., B-end applications are flourishing, and some have even started making money, while this area is a shortcoming for China. Will this widen the gap between us and the U.S.? And how should we view the opportunities in the current application market? Should To B (enterprise-level) lead, or To C (consumer-level)?
Lee: In the domestic market, we also see some of the concerns you mentioned, but I still believe that 2025 will be a turning point, with AI-first To C and To B applications both exploding.

From the To C perspective, "To C apps that monetize quickly but grow slowly" are not the strength of Chinese teams; "To C apps that accumulate a lot of traffic first and then monetize" are. Over the past year, however, the methodology for building the latter kind of application has found little use in China. Chatbot applications have not yet reached the stage where they can monetize, and without monetization, spending heavily on advertising to reach millions of DAUs (daily active users) is not sustainable. Still, I am optimistic about 2025, because inference costs will be cheap enough and the models good enough.
01.AI’s Yi-Lightning model, along with some other high-quality domestic models, not only matches the top U.S. models in performance but is also faster and cheaper. One big trend next year will be that ever-cheaper inference drives the emergence of high-DAU applications, and the growth path of these AI-first To C applications, where the user base is accumulated first and monetization explored later, will become clearer. A big opportunity for China’s large model field lies here: China can create more cost-effective models, make inference very cheap, and apply strategies accumulated in the mobile internet era to promote and grow AI applications, producing more high-DAU AI-first To C apps. With these factors combined, China’s To C market has great potential. That is the first point. Regarding To B, I also agree with your view.
The U.S. ecosystem is one where "I help you make money, and you help me make money." Corporate users have very mature payment habits, habits that China’s To B practitioners greatly envy. Over the next year, I expect China’s To B ecosystem to begin changing payment habits, though this won’t be easy. But Chinese teams also have unique advantages: Chinese large model companies are more willing to go deep into enterprises and do customization. We can establish breakthrough points first and then iterate quickly. If our models can help enterprises "print money," we can also benefit from the growth of corporate clients. In fields like retail, gaming, and government affairs, we are already seeing the first light of dawn. 01.AI provides real value to clients, and we are seeing good returns. Looking ahead to 2025, this is the hope I see for the To B field.
Zhao: Earlier, you mentioned the point about To B applications. For instance, OpenAI might not provide you with a model but rather an API interface. Does this mean that open-source models will have an advantage over closed-source models?
Lee: Open-source models are a very powerful force. 01.AI itself is doing open-source and recognizes the open-source approach. Under the open-source ecosystem, very good models can emerge, though they may not be the best.
But open-source also has some challenges. First, open-source is borderless, while more and more countries are unwilling to share their data, and that gap will grow. At the same time, data that is legal in one country may not be legal in another, so cross-border use carries risks.

Second, open-source models have a high debugging threshold. There is a major misunderstanding about open-source models: they share only the model and its parameters, while the training process remains a black box. Moreover, even many large enterprises’ technical teams are not experts in model fine-tuning, so after adopting an open-source model, continuing to train it for their own needs is a big challenge.

Third, many open-source models do not take inference speed and cost into account. Even if performance is good, high inference costs and slow speeds make it hard to meet enterprise demands, and the models’ characteristics may not align with enterprises’ needs.

The advantage of closed-source models is that the top ones outperform open-source models, and vendors can send expert teams to serve enterprises. In terms of performance and To B professional services, purchased closed-source models will stay ahead.
Zhao: So, are open-source models more suitable for the Chinese market?
Lee: Actually, not necessarily. Generally speaking, Chinese large model companies are willing to provide services to enterprises. For enterprises, is it more cost-effective to introduce an open-source model and explore on their own, or to choose to collaborate with large model companies for co-building? I think the latter is better. Except for a few technically strong enterprises, choosing to co-build with large model companies is the better choice. Large model companies can help enterprises train differentiated models. Of course, the premise is that the enterprise is willing to pay.
The biggest advantage of open-source models is that they are free, but as the Americans say, "You get what you pay for." If you pay zero, what you get may require a bigger cost on other levels.
Zhao: So, even if an open-source model is tuned for enterprise use, it might become a closed-source model. Additionally, it may no longer be about open-source versus closed-source, but rather that enterprises need customized exclusive models, and this exclusive model may not be what we call a general large model but more likely an edge model. Can I understand it this way?
Lee: Customized models are often not edge models; they are large models deployed in a controlled environment within the enterprise.
I would bet that in almost 95% of cases, a large model company doing this for an enterprise will get better results than the enterprise exploring open-source models on its own. Even with 01.AI’s open-source model, I can 100% guarantee that the result will be worse if the enterprise does it itself than if it pays a reasonable fee for us to help with a closed-source model.
Future Challenges for Large Models: AI Capabilities Expected to Surpass Humans by 2030
Zhao: I understand. So, there's an interesting issue here. Looking at the entire U.S. To B cloud ecosystem, besides the big companies, the new large model unicorns are OpenAI and Anthropic. These two companies have deep partnerships with cloud service providers: OpenAI is closely integrated with Microsoft's cloud, and Anthropic works deeply with Amazon AWS. This pairing is becoming more and more pronounced, with each even bound to its partner's cloud services. So, from 01.AI's perspective, how do independent large model companies in China solve the ecosystem problem without binding to a cloud provider?
Lee: Cloud services in China are not as widespread as they are abroad. Most Chinese companies still prefer to deploy models locally within their enterprises, rather than using cloud deployment. At the same time, many use cases of large models involve internal business data, financial data, documents, emails, etc., which have high confidentiality requirements. In these cases, enterprises are more likely to prefer private deployments. Over the next two years, the challenge of how large models and cloud services will integrate strongly may not be a challenge that an independent large model company will face.
Zhao: Over the next two years, what will be the biggest challenge for an independent large model company competing with large ecosystem model companies?
Lee: I think our biggest challenge is that large model companies have entered a new phase: we need to prove that we can sustain revenue growth and see the day we break even. Looking back at the development of AI 1.0, the industry's focus has shifted from "who has the strongest team," "who has written the most papers," and "who has achieved the highest scores" to "who has shipped the first application" and "who earned the first pot of gold." Like the "Four Little Dragons of AI" back then, these companies have now achieved that.
The next phase will involve the soul-searching question: Can you take on more orders? Can you make a profit in some areas and prove that the business has reached a scalable stage, after which you can consider the issue of listing?
We saw in the AI 1.0 era that some companies, which didn't undergo this soul-searching, were lucky enough to go public early but faced difficulties like stock price declines. Some unlucky ones never managed to list.
So, this is a huge challenge for the large model field: as the competition shifts from technological to commercial, can we transition from being scholar-type entrepreneurs to entrepreneur-type entrepreneurs? If this hurdle can't be overcome, the road will only get narrower.
Now, the few leading large model companies, including the "Six Little Tigers" of large models, have almost stopped competing with each other, and each company is taking its own path. The large model track is much bigger than the computer vision field in the AI 1.0 era. Perhaps each company will become great in different areas. Five years from now, these companies may not even be called "large model companies" anymore because they will have found new paths.
Zhao: You mentioned earlier that there are five or six model companies in China, each with its own positioning. Which positioning do you think 01.AI belongs to?
Lee: Two positions. First, we are determined to make the fastest and cheapest models to ignite the innovative ecosystem for both To C and To B. Second, we are determined not to blindly burn money buying unprofitable traffic, and not to engage in "loss-making publicity" businesses.
Zhao: Currently, we can see that Fei-Fei Li is working on spatial intelligence, Yann LeCun is working on world models, and Marc Raibert, the founder of Boston Dynamics, is researching new algorithms for robots. They are all solving the same problem: using embodied intelligence in robots to address the limitations of large language models. So, have you considered combining models with robots to break through model limitations?
Lee: "Embodied intelligence" is definitely the next very important direction and milestone for AGI. It will be a key application scenario for generative AI. Currently, embodied intelligence can only achieve a rough understanding of real-world objects and environments, performing basic operations that don't require high accuracy, but many technical issues remain to be solved.
From a macro perspective on the development of large models, text is just the first step, multimodal is the second, and then it should be action—intelligent agents that directly help you do things, not just give you answers. It must have the ability to act; only then is it a complete form of intelligence.
Of course, we've seen many cool demonstrations, but these are low-hanging fruits. Embodied intelligence needs more time to create real commercial value. Currently, 01.AI still needs to focus on large model innovation and can't afford to be distracted by these tasks. However, we are very open to exploring collaborations with embodied intelligence companies. As a "brain," large models can have many overlapping directions with embodied intelligence.
Zhao: Finally, some say that the release of the o1 reasoning model means AGI has already been achieved. In your opinion, how should AGI continue to develop, and what conditions are needed for its realization?
Lee: Today, both humans and AI can do many things. Some things humans do better, and some AI does better. AI will develop faster than humans, and there will come a time when AI can do more than humans. But we believe that it may not be able to do everything that humans can do.
The research institute Epoch AI has conducted a quantitative analysis of AGI. They analyzed the improvement from GPT-2 to GPT-4 and concluded that reaching AGI would require a further leap of similar magnitude: the progress from GPT-4 to a GPT-6 or GPT-7 would need to be as significant as that from GPT-2 to GPT-4. On a scientifically cautious estimate, they put AGI at around 2030. I believe this prediction is quite reliable.
Zhao: Okay, thank you, Professor Lee. The conversation just now was very exciting. Professor Lee has shared many insightful and candid thoughts with us. We also believe that the AI industry will undergo significant changes in the coming year. We hope to continue being observers and recorders of these changes and maintain such dialogues and communications with Professor Lee. I also sincerely thank everyone for participating in today's conversation. We believe that the answers provided by Professor Lee are not only insightful but also a very real reflection of the status quo.