(Image source: Photo by Lin Zhijia, TMTPost AGI Editor)
AsianFin -- DeepSeek researcher Daya Guo revealed on X that training DeepSeek R1 took only two to three weeks, saying, “The happiest moment during the Spring Festival was witnessing R1-Zero's curves continuously increase and truly feeling the power of reinforcement learning.”
During the Chinese New Year holiday, the research team observed significant improvements in R1-Zero, demonstrating the immense potential of reinforcement learning (RL).
On February 1, the fourth day of the Chinese New Year, Guo took to X to express his excitement over the performance of R1-Zero.
Replying to users on the platform, Guo said, “We use benchmarks from domains not covered by the RL prompt to evaluate generalization. So far, it appears to have generalization capability.”
“I think we are still in an early stage, and there is still a long way to explore in RL. I believe there will be significant progress this year,” Guo added.
Last Friday evening, reports circulated that Alibaba planned to invest US$1 billion for a 10% stake in DeepSeek, implying a $10 billion valuation, and that the two parties were discussing transaction details.
Yan Qiao, a vice president at Alibaba, responded on her WeChat Moments that Alibaba, as a fellow Hangzhou-based company, applauds DeepSeek, but that the circulating rumors of an Alibaba investment in DeepSeek are false.
DeepSeek's current valuation is about $8 billion, according to industry insiders. The rumors initially spread within investment circles and quantitative groups, attracting significant interest from several investment institutions.
Zhu Xiaohu, a managing partner at GSR Ventures, previously said he would definitely invest if DeepSeek opened a financing round. Zhu believes DeepSeek should remain open to outside funding, since moving forward will require significant investment, particularly in computing resources such as GPUs.