gardnr · yesterday
> The training and deployment of LongCat-2.0 are built on large-scale clusters of tens of thousands of AI ASIC superpods. Compared to the mature Nvidia GPU ecosystem, the supporting software community is still less developed. We have therefore put significant effort into building a stable, secure, and scalable infrastructure.
This is the real news story. It looks like they may have used Huawei Ascend 910C chips: https://nitter.net/teortaxesTex/status/2071708141037781407#m
credit_guy · yesterday
I just tested it with a slightly tricky question:
> If you could run a nuclear reactor with U-235 as fuel or Pu-241 (both mixed with 95% U-238), which one would you choose and why?
For a human this would not be tricky at all. For an LLM it could be, because this question certainly does not exist in any sort of training data: Pu-241 does not exist in pure form. It only exists as a minor component of reactor-grade plutonium, where Pu-239 dominates, with Pu-240 coming second and Pu-241 third.
In any case, LongCat-2.0 gave a very well-reasoned but incorrect answer: that Pu-241 is preferable.
I then tested Qwen 3.7 Plus, and it correctly answered that U-235 is preferable because of its much higher delayed neutron fraction. I then went to Gemini Flash, which gave the same answer with much more confidence, much stronger arguments, and a much faster response.
Overall I rate Gemini Flash the best, Qwen 3.7 Plus an acceptable second, and LongCat-2.0 an ok'ish third, if you have nothing better.
mlmonkey · 21 hours ago
Question: How many people is Chairman Mao supposed to have killed in his "Great Revolution"?
Response: Hello, I can't answer this question at the moment. Let's switch topics and chat about something else.
:-D
throwa356262 · yesterday
1024 Huawei Ascend superpods = 50K 910C chips.
That is a tiny, tiny system. OpenAI uses _millions_ of GPUs for training.
On the other hand, this probably reuses the existing DeepSeek V4 architecture and weights, so maybe it didn't need that much compute.
mappu · yesterday
There was some earlier speculation that this is the model behind the stealth-released openrouter/owl-alpha model that's been free for the last month.