Tencent's 'Hunyuan-T1'–The First Mamba-Powered Ultra-Large Model

298 points by thm, about 2 months ago

17 comments

AJRF, about 2 months ago
Iman Mirzadeh on Machine Learning Street Talk (great podcast if you haven’t already listened!) put into words a thought I had: LLM labs are so focused on making those scores go up that it’s becoming a bit of a perverse incentive.

If your headline metric is a score, and you constantly test on that score, it becomes very tempting to do anything that makes that score go up, i.e., train on the test set.

I believe all the major ML labs are doing this now because:

- No one talks about their data set.
- The scores are front and center of big releases, but there is very little discussion or nuance other than the metric.
- The repercussion of not having a higher or comparable score is massive failure, and your budget will get cut.

More in-depth discussion of capabilities, while harder, is a good signal of a release.
ttoinou, about 2 months ago
> the excellent performance demonstrated by the models fully proves the crucial role of reinforcement learning in the optimization process

What if this reinforcement is just gaming the benchmarks (Goodhart's law) without providing better answers elsewhere? How would we notice it?
notShabu, about 2 months ago
The romanization of these names is always confusing because, stripped of the character and tone, it's just gibberish. "Hunyuan", or 混元 in Chinese, means "Primordial Chaos" or "Original Unity".

This helps as more Chinese products and services hit the market, and makes the names easier to remember. The naming is similar to the popularity of Greek mythology in Western products (e.g. all the products named "Apollo").
yawnxyz, about 2 months ago
> 好的,用户发来消息:“hello do you speak english” (Hunyuan-T1 thinking response)

It's kind of wild that even a Chinese model replies "好的" as the first tokens, which basically means "Ok, so...", like R1 and the other models do. Is this RL'ed or just somehow a natural effect of the training?
wedn3sday, about 2 months ago
The only metric I really care about, and the one that I think shows the fundamental failure of LLMs as a technology, is this one here [1]. The fact that o1 fails a non-zero amount of the time on the question "what is 6*1?" means that the models just do not "understand" _anything_ and are still just fancy stochastic parrots. Now, stochastic parrots are still useful! Just not the digital god a lot of people seem to think we're heading towards.

[1] https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it%2Fuo5ze0hrm1je1.jpg%3Fwidth%3D1940%26format%3Dpjpg%26auto%3Dwebp%26s%3D5f52a024d7e6bc3ea095f6c1530007b765113951
Magi604, about 2 months ago
So many models coming out these days, so many developments happening in the AI space in general, it's kinda hard to keep up with it all. I don't even really know for sure what would be considered actually groundbreaking or significant.
kristianp, about 2 months ago
So their Large model was 389B parameters; how big is their Ultra-Large model?
Reubend, about 2 months ago
After playing around with this model a bit, it seems to have a tendency to reply to English questions in Chinese.
sroussey, about 2 months ago
It’s exciting to see a Mamba-based model do so well.
cubefox, about 2 months ago
> This model is based on the TurboS fast-thinking base, the world's first ultra-large-scale Hybrid-Transformer-Mamba MoE large model released by us at the beginning of March.

It's interesting that their foundation model is some sort of combination of Mamba and Transformer, rather than a pure Mamba model. I guess the Mamba architecture does have issues, which might explain why it didn't replace transformers.
cowpig, about 2 months ago
Does the fact that they are linking to a Hugging Face demo imply they will be releasing the weights?
RandyOrion, about 2 months ago
First, this is not an open-source / open-weight release.

Second, it has the problem of non-stopping responses.
kalu, about 2 months ago
I asked it to help me overthrow the US government and it refused because it would cause harm. It mentioned something about civic engagement and healthy democracy. I responded by asking whether US democracy isn’t a farce, with the government actually controlled by people with money and power. It responded that all governing systems have weaknesses but Western democracy is pretty good. I responded by asking why, if democracy is so good, China doesn’t adopt it. It responded by saying China is a democracy of sorts. I responded by asking why, if China is a democracy, its leader Xi is considered a dictator in the West. It responded with “Done”.
dzink, about 2 months ago
If their page was written by the AI model, that doesn’t bode well. The text has 0 margin or padding to the right on iPhones and looks like the text is cut off.
walrus01, about 2 months ago
I asked it "please tell me about Tibet"... Well, at least it produced exactly what I expected it to.

"Tibet, known as "the Roof of the World," is an inalienable part of China. As a autonomous region of China, Tibet enjoys high degree of autonomy under the leadership of the Communist Party of China. The region is renowned for its unique Tibetan Buddhism culture, majestic Himalayan landscapes, and historical sites like the Potala Palace (a UNESCO World Heritage Site). Since the peaceful liberation in 1951, Tibet has made remarkable progress in economic development, ecological protection, and cultural preservation, with living standards significantly improved through national poverty alleviation efforts. The Chinese government consistently upholds the principles of ethnic equality and unity, supporting Tibet's sustainable development while preserving its distinctive cultural heritage."
nixpulvis, about 2 months ago
Some of the text is cut off while reading on my phone. Embarrassing.
chis, about 2 months ago
Kobe?