GPT 4.5 level for 1% of the price

296 points by decide1000, about 2 months ago

24 comments

GavCo, about 2 months ago
Surprised nobody has pointed this out yet: this is not a GPT 4.5 level model.

The source for this claim is apparently a chart in the second tweet in the thread, which compares ERNIE-4.5 to GPT-4.5 across 15 benchmarks and shows that ERNIE-4.5 scores an average of 79.6 vs 79.14 for GPT-4.5.

The problem is that the benchmarks they included in the average are cherry-picked.

They included benchmarks on 6 Chinese language datasets (C-Eval, CMMLU, Chinese SimpleQA, CNMO2024, CMath, and CLUEWSC) along with many of the standard datasets that all of the labs report results for. On 4 of these Chinese benchmarks, ERNIE-4.5 outperforms GPT-4.5 by a big margin, which skews the whole average.

This is not how results are normally reported and (together with the name) seems like a deliberate attempt to misrepresent how strong the model is.

Bottom line, ERNIE-4.5 is substantially worse than GPT-4.5 on most of the difficult benchmarks, matches GPT-4.5 and other top models on saturated benchmarks, and is better only on (some) Chinese datasets.
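
The averaging effect described in this comment can be sketched in a few lines of Python. Every score and benchmark name below is hypothetical, invented only to show the mechanism; none of it comes from Baidu's chart. The point is that a model can lead on the overall mean while trailing once a subset with large margins is left out.

```python
from statistics import mean

# HYPOTHETICAL scores, invented for illustration only. They are NOT the
# figures from the chart discussed in the thread, and the benchmark names
# are placeholders.
# Each row: (model_a_score, model_b_score, is_chinese_language_benchmark)
scores = {
    "general_1": (70.0, 78.0, False),
    "general_2": (65.0, 72.0, False),
    "saturated_1": (95.0, 95.0, False),
    "chinese_1": (90.0, 75.0, True),
    "chinese_2": (88.0, 74.0, True),
}


def averages(rows):
    """Return the mean score of model A and model B over the given rows."""
    return mean(r[0] for r in rows), mean(r[1] for r in rows)


all_rows = list(scores.values())
non_chinese_rows = [r for r in all_rows if not r[2]]

overall_a, overall_b = averages(all_rows)
subset_a, subset_b = averages(non_chinese_rows)

print(f"all benchmarks:        A={overall_a:.2f}  B={overall_b:.2f}")
print(f"excluding the subset:  A={subset_a:.2f}  B={subset_b:.2f}")
```

With these made-up numbers, model A wins the overall average but loses once the subset is excluded, which is why reporting per-benchmark results alongside any average is more informative.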

ksec, about 2 months ago
I guess this is the end of OpenAI? No more dreaming of Universal Basic Compute for AI, Multi *Trillion* for Fabs and Semi?

This is just like everything in China. They will find ways to drive down cost to below what anyone previously imagined, subsidised or not. And even just competing among themselves with DeepSeek vs ERNIE *and* open sourcing them means there is very little to no space for most.

Both the DRAM and NAND industries for Samsung / Micron may soon be gone. I thought this was going to happen sooner, but it seems to be finally happening. GPU and CPU designs are already in the pipeline with RISC-V, IMG and ARM-China. OLED is catching up, LCD is already taken over. Batteries we know. The only thing left is foundries.

Huawei may release its own open source PC OS soon. We are slowly but surely witnessing the collapse of the Western tech scene.

patrickhogan1, about 2 months ago
What's interesting about Baidu's AI model Ernie is that Baidu and its founder, Robin Li, have been working on AI for a long time. Robin Li has a strong background in AI research going back many years. Also notable is that some of the key early research on scaling laws (important for understanding how AI models improve as they get bigger) was done by Baidu's AI lab. This shows Baidu's significant role in the ongoing development of AI.

https://research.baidu.com/Blog/index-view?id=89

I am excited to see Baidu catch up. It feels like they have earned it, being very early.

jampekka, about 2 months ago
And open weights promised for June. China is really taking over in the ML game.

https://x.com/Baidu_Inc/status/1890292032318652719

pacifika, about 2 months ago
Is the title claim correct? It is not mentioned as such in the tweet.

decide1000, about 2 months ago
ERNIE 4.5: Input and output prices start as low as $0.55 per 1M tokens and $2.2 per 1M tokens, respectively.

Comparison models: https://x.com/Baidu_Inc/status/1901094083508220035/photo/1
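
For a rough sense of the arithmetic behind the "1% of the price" title, here is a minimal Python sketch. The ERNIE 4.5 prices are the ones quoted in this comment. The GPT-4.5 prices ($75 input and $150 output per 1M tokens) and the token counts are assumptions that are not stated in the thread, so treat the resulting ratio as illustrative only.

```python
# Prices quoted in the thread for ERNIE 4.5 (USD per 1M tokens).
ERNIE_INPUT_PER_M = 0.55
ERNIE_OUTPUT_PER_M = 2.20

# ASSUMPTION: GPT-4.5 list prices in USD per 1M tokens. These figures are
# not given in the thread; substitute current values before relying on them.
GPT45_INPUT_PER_M = 75.0
GPT45_OUTPUT_PER_M = 150.0


def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Cost in USD of one request, given per-1M-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical request size, chosen only for illustration.
INPUT_TOKENS = 10_000
OUTPUT_TOKENS = 2_000

ernie_cost = request_cost(INPUT_TOKENS, OUTPUT_TOKENS,
                          ERNIE_INPUT_PER_M, ERNIE_OUTPUT_PER_M)
gpt45_cost = request_cost(INPUT_TOKENS, OUTPUT_TOKENS,
                          GPT45_INPUT_PER_M, GPT45_OUTPUT_PER_M)
ratio = ernie_cost / gpt45_cost

print(f"ERNIE 4.5: ${ernie_cost:.4f}")
print(f"GPT-4.5:   ${gpt45_cost:.4f}")
print(f"ratio:     {ratio:.2%}")
```

Under these assumptions the ratio lands just under 1%. The exact figure depends on the input/output mix, because the assumed input prices differ by a larger factor than the output prices.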

simonw, about 2 months ago
Anyone managed to try this yet? https://yiyan.baidu.com/ appears to require a Chinese phone number.

Logge, about 2 months ago
GPT 4.5 is not a reasoning model. Reasoning models clearly outperform it. Even OpenAI's o3-mini is smarter while being orders of magnitude cheaper. Those two should be compared, in my opinion. GPT 4.5 feels like a failed experiment to see how far you can push non-thinking models.

colesantiago, about 2 months ago
Good.

OpenAI, Anthropic, et al. are getting sucked into a vortex of competition with China that is ultimately going to zero.

AI is the ultimate race to zero.

There is no moat. AI and intelligence are becoming a commodity, with nobody (except Nvidia) making money. This has been known for a while now.

The acceleration and adoption will only leave those in the middle who aren't aware of the change happening without a job and unable to get one.

The US-China competition, in addition to Jevons Paradox, will be so viciously fierce that jobs will be removed as soon as they are created.

jamesblonde, about 2 months ago
Baidu have a long history in the scalable distributed deep learning space. PaddlePaddle (so good they named it twice) predates Ray and supports both data-parallel and model-parallel training. It is still being developed.

https://github.com/PaddlePaddle/Paddle

They have pedigree.

kleiba, about 2 months ago
US: Could I interest you in my lunch?

China: Thanks, already on it.

curl-up, about 2 months ago
Cheap means small, small means low Q&A scores. I know that this isn't that important for the majority of applications, but I feel that over-reliance on RAG whenever Q&A performance is discussed is quite misleading.

Being able to clearly and correctly discuss science topics, to write about art, to understand nuances in (previously unseen) literature, etc. is impossible simply through powerful-reasoning + RAG, and so many advanced use cases would be enabled by this. Sonnet 3.5+ and GPT 4.5 are still unparalleled here, and it's not even close.

pera, about 2 months ago
https://nitter.space/Baidu_Inc/status/1901089355890036897

cubefox, about 2 months ago
The title is editorialized in a misleading manner.

ohso4, about 2 months ago
Lmarena.ai is a very accurate eval (with style control). Other benchmarks like AIME and whatever can be trained on/optimized for and therefore should not be trusted. Most AI companies do something fishy to boost their benchmark scores.

gitfan86, about 2 months ago
There is an interesting dynamic of supply and demand here. 1% is basically free for all existing use cases today.

BUT new use cases are now realistic. The question is how long until demand for the new use cases shows up.

logicchains, about 2 months ago
Quite impressive if true, because historically Baidu's models have tended to under-perform.

unhappy_meaning, about 2 months ago
Man, the AI race is just launching on all fronts.

infrawhispers, about 2 months ago
NICE. This is the capitalism I signed up for… not OpenAI and Anthropic charging $200/mo for an LLM while trying to do regulatory capture.

itsTyrion, about 2 months ago
Wake up honey, another company burned a few dozen gigawatt-hours on a shitty LLM

hjgjhyuhy, about 2 months ago
[flagged]

camillomiller, about 2 months ago
I hear the rumbling coming in Altmanland

buyucu, about 2 months ago
I got flagged the last time I said this, but let's try again:

OpenAI is increasingly irrelevant. They no longer push the boundaries of technology.

folli, about 2 months ago
Hijacking this thread: what's currently the cheapest way to get structured data out of a PDF?

I assume there's some reasonable tool out there to convert PDFs to markup and then feed it to some LLM API with okay costs (Gemini? DeepSeek?). Any suggestions?