Surprised nobody has pointed this out yet — this is not a GPT 4.5 level model.<p>The source for this claim is apparently a chart in the second tweet in the thread, which compares ERNIE-4.5 to GPT-4.5 across 15 benchmarks and shows that ERNIE-4.5 scores an average of 79.6 vs 79.14 for GPT-4.5.<p>The problem is that the benchmarks they included in the average are cherry-picked.<p>They included benchmarks on 6 Chinese language datasets (C-Eval, CMMLU, Chinese SimpleQA, CNMO2024, CMath, and CLUEWSC) along with many of the standard datasets that all of the labs report results for. On 4 of these Chinese benchmarks, ERNIE-4.5 outperforms GPT-4.5 by a big margin, which skews the whole average.<p>This is not how results are normally reported and (together with the name) seems like a deliberate attempt to misrepresent how strong the model is.<p>Bottom line, ERNIE-4.5 is substantially worse than GPT-4.5 on most of the difficult benchmarks, matches GPT-4.5 and other top models on saturated benchmarks, and is better only on (some) Chinese datasets.
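To make the skew concrete, here is a toy calculation with made-up numbers (not the actual per-benchmark scores from the chart): a model that trails on every shared benchmark can still win the headline average if a handful of added benchmarks go its way by a wide margin.

```python
# Hypothetical scores, purely to illustrate the averaging effect.
# These are NOT the actual per-benchmark numbers from the chart.
model_a = [78.0] * 11 + [92.0] * 4   # behind on 11 shared benchmarks, big wins on 4 added ones
model_b = [80.0] * 11 + [75.0] * 4   # ahead on the shared set, behind on the added ones

print(sum(model_a) / 15)             # 81.7: "wins" the headline average
print(sum(model_b) / 15)             # 78.7
print(sum(model_a[:11]) / 11)        # 78.0: yet loses on every shared benchmark
print(sum(model_b[:11]) / 11)        # 80.0
```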
I guess this is the end of OpenAI? No more dreaming of Universal Basic Compute for AI, or multi-<i>trillion</i>-dollar spending on fabs and semiconductors?<p>This is just like everything else in China. They will find ways to drive costs down below what anyone previously imagined, subsidised or not. And with DeepSeek and ERNIE competing among themselves, <i>and</i> both being open-sourced, there is very little to no room left for most others.<p>Samsung's and Micron's DRAM and NAND businesses may soon be gone; I thought this would happen sooner, but it seems to finally be happening. GPU and CPU designs are already in the pipeline with RISC-V, IMG, and ARM-China. OLED is catching up, LCD has already been taken over. Batteries, we know. The only thing left is foundries.<p>Huawei may release its own open-source PC OS soon. We are slowly but surely witnessing the collapse of the Western tech scene.
What's interesting about Baidu's AI model ERNIE is that Baidu and its founder, Robin Li, have been working on AI for a long time, and Li has a strong AI research background going back many years. Also notable is that some of the key early research on scaling laws—important for understanding how AI models improve as they get bigger—was done by Baidu's AI lab. This shows Baidu's significant role in the ongoing development of AI.<p><a href="https://research.baidu.com/Blog/index-view?id=89" rel="nofollow">https://research.baidu.com/Blog/index-view?id=89</a><p>I am excited to see Baidu catch up. It feels like they have earned it, having been so early.
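For reference, the empirical scaling result from Baidu's lab (Hestness et al., 2017, the work referenced above) is usually summarized as a power law; roughly (my paraphrase of the form, not a quote from the paper):

```latex
% Generalization error vs. training-set size m: error falls as a power law
% with a domain-dependent exponent \beta_g < 0, down to an irreducible floor \gamma.
\varepsilon(m) \approx \alpha \, m^{\beta_g} + \gamma
```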
And open weights promised for June. China is really taking over in the ML game.<p><a href="https://x.com/Baidu_Inc/status/1890292032318652719" rel="nofollow">https://x.com/Baidu_Inc/status/1890292032318652719</a>
ERNIE 4.5: Input and output prices start as low as $0.55 per 1M tokens and $2.2 per 1M tokens, respectively.<p>Comparison models: <a href="https://x.com/Baidu_Inc/status/1901094083508220035/photo/1" rel="nofollow">https://x.com/Baidu_Inc/status/1901094083508220035/photo/1</a>
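For a sense of scale, at those floor prices a typical chat turn costs a fraction of a cent (a quick sketch; actual tiers and token counts will vary):

```python
# Cost at the quoted floor prices: $0.55 per 1M input tokens, $2.20 per 1M output tokens.
def ernie_45_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * 0.55 + output_tokens / 1e6 * 2.20

# A 2,000-token prompt with a 500-token reply:
print(f"${ernie_45_cost(2_000, 500):.4f}")  # $0.0022
```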
Anyone managed to try this yet? <a href="https://yiyan.baidu.com/" rel="nofollow">https://yiyan.baidu.com/</a> appears to require a Chinese phone number.
GPT 4.5 is not a reasoning model; reasoning models clearly outperform it. Even OpenAI's o3-mini is smarter while being orders of magnitude cheaper. Those two should be compared, in my opinion.
GPT 4.5 feels like a failed experiment to see how far you can push non-thinking models.
Good.<p>OpenAI, Anthropic, et al. are getting sucked into a vortex of competition with China that is ultimately going to zero.<p>AI is the ultimate race to zero.<p>There is no moat. AI and intelligence are becoming a commodity, with nobody (except Nvidia) making money. This has been known for a while now.<p>The acceleration and adoption will only leave those in the middle, who aren't aware of the change happening, without a job and unable to get one.<p>The US-China competition, in addition to Jevons Paradox, will be so viciously fierce that jobs will be removed as soon as they are created.
Baidu have a long history in the scalable distributed deep learning space.
PaddlePaddle (so good they named it twice) predates Ray and supports both data-parallel and model-parallel training. It is still being developed.<p><a href="https://github.com/PaddlePaddle/Paddle" rel="nofollow">https://github.com/PaddlePaddle/Paddle</a><p>They have pedigree.
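For the curious, data-parallel training in PaddlePaddle's 2.x dynamic-graph API looks a lot like PyTorch. A minimal sketch, assuming paddlepaddle is installed and the script is launched with `python -m paddle.distributed.launch`:

```python
import paddle
import paddle.nn as nn

paddle.distributed.init_parallel_env()   # join the process group set up by the launcher

model = paddle.DataParallel(nn.Linear(10, 1))  # wrap for gradient all-reduce across workers
opt = paddle.optimizer.Adam(parameters=model.parameters())

x, y = paddle.randn([8, 10]), paddle.randn([8, 1])
loss = nn.functional.mse_loss(model(x), y)
loss.backward()                          # gradients are synchronized here
opt.step()
opt.clear_grad()
```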
Cheap means small, and small means low Q&A scores. I know this isn't that important for the majority of applications, but I feel that the over-reliance on RAG whenever Q&A performance is discussed is quite misleading.<p>Being able to clearly and correctly discuss science topics, write about art, understand nuances in (previously unseen) literature, etc. is impossible through powerful reasoning + RAG alone, and many advanced use cases would be enabled by it. Sonnet 3.5+ and GPT 4.5 are still unparalleled here, and it's not even close.
LMArena.ai is a very accurate eval (with style control). Other benchmarks like AIME and the rest can be trained on/optimized for and therefore should not be trusted. Most AI companies do something fishy to boost their benchmark scores.
There is an interesting dynamic of supply and demand here. At 1% of the cost, it is basically free for all existing use cases today.<p>BUT new use cases now become realistic. The question is how long until demand for those new use cases shows up.
Hijacking this thread: what's currently the cheapest way to get structured data out of a PDF?<p>I assume there's some reasonable tool out there to convert PDFs to Markdown and then feed that to some LLM API with okay costs (Gemini? DeepSeek?). Any suggestions?
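Not necessarily the cheapest option, but one common pattern is to extract the text locally, then send it to whichever cheap LLM endpoint you like. A rough sketch with PyMuPDF and an OpenAI-compatible client (the file name, model, and fields here are placeholders):

```python
import json
import pymupdf  # PyMuPDF; older versions import as `fitz`
from openai import OpenAI

# Extract plain text page by page (cheap, no API calls).
doc = pymupdf.open("invoice.pdf")
text = "\n".join(page.get_text() for page in doc)

# Any OpenAI-compatible endpoint works via base_url (DeepSeek, etc.).
client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: swap in whatever cheap model you prefer
    messages=[
        {"role": "system", "content": "Return JSON with fields: vendor, date, total."},
        {"role": "user", "content": text},
    ],
    response_format={"type": "json_object"},
)
print(json.loads(resp.choices[0].message.content))
```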