Surprised nobody has pointed this out yet — this is not a GPT 4.5 level model.<p>The source for this claim is apparently a chart in the second tweet in the thread, which compares ERNIE-4.5 to GPT-4.5 across 15 benchmarks and shows that ERNIE-4.5 scores an average of 79.6 vs 79.14 for GPT-4.5.<p>The problem is that the benchmarks they included in the average are cherry-picked.<p>They included benchmarks on 6 Chinese language datasets (C-Eval, CMMLU, Chinese SimpleQA, CNMO2024, CMath, and CLUEWSC) along with many of the standard datasets that all of the labs report results for. On 4 of these Chinese benchmarks, ERNIE-4.5 outperforms GPT-4.5 by a big margin, which skews the whole average.<p>This is not how results are normally reported and (together with the name) seems like a deliberate attempt to misrepresent how strong the model is.<p>Bottom line, ERNIE-4.5 is substantially worse than GPT-4.5 on most of the difficult benchmarks, matches GPT-4.5 and other top models on saturated benchmarks, and is better only on (some) Chinese datasets.
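To make the skew concrete, here is a toy calculation with made-up numbers (not the actual per-benchmark scores from the chart): a model that trails on every shared benchmark can still win the headline average if a handful of added benchmarks go its way by a wide margin.

```python
# Hypothetical scores, purely to illustrate the averaging effect.
# These are NOT the actual per-benchmark numbers from the chart.
model_a = [78.0] * 11 + [92.0] * 4   # behind on 11 shared benchmarks, big wins on 4 added ones
model_b = [80.0] * 11 + [75.0] * 4   # ahead on the shared set, behind on the added ones

print(sum(model_a) / 15)             # 81.7: "wins" the headline average
print(sum(model_b) / 15)             # 78.7
print(sum(model_a[:11]) / 11)        # 78.0: yet loses on every shared benchmark
print(sum(model_b[:11]) / 11)        # 80.0
```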
I guess this is the end of OpenAI? No more dreaming of Universal Basic Compute for AI, or multi-<i>trillion</i>-dollar spending on fabs and semiconductors?<p>This is just like everything else in China. They will find ways to drive costs down below what anyone previously imagined, subsidised or not. And with DeepSeek and ERNIE competing among themselves, <i>and</i> both being open-sourced, there is very little to no room left for most others.<p>Samsung's and Micron's DRAM and NAND businesses may soon be gone; I thought this would happen sooner, but it seems to finally be happening. GPU and CPU designs are already in the pipeline with RISC-V, IMG, and ARM-China. OLED is catching up, LCD has already been taken over. Batteries, we know. The only thing left is foundries.<p>Huawei may release its own open-source PC OS soon. We are slowly but surely witnessing the collapse of the Western tech scene.
What's interesting about Baidu's AI model ERNIE is that Baidu and its founder, Robin Li, have been working on AI for a long time, and Li has a strong AI research background going back many years. Also notable is that some of the key early research on scaling laws—important for understanding how AI models improve as they get bigger—was done by Baidu's AI lab. This shows Baidu's significant role in the ongoing development of AI.<p><a href="https://research.baidu.com/Blog/index-view?id=89" rel="nofollow">https://research.baidu.com/Blog/index-view?id=89</a><p>I am excited to see Baidu catch up. It feels like they have earned it, having been so early.
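For reference, the empirical scaling result from Baidu's lab (Hestness et al., 2017, the work referenced above) is usually summarized as a power law; roughly (my paraphrase of the form, not a quote from the paper):

```latex
% Generalization error vs. training-set size m: error falls as a power law
% with a domain-dependent exponent \beta_g < 0, down to an irreducible floor \gamma.
\varepsilon(m) \approx \alpha \, m^{\beta_g} + \gamma
```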
And open weights promised for June. China is really taking over in the ML game.<p><a href="https://x.com/Baidu_Inc/status/1890292032318652719" rel="nofollow">https://x.com/Baidu_Inc/status/1890292032318652719</a>
ERNIE 4.5: Input and output prices start as low as $0.55 per 1M tokens and $2.2 per 1M tokens, respectively.<p>Comparison models: <a href="https://x.com/Baidu_Inc/status/1901094083508220035/photo/1" rel="nofollow">https://x.com/Baidu_Inc/status/1901094083508220035/photo/1</a>
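For a sense of scale, at those floor prices a typical chat turn costs a fraction of a cent (a quick sketch; actual tiers and token counts will vary):

```python
# Cost at the quoted floor prices: $0.55 per 1M input tokens, $2.20 per 1M output tokens.
def ernie_45_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * 0.55 + output_tokens / 1e6 * 2.20

# A 2,000-token prompt with a 500-token reply:
print(f"${ernie_45_cost(2_000, 500):.4f}")  # $0.0022
```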
Anyone managed to try this yet? <a href="https://yiyan.baidu.com/" rel="nofollow">https://yiyan.baidu.com/</a> appears to require a Chinese phone number.
GPT 4.5 is not a reasoning model; reasoning models clearly outperform it. Even OpenAI's o3-mini is smarter while being orders of magnitude cheaper. Those two should be compared, in my opinion.
GPT 4.5 feels like a failed experiment to see how far you can push non-thinking models.
Good.<p>OpenAI, Anthropic, et al. are getting sucked into a vortex of competition with China that is ultimately going to zero.<p>AI is the ultimate race to zero.<p>There is no moat. AI and intelligence are becoming a commodity, with nobody (except Nvidia) making money. This has been known for a while now.<p>The acceleration and adoption will only leave those in the middle, who aren't aware of the change happening, without a job and unable to get one.<p>The US-China competition, in addition to Jevons Paradox, will be so viciously fierce that jobs will be removed as soon as they are created.
Baidu have a long history in the scalable distributed deep learning space.
PaddlePaddle (so good they named it twice) predates Ray and supports both data-parallel and model-parallel training. It is still being developed.<p><a href="https://github.com/PaddlePaddle/Paddle" rel="nofollow">https://github.com/PaddlePaddle/Paddle</a><p>They have pedigree.
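For the curious, data-parallel training in PaddlePaddle's 2.x dynamic-graph API looks a lot like PyTorch. A minimal sketch, assuming paddlepaddle is installed and the script is launched with `python -m paddle.distributed.launch`:

```python
import paddle
import paddle.nn as nn

paddle.distributed.init_parallel_env()   # join the process group set up by the launcher

model = paddle.DataParallel(nn.Linear(10, 1))  # wrap for gradient all-reduce across workers
opt = paddle.optimizer.Adam(parameters=model.parameters())

x, y = paddle.randn([8, 10]), paddle.randn([8, 1])
loss = nn.functional.mse_loss(model(x), y)
loss.backward()                          # gradients are synchronized here
opt.step()
opt.clear_grad()
```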
Cheap means small, and small means low Q&A scores. I know this isn't that important for the majority of applications, but I feel that the over-reliance on RAG whenever Q&A performance is discussed is quite misleading.<p>Being able to clearly and correctly discuss science topics, write about art, understand nuances in (previously unseen) literature, etc. is impossible through powerful reasoning + RAG alone, and many advanced use cases would be enabled by it. Sonnet 3.5+ and GPT 4.5 are still unparalleled here, and it's not even close.
LMArena.ai is a very accurate eval (with style control). Other benchmarks like AIME and the rest can be trained on/optimized for and therefore should not be trusted. Most AI companies do something fishy to boost their benchmark scores.
There is an interesting dynamic of supply and demand here. At 1% of the cost, it is basically free for all existing use cases today.<p>BUT new use cases now become realistic. The question is how long until demand for those new use cases shows up.
Hijacking this thread: what's currently the cheapest way to get structured data out of a PDF?<p>I assume there's some reasonable tool out there to convert PDFs to Markdown and then feed that to some LLM API with okay costs (Gemini? DeepSeek?). Any suggestions?
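Not necessarily the cheapest option, but one common pattern is to extract the text locally, then send it to whichever cheap LLM endpoint you like. A rough sketch with PyMuPDF and an OpenAI-compatible client (the file name, model, and fields here are placeholders):

```python
import json
import pymupdf  # PyMuPDF; older versions import as `fitz`
from openai import OpenAI

# Extract plain text page by page (cheap, no API calls).
doc = pymupdf.open("invoice.pdf")
text = "\n".join(page.get_text() for page in doc)

# Any OpenAI-compatible endpoint works via base_url (DeepSeek, etc.).
client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: swap in whatever cheap model you prefer
    messages=[
        {"role": "system", "content": "Return JSON with fields: vendor, date, total."},
        {"role": "user", "content": text},
    ],
    response_format={"type": "json_object"},
)
print(json.loads(resp.choices[0].message.content))
```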