TechEcho
Rust std fs slower than Python? No, it's hardware

687 points by Pop_- over 1 year ago

28 comments

the8472 over 1 year ago
There are two dedicated CPU feature flags to indicate that REP STOS/MOV are fast and usable as a short instruction sequence for memset/memcpy. Having to hand-roll optimized routines for each new CPU generation has been an ongoing pain for decades.

And yet here we are again. Shouldn't this be part of some timing test suite of CPU vendors by now?
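For anyone who wants to check those flags on their own machine, here is a minimal sketch. It assumes a Linux system, where `/proc/cpuinfo` lists the flags as `erms` (Enhanced REP MOVSB/STOSB) and `fsrm` (Fast Short REP MOV); presumably these are the two flags meant above. The helper names are made up for illustration.

```python
from pathlib import Path

# Flag names as Linux prints them in /proc/cpuinfo (assumed to be the
# two flags the comment refers to).
FAST_STRING_FLAGS = ("erms", "fsrm")


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def fast_string_support(cpuinfo_text: str) -> dict[str, bool]:
    """Report, per flag, whether the CPU advertises it."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in FAST_STRING_FLAGS}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(fast_string_support(cpuinfo.read_text()))
```

Note that an advertised flag only says the vendor claims the instructions are fast; as the article shows, it does not guarantee they are fast for every alignment.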
Aissen over 1 year ago
Associated glibc bug (Zen 4 though): https://sourceware.org/bugzilla/show_bug.cgi?id=30994
royjacobs over 1 year ago
I was prepared to read the article and scoff at the author's misuse of std::fs. However, the article is a delightful succession of rabbit holes and mysteries. Well written and very interesting!
quietbritishjim over 1 year ago
I'm a bit confused about the premise. This is not comparing pure Python code against some native (C or Rust) code. It's comparing one Python wrapper around native code (Python's file read method) against another Python wrapper around some native code (OpenDAL). OK, it's still interesting that there's a difference in performance, but it's very odd to describe it as "slower than Python". Did they expect that the Python standard library is all written in pure Python? On the contrary, I would expect the implementations of functions in Python's standard library to be native and, individually, highly optimised.

I'm not surprised the conclusion had something to do with the way that native code works. Admittedly I was surprised at the specific answer - still a very interesting article despite the confusing start.

Edit: The conclusion also took me a couple of attempts to parse. There's a heading "C is slower than Python with specified offset". To me, as a native English speaker, this reads as "C is slower (than Python) with specified offset", i.e. it sounds like they took the C code, specified the same offset as Python, and then it's still slower than Python. But it's the opposite: once the offset from Python was also specified in the C code, the C code was then faster. Still very interesting once I got what they were saying though.
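To make "with specified offset" concrete: the variable being discussed is where the destination buffer starts relative to a page boundary. A rough sketch of controlling that from Python, assuming an anonymous `mmap` region as the page-aligned base; the function name is hypothetical, and this thread does not say which offsets are slow or fast, so any offset passed in is purely illustrative.

```python
import mmap


def read_at_buffer_offset(path: str, size: int, buf_offset: int) -> bytes:
    """Read up to `size` bytes of `path` into a buffer that starts
    `buf_offset` bytes past a page boundary."""
    # Anonymous mmap regions begin on a page boundary, so slicing the
    # region at `buf_offset` fixes the destination's in-page offset.
    with mmap.mmap(-1, buf_offset + size) as region:
        with memoryview(region) as whole, whole[buf_offset:buf_offset + size] as view:
            # buffering=0 gives a raw file object, so readinto() issues
            # the read directly into our buffer.
            with open(path, "rb", buffering=0) as f:
                n = f.readinto(view)
            return bytes(view[:n])
```

Timing this function for a few different `buf_offset` values on the same file would be one way to see whether a given machine is sensitive to the destination offset.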
fsniper over 1 year ago
The article itself is a great read and it has fascinating info related to this issue.

However, I am more interested in, and concerned about, another part: how the issue was reported and recorded, and how the communications were handled.

Reporting was done over Discord, a proprietary environment that is not indexed or searchable and will not be archived.

Communications and deliberations were done over Discord and Telegram, which is probably worse than Discord in this context.

This blog post and the GitHub repository are the lingering remains of them. If Xuanwo had not blogged this, it would be lost in the timeline.

Isn't this fascinating?
iampims over 1 year ago
Most interesting article I've read this week. Excellent write-up.
londons_explore over 1 year ago
So the obvious thing to do... Send a patch to change the "copy_user_generic" kernel method to use a different memory copying implementation when the CPU is detected to be a bad one and the memory alignment is one that triggers the slowness bug...
comonoid over 1 year ago
jemalloc was Rust's default allocator till 2018.

https://internals.rust-lang.org/t/jemalloc-was-just-removed-from-the-standard-library/8759
a1o over 1 year ago
> Rust developers might consider switching to jemallocator for improved performance

I am curious if this is something that everyone can do to get free performance or if there are caveats. Can C codebases benefit from this too? Is this performance that is simply left on the table currently?
amluto over 1 year ago
I sent this to the right people.
diamondlovesyou over 1 year ago
AMD's string store is not like Intel's. Generally, you don't want to use it until you are past the CPU's L2 size (L3 is a victim cache), making ~2k WAY too small. Once past that point, it's profitable to use string store, and should run at "DRAM speed". But it has a high startup cost, hence 256-bit vector loads/stores should be used until that threshold is met.
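The dispatch rule described above can be written down as a toy size check. This is only a sketch of the decision, not of any real memcpy: the function name and the returned labels are invented for illustration, and the L2 size is passed in by the caller rather than assumed.

```python
def pick_copy_routine(n_bytes: int, l2_bytes: int) -> str:
    """Return a label for the copy strategy the rule above would favor.

    `l2_bytes` is the caller's estimate of the per-core L2 size; no real
    cache size is hard-coded here.
    """
    if n_bytes < 0 or l2_bytes <= 0:
        raise ValueError("n_bytes must be >= 0 and l2_bytes must be > 0")
    # Up to the L2 size, the string store's startup cost dominates, so
    # 256-bit vector loads/stores are preferred.
    if n_bytes <= l2_bytes:
        return "vector-256"
    # Past L2 the copy is memory-bound and string store pays off.
    return "rep-movsb"
```

For example, with a hypothetical 1 MiB L2, a 2 KiB copy would map to `"vector-256"` and a 4 MiB copy to `"rep-movsb"`.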
collinmanderson over 1 year ago
BTW, I've always thought Python uses way too many syscalls when working with files. Simple code like this uses something like 9 syscalls (shown in the article):

    with open('myfile') as f:
        data = f.read()

I'm not much of a C programmer myself, but I at least reported part of the issue to Python: https://bugs.python.org/issue45944

This is the fastest way to read a file in Python that I've found, using only 3-4 syscalls (though os.fstat() doesn't work for some special kernel files like those in /proc/ and /dev/):

    import os

    def read_file(path: str, size=-1) -> bytes:
        fd = os.open(path, os.O_RDONLY)
        try:
            if size == -1:
                # Ask the kernel for the file size so one read can cover it.
                size = os.fstat(fd).st_size
            return os.read(fd, size)
        finally:
            os.close(fd)
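For the special files mentioned above, where the reported size is not usable, one possible fallback is to skip `os.fstat()` and loop until `os.read()` returns no data. This is a sketch under that assumption; `read_file_looping` and the 64 KiB chunk size are illustrative choices, and the loop trades the low syscall count for generality (at least two reads per file).

```python
import os


def read_file_looping(path: str, chunk_size: int = 1 << 16) -> bytes:
    """Read a whole file without relying on its reported size."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # an empty result signals end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

The loop also covers short reads, since `os.read()` is allowed to return fewer bytes than requested.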
forrestthewoods over 1 year ago
Delightful article. Thank you, author, for sharing! I felt like I experienced every shock, twist, and surprise in your journey like I was right there with you all along.
Pesthuf over 1 year ago
Clickbait headline, but the article is great!
fulafel over 1 year ago
A related thing from times when it was common that memory layout artifacts had high impact on sw performance: https://en.wikipedia.org/wiki/Cache_coloring
codedokode over 1 year ago
Why is there a need to move memory? Can hardware not DMA data into non-page-aligned memory? Or does Linux not want to load non-aligned data?
titaniumtown over 1 year ago
Extremely well written article! Very surprising outcome.
eigenform over 1 year ago
would be lovely if ${cpu_vendor} would document exactly how FSRM/ERMS/etc are implemented and what the expected behavior is
lxe over 1 year ago
I wonder what other things we can improve by removing Spectre mitigations and tuning hugepages, syscall latency, and core affinity
Pop_- over 1 year ago
Disclaimer: The title has been changed to "Rust std fs slower than Python!? No, it's hardware!" to avoid clickbait. However, I'm not able to fix the title on HN.
pmontra over 1 year ago
> However, mmap has other uses too. It's commonly used to allocate large regions of memory for applications.

Slack is allocating 1132 GB of virtual memory on my laptop right now. I don't know if they are using mmap but that's 1100 GB more than the physical memory.
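The gap between virtual and physical memory is easy to reproduce on a small scale. A minimal sketch, assuming a typical Linux or macOS setup where anonymous mappings are backed lazily: it maps a region, writes only to the first page, and reads back the first and last pages. The function name and the 64 MiB size in the usage line are illustrative only.

```python
import mmap


def reserve_and_touch(size: int) -> tuple[bytes, bytes]:
    """Map `size` bytes anonymously, write only to the first page, and
    return the contents of the first and last pages."""
    page = mmap.PAGESIZE
    if size < 2 * page:
        raise ValueError("size must cover at least two pages")
    with mmap.mmap(-1, size) as region:
        # Only this page is written; on a typical OS the rest of the
        # mapping is not backed by physical memory until it is accessed.
        region[:page] = b"\x01" * page
        # Pages that were never written read back as zeros.
        return region[:page], region[size - page:]


if __name__ == "__main__":
    first, last = reserve_and_touch(64 * 1024 * 1024)
    print(first[:4], last[:4])
```

While such a mapping is open, tools like `top` count the full size under virtual memory, while resident memory grows only for the pages that were touched.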
explodingwaffle over 1 year ago
Anyone else feeling the frequency illusion with rep movsb?

(https://lock.cmpxchg8b.com/reptar.html)
sgift over 1 year ago
Either the author changed the headline to something less clickbaity in the meantime or you edited it for clickbait, Pop_- (in that case: shame on you). Current headline: "Rust std fs slower than Python!? No, it's hardware!"
darkwater over 1 year ago
Totally unrelated but: this post talks about the bug being first discovered in OpenDAL [1], which seems to be an Apache (Incubator) project to add an abstraction layer for storage over several types of storage backend. What's the point/use case of such an abstraction? Anybody using it?

[1] https://opendal.apache.org/
exxos over 1 year ago
It's the hardware. Of course Rust remains the fastest and safest language and you must rewrite your applications in Rust.
lxe over 1 year ago
So Python isn't affected by the bug because pymalloc performs better on buggy CPUs than jemalloc or malloc?
jokethrowaway over 1 year ago
Clickbait title but interesting article.

This has nothing to do with Python or Rust.
drtgh over 1 year ago
> Rust std fs slower than Python!? No, it's hardware!

> ...

> Python features three memory domains, each representing different allocation strategies and optimized for various purposes.

> ...

> Rust is slower than Python only on my machine.

If one library performs wildly better than the other in the same test, on the same hardware, how can that not be a software-related problem? It sounds like a contradiction.

Maybe it should be considered a coding issue and/or a missing feature? IMHO, Rust's std library would be expected to perform well without making every user work around the issue manually.

The article is well investigated, so I assume the author just wants to show that the problem exists without creating controversy, because otherwise I cannot understand it.