Tech Echo
A tech news platform built on Next.js, offering global tech news and discussions.

© 2025 Tech Echo. All rights reserved.

Intel's take on GCC's memcpy implementation

76 points by mtdev, over 13 years ago

6 comments

wolf550e, over 13 years ago
This article is old: March 9, 2009 1:00 AM PDT

Nowadays glibc has modern SSE code and the kernel uses "rep movsb". The kernel can save and restore the FPU state when a copy is long enough for SSE/AVX to be worth it. Someone on the Linux kernel mailing list measured that performance depends on whether src and dest are 64-byte aligned relative to each other: if they are, "rep movsb" is faster than SSE.

The thread: https://lkml.org/lkml/2011/9/1/229

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=arch/x86/lib/memcpy_64.S;hb=HEAD

http://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/memcpy-ssse3.S;hb=HEAD
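A minimal C sketch of the relative-alignment test described above, assuming a 64-byte cache line: two pointers are 64-byte aligned relative to each other when their low six address bits match, so after the same number of head bytes both reach a 64-byte boundary together. The names `relatively_aligned` and `head_bytes` are hypothetical; this is not code from the kernel or glibc.

```c
#include <stddef.h>
#include <stdint.h>

/* Cache-line size assumed here; 64 bytes is typical on current x86. */
#define LINE 64u

/* Nonzero if dst and src sit at the same offset within a 64-byte line,
   i.e. they are 64-byte aligned relative to each other. XOR clears the
   address bits the two pointers share, so only differing bits remain. */
static int relatively_aligned(const void *dst, const void *src)
{
    return (((uintptr_t)dst ^ (uintptr_t)src) & (LINE - 1)) == 0;
}

/* Number of bytes to copy one at a time before dst reaches a 64-byte
   boundary. When the pointers are relatively aligned, src reaches a
   boundary after the same number of bytes. */
static size_t head_bytes(const void *dst)
{
    return (LINE - ((uintptr_t)dst & (LINE - 1))) & (LINE - 1);
}
```

When `relatively_aligned` returns 0, no single head length can align both pointers at once.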
abrahamsen, over 13 years ago
> the developer communications don't appear on a public list. There is no visible public help forum or mail list

http://dir.gmane.org/index.php?prefix=gmane.comp.lib.glibc

Seems public to me.
shin_lao, over 13 years ago
A couple of years ago, before SSE existed, I wrote a highly optimized memory copy routine. It was more than just using movntq (non-temporal stores are important to avoid cache pollution) and the like: for large data, I copied the chunks into a local buffer smaller than one page and then copied that buffer to the destination. Sounds crazy? It was actually much faster because of page locality.

For small chunks, however, nothing was faster than rep movsb, which moves one byte at a time.
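The staging-buffer idea can be sketched in portable C roughly as follows. The 2048-byte buffer size and the name `bounce_copy` are assumptions for illustration, and plain `memcpy` stands in for the `movntq` non-temporal stores the comment mentions, so this shows only the chunking structure, not the original routine or its performance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Staging-buffer size: an assumption, chosen to stay below a 4 KiB page. */
#define BOUNCE_SIZE 2048u

/* Copies n bytes from src to dst through a small local buffer, one chunk
   at a time: each read phase touches only source pages, each write phase
   only destination pages. Plain memcpy stands in for the MMX/non-temporal
   instructions described in the comment. Buffers must not overlap. */
static void *bounce_copy(void *dst, const void *src, size_t n)
{
    unsigned char buf[BOUNCE_SIZE];
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));
    while (n > 0) {
        size_t chunk = n < BOUNCE_SIZE ? n : BOUNCE_SIZE;
        memcpy(buf, s, chunk);   /* source -> staging buffer */
        memcpy(d, buf, chunk);   /* staging buffer -> destination */
        s += chunk;
        d += chunk;
        n -= chunk;
    }
    return dst;
}
```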
memset, over 13 years ago
Someone tell me if I am mistaken - but it looks like the main difference between GCC's and Intel's memcpy() boils down to gcc using `rep movsl` and icc using `movdqa`, the latter having a shorter decode time and possibly shorter execution time?
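For comparison, a 16-byte-block copy using SSE2 intrinsics might look like this. `_mm_load_si128` and `_mm_store_si128` are the aligned load/store intrinsics that Intel's documentation associates with `movdqa`. The function name `copy_aligned16` is hypothetical, both pointers are assumed to be 16-byte aligned, and the sketch falls back to `memcpy` when SSE2 is not available. It does not reproduce the actual GCC or ICC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Copies n bytes between 16-byte-aligned buffers, 16 bytes per step.
   With SSE2, _mm_load_si128/_mm_store_si128 are the aligned (movdqa)
   load/store intrinsics; the n % 16 tail bytes go through memcpy.
   Without SSE2 the whole copy falls back to memcpy. */
static void copy_aligned16(void *dst, const void *src, size_t n)
{
    /* Aligned loads/stores fault on misaligned addresses, so check first. */
    assert(((uintptr_t)dst & 15u) == 0 && ((uintptr_t)src & 15u) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t blocks = n / 16;

    for (size_t i = 0; i < blocks; i++)
        _mm_store_si128(d + i, _mm_load_si128(s + i));
    memcpy((unsigned char *)dst + blocks * 16,
           (const unsigned char *)src + blocks * 16, n % 16);
#else
    memcpy(dst, src, n);
#endif
}
```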
JoeAltmaier, over 13 years ago
I'm sad that computers in this modern age still require me to be in their business. Doesn't it seem like the CPU's own business to move bytes efficiently? Why is the compiler, much less the programmer, involved? The tests the compiler or library makes are on factors that are better known at runtime (overlap, size, alignment) and better handled by microcode.
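To make those runtime tests concrete, here is a hypothetical `dispatch_move` in C that checks overlap, size, and alignment in turn before picking a copy strategy. The 32-byte small-copy threshold and the 8-byte word size are illustrative assumptions, not values taken from any real compiler or library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Below this size a plain byte loop is used (illustrative assumption). */
#define SMALL_COPY 32u

/* Sketch of the checks a library copy routine makes before choosing a
   strategy: overlap (copy direction), size, and alignment. */
static void *dispatch_move(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d == s || n == 0)
        return dst;

    /* Overlap: if dst starts inside [src, src + n), a forward copy would
       overwrite source bytes before reading them, so copy backwards. */
    if ((uintptr_t)d > (uintptr_t)s && (uintptr_t)d - (uintptr_t)s < n) {
        while (n--)
            d[n] = s[n];
        return dst;
    }

    /* Size: for short copies setup cost dominates, so use a byte loop. */
    if (n < SMALL_COPY) {
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
        return dst;
    }

    /* Alignment: copy head bytes until dst is 8-byte aligned, then move
       8-byte words (memcpy into a local avoids misaligned access on src),
       then finish the tail bytes. */
    while (((uintptr_t)d & 7u) != 0) {
        *d++ = *s++;
        n--;
    }
    for (; n >= 8; n -= 8, d += 8, s += 8) {
        uint64_t w;
        memcpy(&w, s, 8);
        memcpy(d, &w, 8);
    }
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Copying backwards only when `dst` lands inside the source range keeps the common non-overlapping case on the forward path.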
vz0, over 13 years ago
Agner Fog found this issue a year earlier, in 2008:

http://www.cygwin.com/ml/libc-help/2008-08/msg00007.html