Intel's take on GCC's memcpy implementation

76 points by mtdev over 13 years ago

6 comments

wolf550e over 13 years ago
This article is old: March 9, 2009 1:00 AM PDT

Nowadays glibc has modern SSE code and the kernel uses "rep movsb". The kernel can store and restore FPU state if the copy is long and doing SSE/AVX is worth it. Someone on the Linux kernel mailing list measured that performance depends on src and dest being 64-byte aligned compared to each other: if they are aligned, "rep movsb" is faster than SSE.

The thread: https://lkml.org/lkml/2011/9/1/229

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=arch/x86/lib/memcpy_64.S;hb=HEAD

http://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/memcpy-ssse3.S;hb=HEAD
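A minimal sketch of what "aligned to each other" can mean in code, assuming a 64-byte cache line as in the comment above. The helper names `mutually_aligned` and `head_bytes` are hypothetical and do not come from the kernel or glibc sources linked above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size (64 bytes, per the comment); not queried from hardware. */
#define CACHE_LINE 64u

/* True when src and dst sit at the same offset within a 64-byte line,
 * so aligning one of them also aligns the other. */
static bool mutually_aligned(const void *dst, const void *src)
{
    /* The mask trick below only works for a power-of-two line size. */
    assert((CACHE_LINE & (CACHE_LINE - 1)) == 0);
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    return ((d ^ s) & (CACHE_LINE - 1)) == 0;
}

/* Number of leading bytes to copy before dst reaches a 64-byte boundary. */
static size_t head_bytes(const void *dst)
{
    return (size_t)(-(uintptr_t)dst & (CACHE_LINE - 1));
}
```

A copy routine could use a check like this to choose between code paths at run time; which path is actually faster is something to measure on the target CPU.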
abrahamsen over 13 years ago
> the developer communications don't appear on a public list. There is no visible public help forum or mail list

http://dir.gmane.org/index.php?prefix=gmane.comp.lib.glibc

Seems public to me.
shin_lao over 13 years ago
A couple of years ago, before SSE existed, I wrote a highly optimized memory copy routine. It did more than just use movntq and the like (non-temporal stores are important to avoid cache pollution): for large data, I copied each chunk into a local buffer smaller than one page and then copied it from there to the destination. Sounds crazy? It was actually much faster because of page locality.

For small chunks, however, nothing was faster than rep movsb, which moves one byte at a time.
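The staging idea described above can be sketched roughly as follows. This is an illustration only, not the commenter's routine: the name `bounce_copy` and the 2048-byte chunk size are assumptions, and plain `memcpy` stands in for the non-temporal stores (movntq) the comment mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk size, chosen to be smaller than a typical 4 KiB page. */
#define BOUNCE_SIZE 2048

/* Copy n bytes from src to dst through a small on-stack buffer.
 * The regions must not overlap (same contract as memcpy). */
static void *bounce_copy(void *dst, const void *src, size_t n)
{
    unsigned char bounce[BOUNCE_SIZE];
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    while (n > 0) {
        size_t chunk = n < BOUNCE_SIZE ? n : BOUNCE_SIZE;
        memcpy(bounce, s, chunk);  /* stage: source -> local buffer */
        memcpy(d, bounce, chunk);  /* drain: local buffer -> destination */
        s += chunk;
        d += chunk;
        n -= chunk;
    }
    return dst;
}
```

Whether the extra pass through the local buffer pays off depends heavily on the hardware; on current CPUs it may well be slower than a direct copy, so it is worth benchmarking before adopting.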
memset over 13 years ago
Someone tell me if I am mistaken - but it looks like the main difference between GCC's and Intel's memcpy() boils down to gcc using `rep movsl` and icc using `movdqa`, the latter having a shorter decode time and possibly shorter execution time?
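For readers who want to see the two instructions side by side, here is a rough sketch, assuming GCC or Clang on x86 with SSE2. The function names are hypothetical, and neither function is taken from GCC's or Intel's actual memcpy. The aligned load/store intrinsics typically compile to movdqa, though the compiler is free to pick an equivalent encoding; on other targets both functions fall back to `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define X86_SKETCH 1
#else
#define X86_SKETCH 0
#endif

/* Copy `dwords` 32-bit words with the string instruction `rep movsl`. */
static void copy_rep_movsl(void *dst, const void *src, size_t dwords)
{
#if X86_SKETCH
    /* rep movsl copies 4 bytes per iteration, using the destination,
     * source, and count registers selected by the D, S, c constraints. */
    __asm__ volatile("rep movsl"
                     : "+D"(dst), "+S"(src), "+c"(dwords)
                     :
                     : "memory");
#else
    memcpy(dst, src, dwords * 4);  /* portable fallback, not the instruction */
#endif
}

/* Copy `blocks` 16-byte blocks with aligned 128-bit loads and stores.
 * Both pointers must be 16-byte aligned. */
static void copy_movdqa(void *dst, const void *src, size_t blocks)
{
    assert((((uintptr_t)dst | (uintptr_t)src) & 15u) == 0);
#if X86_SKETCH
    __m128i *d = dst;
    const __m128i *s = src;
    for (size_t i = 0; i < blocks; i++)
        _mm_store_si128(d + i, _mm_load_si128(s + i));
#else
    memcpy(dst, src, blocks * 16);  /* portable fallback */
#endif
}
```

Timing the two on the same buffers would be one way to check the decode-time and execution-time question raised above for a particular CPU.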
JoeAltmaier over 13 years ago
I'm sad that computers in this modern age still require me to be in their business. Doesn't it seem like the cpu's own business to move bytes efficiently? Why is the compiler, much less the programmer, involved? The tests being made in the compiler/lib are of factors better-known at runtime (overlap, size, alignment) and better handled by microcode.
vz0 over 13 years ago
Agner Fog found this issue a year earlier, in 2008:

http://www.cygwin.com/ml/libc-help/2008-08/msg00007.html