A couple of comments/elaborations on the "core differences" mentioned in the article:

The first difference mentioned is that whereas early SSE2 implementations were often built on 64-bit ALUs internally, yielding roughly the same performance as issuing two equivalent MMX ops manually, this isn't the case with AVX2. It may be worth noting, though, that it largely _is_ still the case with the current AVX ("AVX1", i.e. pre-Haswell) implementations.

The second cited difference is the 128-bit "lane" boundary that shows up in many of the operations. This is what can throw the hoped-for 2x gains over SSE2 down the drain if you just naïvely migrate to AVX2: for instance, you can no longer shuffle to/from arbitrary components, but have to respect the 128-bit lane boundaries instead.

The third issue, i.e. the data layouts of internal formats and the assumptions baked into various algorithms, is probably the most significant factor in determining how large a benefit you are going to get. Typically the internal data layouts (e.g. is my pixel block size 2x2, 4x4, 16x8 or something else?) are married to the ISA, so when migrating from one instruction set to another they usually need to be reconsidered if speed is paramount. Interestingly enough, this means that when the ISA changes, you most likely want to do some higher-level algorithmic optimizations as well.
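
To make the lane-boundary point concrete, here's a minimal sketch of my own (not from the article) using the standard AVX2 intrinsics: _mm256_shuffle_epi8 (vpshufb) can only pick bytes from within the same 128-bit lane, so feeding it a "reverse the whole register" index pattern only reverses each lane separately; actually moving data across the boundary takes a dedicated cross-lane permute such as vpermd or vperm2i128.

  /* Sketch: the 128-bit lane boundary in AVX2 shuffles.
     Compile with e.g. gcc -mavx2 lanes.c */
  #include <immintrin.h>
  #include <stdint.h>
  #include <stdio.h>

  static void dump(const char *label, __m256i v)
  {
      uint8_t b[32];
      _mm256_storeu_si256((__m256i *)b, v);
      printf("%-10s", label);
      for (int i = 0; i < 32; i++)
          printf(" %2d", b[i]);
      printf("\n");
  }

  int main(void)
  {
      /* Bytes 0..31. */
      __m256i src = _mm256_setr_epi8(
           0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
          16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31);

      /* A "reverse all bytes" index pattern. vpshufb indexes within each
         128-bit lane only, so this does NOT reverse the whole register --
         it reverses each 16-byte lane independently. */
      __m256i idx = _mm256_setr_epi8(
          15, 14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0,
          15, 14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0);
      __m256i per_lane = _mm256_shuffle_epi8(src, idx);

      /* Crossing the lane boundary requires a cross-lane permute, e.g.
         vperm2i128 to swap the two lanes outright, or vpermd
         (_mm256_permutevar8x32_epi32) for 32-bit-granularity moves. */
      __m256i swapped = _mm256_permute2x128_si256(src, src, 0x01);

      dump("src:", src);
      dump("vpshufb:", per_lane);
      dump("swap:", swapped);
      return 0;
  }

(The cross-lane permutes also tend to have higher latency than the in-lane shuffles, which is yet another reason a naïve port rarely sees the full 2x.)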