TechEcho

6 comments

boulos 9 months ago

This seems to keep coming up, and I see confusion in the comments. There is a standard: IEEE 754-2008. There are additional things people add, like approximate reciprocals and approximate sqrt. But if you don't use those, and you don't change how operations are associated, you get consistent results.

The question here, with association for summation, is what you want to match. OP chose to match the scalar for-loop equivalent. You can just as easily make an 8-wide or 16-wide "virtual vector" and match that instead.

I suspect that an 8-wide virtual vector is the right default for people currently: Intel systems since Haswell support it, as does all recent AMD hardware, and if you're using vectorization, you can afford to pay some overhead on Arm with a double-width virtual vector. You don't often gain enough from AVX-512 to make 16-wide the default, but if you wanted to focus on Skylake+ (really Cascade Lake+) or Genoa+ systems, it would be a fine choice.
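To make the "virtual vector" idea concrete, here is a minimal Python sketch, assuming IEEE 754 double-precision floats (what Python's `float` is on mainstream platforms). The names `virtual_vector_sum` and `scalar_loop_sum` and the pairwise lane-reduction order are illustrative choices, not anything taken from OP's article. The point is only that lane `i` always accumulates elements `i, i + width, i + 2*width, ...` and the lanes are always combined in the same fixed order, so hardware of any native SIMD width could in principle reproduce the result bit for bit.

```python
import operator
from functools import reduce


def virtual_vector_sum(xs, width=8):
    """Sum xs as if using a `width`-lane vector accumulator.

    Lane i accumulates xs[i], xs[i + width], xs[i + 2*width], ...
    The lanes are then combined by a fixed pairwise tree (an illustrative
    choice), so the rounding sequence depends only on `width`.
    """
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")
    lanes = [0.0] * width
    for i, x in enumerate(xs):
        lanes[i % width] += x  # one "vertical" add per lane
    # Fixed horizontal reduction: fold the upper half onto the lower half.
    while len(lanes) > 1:
        half = len(lanes) // 2
        lanes = [lanes[i] + lanes[i + half] for i in range(half)]
    return lanes[0]


def scalar_loop_sum(xs):
    """Plain left-to-right for-loop order (the order OP chose to match)."""
    return reduce(operator.add, xs, 0.0)


xs = [1.0, 1e16, -1e16, 1.0]
print(scalar_loop_sum(xs))        # 1.0
print(virtual_vector_sum(xs, 1))  # 1.0 (width 1 is the scalar loop)
print(virtual_vector_sum(xs, 8))  # 0.0
```

Neither answer is the exact sum (which is 2); what matters for invariance is that everyone agrees on `width` and on the reduction order.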

kardos 9 months ago

Exact floating-point accumulation is more or less solved by xsum [1]. Would it work in this context?

[1] https://gitlab.com/radfordneal/xsum
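xsum itself is a C library; as a rough illustration of why exact accumulation sidesteps the ordering question, here is a Python sketch that uses `fractions.Fraction` as a slow stand-in for a superaccumulator. This is not xsum's algorithm or API, and `ExactAccumulator`/`exact_sum` are hypothetical names. Every addition is exact and rounding happens once, at the end, so the result does not depend on the order of the inputs or on how partial sums are split and merged. Python's `math.fsum` (an accurate summation routine in the standard library, using a different technique) is printed as a cross-check.

```python
import math
from fractions import Fraction


class ExactAccumulator:
    """Toy exact accumulator. NOT xsum: a slow Fraction-based stand-in."""

    def __init__(self):
        self.total = Fraction(0)

    def add(self, x):
        # Every finite float is exactly a rational number, so nothing is lost.
        self.total += Fraction(x)

    def merge(self, other):
        # Combining partial accumulators (e.g. per-thread chunks) is exact too.
        self.total += other.total

    def value(self):
        # The only rounding step: convert the exact total to the nearest float.
        return float(self.total)


def exact_sum(values):
    acc = ExactAccumulator()
    for v in values:
        acc.add(v)
    return acc.value()


xs = [1.0, 1e16, -1e16, 1.0]

# Accumulate two chunks independently (in different orders), then merge.
left, right = ExactAccumulator(), ExactAccumulator()
for x in xs[:2]:
    left.add(x)
for x in reversed(xs[2:]):
    right.add(x)
left.merge(right)

print(exact_sum(xs), exact_sum(sorted(xs)), left.value(), math.fsum(xs))
# 2.0 2.0 2.0 2.0
```

A purpose-built superaccumulator such as xsum aims for the same order-independence with fixed-size state and far better speed than rational arithmetic.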

waynecochran 9 months ago

Invariance with floating-point arithmetic seems like a fool's errand. If the numbers one is working with are roughly the same order of magnitude, then I would consider integer or fixed-point arithmetic instead. You get the same results in that case (as long as you are careful).
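A minimal sketch of the fixed-point alternative, assuming all values fall in a known range so that a single scale factor works. `SCALE` and the helper names are hypothetical choices for illustration. Each value is rounded once when it is converted; after that, accumulation is plain integer addition, which is associative, so every summation order gives the same total. (Python integers never overflow; in C or Rust you would also need an integer type with enough headroom, which is part of "being careful".)

```python
import itertools

SCALE = 10**6  # hypothetical: six decimal digits of fractional precision


def to_fixed(x):
    # The only rounding step: each input is quantized once, deterministically.
    return round(x * SCALE)


def from_fixed(n):
    return n / SCALE


def fixed_sum(xs):
    # Integer addition is associative, so the order of xs cannot matter.
    return sum(to_fixed(x) for x in xs)


def float_loop_sum(xs):
    # Plain left-to-right float accumulation, for comparison.
    total = 0.0
    for x in xs:
        total += x
    return total


xs = [0.1, 0.2, 0.3]
float_totals = {float_loop_sum(p) for p in itertools.permutations(xs)}
fixed_totals = {fixed_sum(p) for p in itertools.permutations(xs)}

print(sorted(float_totals))  # [0.6, 0.6000000000000001]
(total,) = fixed_totals      # exactly one distinct fixed-point result
print(total, from_fixed(total))  # 600000 0.6
```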

someguydave 9 months ago

Seems crazy to try to paper over hardware implementation differences in software. Some org should be standardizing floating-point intrinsics.

baq 9 months ago

See also streflop (2006): https://nicolas.brodu.net/en/programmation/streflop/

modulovalue 9 months ago

I'm still wondering if there could exist an alternative world where efficient addition over the decimal numbers we developers use on a day-to-day basis is associative. Is that even possible, or is there some fundamental limit that forces us to trade associativity for performance?

It seems to me that non-associative floating-point operations force us into a local maximum. The operation itself might be efficient on modern machines, but could its lack of associativity be preventing us from applying other important high-level optimizations to our programs? A richer algebraic structure should always be amenable to a richer set of potential optimizations.

---

I've asked a closely related question on the programming languages subreddit:

"Could numerical operations be optimized by using algebraic properties that are not present in floating point operations but in numbers that have infinite precision?"

https://www.reddit.com/r/ProgrammingLanguages/comments/145kps7/could_numerical_operations_be_optimized_by_using/

The responses there might be interesting to some people here.
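A small Python illustration of the point about algebraic structure: rewriting a left-to-right fold into a balanced tree (the shape that vectorization and parallel reduction need) is only guaranteed to preserve the result when the operation is associative. With exact rationals (`fractions.Fraction`, standing in here for "numbers that have infinite precision") the rewrite never changes the answer; with floats it can. The helper names `sequential_reduce` and `tree_reduce` are illustrative.

```python
import operator
from fractions import Fraction
from functools import reduce


def sequential_reduce(op, xs):
    # ((x0 op x1) op x2) op ...: the order a naive for-loop uses.
    return reduce(op, xs)


def tree_reduce(op, xs):
    # Balanced split: the shape a parallel or SIMD reduction wants.
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return op(tree_reduce(op, xs[:mid]), tree_reduce(op, xs[mid:]))


floats = [0.1, 0.2, 0.3]
exact = [Fraction(x) for x in floats]  # exact values of the same doubles

print(sequential_reduce(operator.add, floats))  # 0.6000000000000001
print(tree_reduce(operator.add, floats))        # 0.6
print(sequential_reduce(operator.add, exact) == tree_reduce(operator.add, exact))  # True
```

The trade-off shows up on the other side: each `Fraction` operation costs far more than a hardware float add, and under repeated multiplication and division the numerators and denominators can grow without bound.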

Creating invariant floating-point accumulators
