Optimizing Haskell for performance can be tricky. When I wrote a functional image library (updated at [0]) I tried to get the frame time down with strictness, unpacked records, and so on, but one of the changes that made the biggest difference was switching from round to truncate[1]: I went from 120 ms/frame to 65 ms/frame! The reason is that round has quite a bit of logic associated with it[2] in Haskell's standard library, whereas truncate is a simple one-liner.<p>It shows that even in functions as simple as these, the underlying implementation details can have a real impact on performance.<p>Also, for some reason filling a C array from Haskell was faster than doing it in C, probably due to the cost of switching between the C code and the Haskell RTS.<p>[0] <a href="https://github.com/siraben/functional-images" rel="nofollow">https://github.com/siraben/functional-images</a><p>[1] <a href="https://github.com/sbond75/OpenGLTesting/commit/116384edd6a36d7cb828310fcc8bfbf6327cc47d" rel="nofollow">https://github.com/sbond75/OpenGLTesting/commit/116384edd6a3...</a><p>[2] <a href="https://hackage.haskell.org/package/base-4.14.0.0/docs/src/GHC.Real.html#truncate" rel="nofollow">https://hackage.haskell.org/package/base-4.14.0.0/docs/src/G...</a>
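To see the gap in isolation, here's a minimal criterion sketch of my own (not from the repo above) that compares the two conversions on the same data; only criterion's defaultMain, bench, and nf are assumed:<p>
    import Criterion.Main (bench, defaultMain, nf)

    -- Sample inputs; any list of Doubles will do.
    xs :: [Double]
    xs = [0.1, 0.2 .. 10000]

    main :: IO ()
    main = defaultMain
      [ bench "round"    $ nf (map (round    :: Double -> Int)) xs
      , bench "truncate" $ nf (map (truncate :: Double -> Int)) xs
      ]
<p>On Double -> Int, truncate is essentially a direct machine conversion, while the default round also has to decide the halfway cases (round-half-to-even), which is presumably where the extra time goes.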
I find this article a nice counterpoint to “it is difficult to reason about performance in Haskell”. A person who had never programmed in the language found all the tools and resources to implement an algorithm from a reference and make it fast.
GHC has to walk a fine line between A) applying enough optimization passes to reliably remove abstraction overhead and B) compiling fast enough that people won't complain about it.<p>It's pretty awesome what GHC can pull off in terms of reducing complex programs to a minimal normal form (you can see this with e.g. <a href="https://hackage.haskell.org/package/ghc-proofs-0.1.1/docs/GHC-Proof.html" rel="nofollow">https://hackage.haskell.org/package/ghc-proofs-0.1.1/docs/GH...</a> ), but oftentimes you do have to convince the optimizer to be more aggressive than it wants to be.
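As a generic sketch of what "convincing the optimizer" looks like in practice (not tied to the article's code), pragmas like INLINABLE and SPECIALIZE are the usual levers for getting GHC to remove dictionary passing from a polymorphic helper:<p>
    -- Without help, calls to this go through a Num dictionary at runtime.
    dot :: Num a => [a] -> [a] -> a
    dot xs ys = sum (zipWith (*) xs ys)
    {-# INLINABLE dot #-}  -- keep the unfolding available to other modules
    {-# SPECIALIZE dot :: [Double] -> [Double] -> Double #-}

    main :: IO ()
    main = print (dot [1 .. 1000] [1 .. 1000 :: Double])
<p>The SPECIALIZE pragma makes GHC compile a monomorphic Double copy, which can then unbox the arithmetic; whether the generic version gets the same treatment is up to the optimizer.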
I'm not sure it was his first Haskell program; if it was, it's pretty impressive that he used INLINE, understood the minute differences between random libraries, and reached for profiling, ThreadScope, etc.
The hardest thing about optimizing is knowing whether something is fast. It helps if you have another version of the same thing that is already fast, or seems so. When you match it, are you now fast?<p>You are until somebody makes one faster. Then, you're slow again. That is our deal with the Devil: our machines are filled with dodgy gimcracks that often make our programs faster, but we can hardly ever know whether a program is as fast as it could be, or if some gimcrack that would make it faster hasn't engaged yet; or, often, even how to go about engaging it.<p>Usually we stop optimizing when it seems fast enough, usually because something else is (still) slower.<p>It turns out to be pretty simple to make quicksort go twice as fast on current hardware. Standard libraries and compilers generally don't do those things, often out of fear of making a few programs slower than before. People complain a lot more about a slower program than they thank you for making a hundred others faster--if they even notice the latter.
It's interesting that halfway through the optimizations Haskell is doing 3.3 terabytes of total heap allocation.<p>When it gets to 93 seconds it is doing 208 GB of heap allocation, and in the final version it is still doing over 20 GB of heap allocation. This seems like quite a lot for a 384x216 image.
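For anyone wondering where numbers like that come from: running the binary with +RTS -s prints total bytes allocated, and the same counters are readable from inside the program via GHC.Stats. A minimal sketch (it assumes the program was started with +RTS -T, otherwise getRTSStats throws):<p>
    import GHC.Stats (allocated_bytes, getRTSStats)

    main :: IO ()
    main = do
      -- ... the actual rendering work would go here ...
      stats <- getRTSStats  -- requires running the program with +RTS -T
      putStrLn ("Total bytes allocated: " ++ show (allocated_bytes stats))
<p>Worth keeping in mind that these totals are cumulative allocation, not peak residency: short-lived objects that the nursery reclaims almost for free still count, so 20 GB allocated doesn't mean 20 GB live at once.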
That's really cool. The next step would be using explicit SIMD instructions. Is there a SIMD library for Haskell? Or a library for automatically doing AoS->SoA transformation on some compute code?