TechEcho

15 comments

kenhwangabout 6 years ago

Ordered by realtime, fastest to slowest for those like me who got annoyed by the scrolling up and down trying to compare:<pre><code> Rust (1.13.0-nightly) 1m32.392s Nim (0.14.2) 1m53.320s C 1m59.116s Julia (0.4.6) 2m01.166s Crystal (0.18.7) 2m01.735s C Double Precision 2m26.546s Java (1.7.0_111) 2m36.949s Nim Double Precision (0.14.2) 3m19.547s OCaml 3m59.597s Go 1.6 6m44.151s node.js (6.2.1) 7m59.041s node.js (5.7.1) 8m49.170s C# 12m18.463s PyPy 14m02.406s Lisp 24m43.216s Haskell 26m34.955s Elixir 123m59.025s Elixir MP 138m48.241s Luajit 225m58.621s Python 348m35.965s Lua 611m38.925s</code></pre>

评论 #19752961 未加载

评论 #19751689 未加载

评论 #19751536 未加载

评论 #19765109 未加载

评论 #19751533 未加载

Someoneabout 6 years ago

This times performance like this:<pre><code> $ time ./crb </code></pre> That means time spent writing the .ppm file is included.In the implementations I browsed, that is about a million print calls, each of which might flush the output buffer, and whose performance may depend on locale.To benchmark ray tracing I would, instead, just output the sum of the pixel values, or set the exit code depending on that value.Even though ray tracing is cpu intensive, it also wouldn’t completely surprise me if some of the implementations in less mature languages spent significant time writing that output because their programmers haven’t come around to optimizing such code.

评论 #19752240 未加载

piinbinaryabout 6 years ago

The haskell version can be made >= 3x faster by making the computations non-lazy, e.g.<pre><code> -data Vector3 = Vector3 {vx::Float, vy::Float, vz::Float} deriving (Show) +data Vector3 = Vector3 {vx :: !Float, vy :: !Float, vz :: !Float} deriving (Show)</code></pre>

评论 #19752183 未加载

kyberiasabout 6 years ago

I don't think the C# time is representative. I suspect Mono is really slow here. I just ran it with VS 2015 in 1 min 24 sec.

评论 #19751431 未加载

评论 #19751045 未加载

评论 #19751157 未加载

ilitiritabout 6 years ago

Interesting that Nim is slightly faster than C it considering that it compiles down to C.

评论 #19751227 未加载

kev6168about 6 years ago

I am surprised PyPy has such a huge lead over Python.<pre><code> $ time python pyrb.py real 348m35.965s user 345m51.776s sys 0m22.880s $ time pypy pyrb.py real 14m2.406s user 13m55.292s sys 0m1.416s</code></pre>

评论 #19751473 未加载

评论 #19751194 未加载

fnord77about 6 years ago

> rustc 1.13.0-nightlywhat's an ancient version of rust. Interesting it is faster than C, though.

评论 #19751332 未加载

评论 #19752246 未加载

评论 #19751641 未加载

JoshuaScriptabout 6 years ago

You should see a performance boost in the Haskell implementation by compiling with GHC's LLVM backend[0]. Another Haskell ray tracer ran 30 % faster than the native codegen this way[1].[0]<a href="https://gitlab.haskell.org/ghc/ghc/wikis/commentary/compiler/backends/llvm" rel="nofollow">https://gitlab.haskell.org/ghc/ghc/wikis/commentary/compiler...</a>[1]<a href="http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html" rel="nofollow">http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-ll...</a>

azhenleyabout 6 years ago

This is awesome! More good press for Nim.There is a big variation in performance, some of which I find surprising. Do you know what exactly causes some languages to be so slow (e.g., small objects being created and garbage collected frequently)?

评论 #19752210 未加载

technologicalabout 6 years ago

Wonder how would D lang would have been placed

xiphias2about 6 years ago

Looking at the Julia implementation fast math wasn't used. In my experience it's usually worth experimenting with turning it on (also of course for the other LLVM based languages), though I understand that this benchmark tries to keep the program correct at all costs.

steveloshabout 6 years ago

I looked over the Common Lisp version at <a href="https://github.com/niofis/raybench/blob/master/lisprb.lisp" rel="nofollow">https://github.com/niofis/raybench/blob/master/lisprb.lisp</a> and it's… really bad, in a lot of ways.<pre><code> (declaim (optimize (speed 3) (safety 0) (space 0) (debug 0) (compilation-speed 0))) </code></pre> Never use `(optimize (safety 0))` in SBCL — it throws safety completely out the window. We're talking C-levels of safety at that point. Buffer overruns, the works. It might buy you 10-20% speed, but it's not worth it. Lisp responsibly, use `(safety 1)`.<pre><code> (defconstant WIDTH 1280) </code></pre> People generally name constants in CL with +plus-muffs+. Naming them as uppercase doesn't help because the reader uppercases symbol names by default when it reads. So `(defconstant WIDTH ...)` means you can no longer have a variable named `width` (in the same package).<pre><code> (defstruct (vec (:conc-name v-) (:constructor v-new (x y z)) (:type (vector float))) x y z) </code></pre> Using `:type (vector float)` here is trying to make things faster, but failing. The type designator `float` covers all kinds of floats, e.g. both `single-float`s and `double-float`s in SBCL. So all SBCL knows is that the struct contains some kind of float, and it can't really do much with that information. This means all the vector math functions below have to fall back to generic arithmetic, which is extremely slow. SBCL even warns you about this when it's compiling, thanks to the `(optimize (speed 3))` declaration, but I guess they ignored or didn't understand those warnings.<pre><code> (defconstant ZERO (v-new 0.0 0.0 0.0)) </code></pre> This will cause problems because if it's ever evaluated more than once it'll try to redefine the constant to a new `vec` instance, which will not be `eql` to the old one. Use `alexandria:define-constant` or just make it a global variable.All the vector math functions are slow because they have no useful type information to work with:<pre><code> (disassemble 'v-add) ; disassembly for V-ADD ; Size: 160 bytes. Origin: #x52D799AF ; 9AF: 488B45F8 MOV RAX, [RBP-8] ; no-arg-parsing entry point ; 9B3: 488B5001 MOV RDX, [RAX+1] ; 9B7: 488B45F0 MOV RAX, [RBP-16] ; 9BB: 488B7801 MOV RDI, [RAX+1] ; 9BF: FF1425A8001052 CALL QWORD PTR [#x521000A8] ; GENERIC-+ ; 9C6: 488955E8 MOV [RBP-24], RDX ; 9CA: 488B45F8 MOV RAX, [RBP-8] ; 9CE: 488B5009 MOV RDX, [RAX+9] ; 9D2: 488B45F0 MOV RAX, [RBP-16] ; 9D6: 488B7809 MOV RDI, [RAX+9] ; 9DA: FF1425A8001052 CALL QWORD PTR [#x521000A8] ; GENERIC-+ ; 9E1: 488BDA MOV RBX, RDX ; 9E4: 488B45F8 MOV RAX, [RBP-8] ; 9E8: 488B5011 MOV RDX, [RAX+17] ; 9EC: 488B45F0 MOV RAX, [RBP-16] ; 9F0: 488B7811 MOV RDI, [RAX+17] ; 9F4: 48895DE0 MOV [RBP-32], RBX ; 9F8: FF1425A8001052 CALL QWORD PTR [#x521000A8] ; GENERIC-+ ; 9FF: 488B5DE0 MOV RBX, [RBP-32] ; A03: 49896D40 MOV [R13+64], RBP ; thread.pseudo-atomic-bits ; A07: 498B4520 MOV RAX, [R13+32] ; thread.alloc-region ; A0B: 4C8D5830 LEA R11, [RAX+48] ; A0F: 4D3B5D28 CMP R11, [R13+40] ; A13: 772E JNBE L2 ; A15: 4D895D20 MOV [R13+32], R11 ; thread.alloc-region ; A19: L0: C600D9 MOV BYTE PTR [RAX], -39 ; A1C: C6400806 MOV BYTE PTR [RAX+8], 6 ; A20: 0C0F OR AL, 15 ; A22: 49316D40 XOR [R13+64], RBP ; thread.pseudo-atomic-bits ; A26: 7402 JEQ L1 ; A28: CC09 BREAK 9 ; pending interrupt trap ; A2A: L1: 488B4DE8 MOV RCX, [RBP-24] ; A2E: 48894801 MOV [RAX+1], RCX ; A32: 48895809 MOV [RAX+9], RBX ; A36: 48895011 MOV [RAX+17], RDX ; A3A: 488BD0 MOV RDX, RAX ; A3D: 488BE5 MOV RSP, RBP ; A40: F8 CLC ; A41: 5D POP RBP ; A42: C3 RET ; A43: L2: 6A30 PUSH 48 ; A45: FF142520001052 CALL QWORD PTR [#x52100020] ; ALLOC-TRAMP ; A4C: 58 POP RAX ; A4D: EBCA JMP L0 </code></pre> If they had done the type declarations correctly, it would look more like this:<pre><code> ; disassembly for V-ADD ; Size: 122 bytes. Origin: #x52C33A78 ; 78: F30F104A05 MOVSS XMM1, [RDX+5] ; no-arg-parsing entry point ; 7D: F30F105F05 MOVSS XMM3, [RDI+5] ; 82: F30F58D9 ADDSS XMM3, XMM1 ; 86: F30F104A0D MOVSS XMM1, [RDX+13] ; 8B: F30F10670D MOVSS XMM4, [RDI+13] ; 90: F30F58E1 ADDSS XMM4, XMM1 ; 94: F30F104A15 MOVSS XMM1, [RDX+21] ; 99: F30F105715 MOVSS XMM2, [RDI+21] ; 9E: F30F58D1 ADDSS XMM2, XMM1 ; A2: 49896D40 MOV [R13+64], RBP ; thread.pseudo-atomic-bits ; A6: 498B4520 MOV RAX, [R13+32] ; thread.alloc-region ; AA: 4C8D5820 LEA R11, [RAX+32] ; AE: 4D3B5D28 CMP R11, [R13+40] ; B2: 7734 JNBE L2 ; B4: 4D895D20 MOV [R13+32], R11 ; thread.alloc-region ; B8: L0: 66C7005903 MOV WORD PTR [RAX], 857 ; BD: 0C03 OR AL, 3 ; BF: 49316D40 XOR [R13+64], RBP ; thread.pseudo-atomic-bits ; C3: 7402 JEQ L1 ; C5: CC09 BREAK 9 ; pending interrupt trap ; C7: L1: C7400103024F50 MOV DWORD PTR [RAX+1], #x504F0203 ; #<SB-KERNEL:LAYOUT for VEC {504F0203}> ; CE: F30F115805 MOVSS [RAX+5], XMM3 ; D3: F30F11600D MOVSS [RAX+13], XMM4 ; D8: F30F115015 MOVSS [RAX+21], XMM2 ; DD: 488BD0 MOV RDX, RAX ; E0: 488BE5 MOV RSP, RBP ; E3: F8 CLC ; E4: 5D POP RBP ; E5: C3 RET ; E6: CC0F BREAK 15 ; Invalid argument count trap ; E8: L2: 6A20 PUSH 32 ; EA: E8F1C64CFF CALL #x521001E0 ; ALLOC-TRAMP ; EF: 58 POP RAX ; F0: EBC6 JMP L0 </code></pre> The weirdness continues:<pre><code> (defstruct (ray (:conc-name ray-) (:constructor ray-new (origin direction)) (:type vector)) origin direction) </code></pre> The `:conc-name ray-` is useless, that's the default conc-name. And again with the `:type vector`… just make it a normal struct. I was going to guess that they were doing it so they could use vector literals to specify the objects, but then why are they bothering to define a BOA constructor here? And the slots are untyped, which, if you're looking for speed, is not doing you any favors.I took a few minutes over lunch to add some type declarations to the slots and important functions, inlined the math, cleaned up the broken indentation and naming issues:<a href="https://gist.github.com/sjl/005f27274adacd12ea2fc7f0b7200b80/revisions?diff=split#diff-48e2da69300a7d7516647faf76fc0e20" rel="nofollow">https://gist.github.com/sjl/005f27274adacd12ea2fc7f0b7200b80...</a>The old version runs in 5m12s on my laptop, the new version runs in 58s. So if we unscientifically extrapolate that to their 24m time, it puts it somewhere around 5m in their list. This matches what I usually see from SBCL: for numeric-heavy code generic arithmetic is very slow, and some judicious use of type declarations can get you to within ~5-10x of C. Getting more improvements beyond that can require really bonkers stuff that often isn't worth it.

评论 #19753531 未加载

PorterDuffabout 6 years ago

It would be interesting to take that C version and hammer on it a bit for speed....and then add SIMD.

omarantoabout 6 years ago

The results are more or less in line with what I would have expected, except for SBCL and Luajit, which I would have expected to be much faster.

评论 #19753550 未加载

iainmerrickabout 6 years ago

The most impressive result here is Lua -- not far behind C! LuaJIT is amazing.Good to see a few languages like Nim and Rust actually beating C for raw performance, too.

评论 #19751426 未加载

评论 #19751504 未加载

评论 #19751567 未加载

15 comments

kenhwangabout 6 years ago

评论 #19752961 未加载

评论 #19751689 未加载

评论 #19751536 未加载

评论 #19765109 未加载

评论 #19751533 未加载

Someoneabout 6 years ago

评论 #19752240 未加载

piinbinaryabout 6 years ago

评论 #19752183 未加载

kyberiasabout 6 years ago

I don't think the C# time is representative. I suspect Mono is really slow here. I just ran it with VS 2015 in 1 min 24 sec.

评论 #19751431 未加载

评论 #19751045 未加载

评论 #19751157 未加载

ilitiritabout 6 years ago

Interesting that Nim is slightly faster than C it considering that it compiles down to C.

评论 #19751227 未加载

kev6168about 6 years ago

评论 #19751473 未加载

评论 #19751194 未加载

fnord77about 6 years ago

> rustc 1.13.0-nightlywhat's an ancient version of rust. Interesting it is faster than C, though.

评论 #19751332 未加载

评论 #19752246 未加载

评论 #19751641 未加载

JoshuaScriptabout 6 years ago

azhenleyabout 6 years ago

评论 #19752210 未加载

technologicalabout 6 years ago

Wonder how would D lang would have been placed

xiphias2about 6 years ago

steveloshabout 6 years ago

评论 #19753531 未加载

PorterDuffabout 6 years ago

It would be interesting to take that C version and hammer on it a bit for speed....and then add SIMD.

omarantoabout 6 years ago

The results are more or less in line with what I would have expected, except for SBCL and Luajit, which I would have expected to be much faster.

评论 #19753550 未加载

iainmerrickabout 6 years ago

The most impressive result here is Lua -- not far behind C! LuaJIT is amazing.Good to see a few languages like Nim and Rust actually beating C for raw performance, too.

评论 #19751426 未加载

评论 #19751504 未加载

评论 #19751567 未加载

Programming language ray tracing benchmarks project

15 comments

Programming language ray tracing benchmarks project

15 comments