TechEcho

8 comments

ashvardanian over 1 year ago

This is a nice exercise!

There is also a very different “write SIMD assembly in Python” approach available through the PeachPy library, one of the least-known gems at the intersection of the Python and HPC worlds: https://github.com/Maratyszcza/PeachPy

This is what a dot product would look like in PeachPy: https://unum-cloud.github.io/usearch/python/index.html#id4

PS: Cppyy and Numba are also fun to use in such projects :)

justinl33 over 1 year ago

Genius!

> the state of the cells is being stored in a big array, accessed via the get_cell and set_cell helper functions. What if instead of using an array, we stored the whole state in one very long integer, and used SWAB arithmetic to process the whole thing at once?

I’m really curious how unpacking this long integer into pixels on the screen doesn’t add more overhead than it saves. I guess I’ll have to wait for your next post on the compressed gzip stream hack.
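A minimal sketch of why unpacking need not dominate, under assumptions not taken from the article: one bit per cell, cell 0 stored in the least significant bit, and hypothetical helper names `pack_cells`, `unpack_cells` and `unpack_to_pixels`. The per-bit work is pushed into `format()` and `bytes.translate()`, which run in C, instead of a Python loop over cells.

```python
# Hypothetical helpers (not from the article). Assumption: one bit per cell,
# cell 0 lives in the least significant bit of the packed int.

# Translation table mapping ASCII '0'/'1' to black/white 8-bit pixels.
_TO_PIXEL = bytes.maketrans(b"01", b"\x00\xff")

def pack_cells(cells):
    """Pack a flat list of 0/1 cells into one Python int."""
    # Reverse so that cell 0 becomes the rightmost (least significant) digit.
    digits = "".join("1" if c else "0" for c in reversed(cells))
    return int(digits or "0", 2)

def unpack_cells(state, n):
    """Return the n lowest cells of `state` as a list of 0/1 ints."""
    # format() extracts all bits in C; reversing puts cell 0 first.
    bits = format(state, f"0{n}b")[::-1][:n]
    return [1 if b == "1" else 0 for b in bits]

def unpack_to_pixels(state, n):
    """Return n grayscale pixels (0x00 or 0xff) without a per-cell Python loop."""
    bits = format(state, f"0{n}b")[::-1][:n]
    return bits.encode("ascii").translate(_TO_PIXEL)

state = pack_cells([1, 0, 1, 1, 0, 0, 0, 1])
print(state)                       # 141
print(unpack_to_pixels(state, 8))  # b'\xff\x00\xff\xff\x00\x00\x00\xff'
```

The resulting byte string could be handed to whatever expects a raw 8-bit grayscale buffer; the remaining per-pixel cost is a couple of C-level passes over a string.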

nneonneo over 1 year ago

Neat trick! I implemented a similar bitpacking approach for solving matrix equations over GF(2), which can be used for things like forging CRC hashes (faster than brute force) and solving certain cryptography problems. Code is here: https://github.com/nneonneo/pwn-stuff/blob/master/math/gf2.py
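To illustrate the bitpacking idea for GF(2), here is a minimal sketch of Gauss-Jordan elimination in which each matrix row is a single Python int, so one row operation is one XOR across every column at once. This is not the code from the linked gf2.py; `solve_gf2` and its argument layout are hypothetical.

```python
def solve_gf2(rows, rhs, ncols):
    """Solve A x = b over GF(2) (hypothetical helper, illustrative only).

    rows[i] is row i of A packed into an int (bit j = coefficient of x_j);
    rhs[i] is the i-th entry of b (0 or 1). Returns one solution as a list
    of bits, or None if the system is inconsistent.
    """
    # Augment each row: bit `ncols` holds the right-hand side.
    aug = [r | (b << ncols) for r, b in zip(rows, rhs)]
    pivots = []  # (row index, pivot column) pairs
    r = 0
    for col in range(ncols):
        # Find a row at or below r with a 1 in this column.
        pivot = next((i for i in range(r, len(aug)) if aug[i] >> col & 1), None)
        if pivot is None:
            continue  # free column
        aug[r], aug[pivot] = aug[pivot], aug[r]
        # Clear this column in every other row; each XOR updates all
        # ncols + 1 bits of the row in a single bigint operation.
        for i in range(len(aug)):
            if i != r and aug[i] >> col & 1:
                aug[i] ^= aug[r]
        pivots.append((r, col))
        r += 1
    # Leftover rows have no coefficients; a set RHS bit means 0 = 1.
    for i in range(r, len(aug)):
        if aug[i] >> ncols & 1:
            return None
    # Free variables are set to 0; each pivot variable equals its row's RHS.
    x = [0] * ncols
    for i, col in pivots:
        x[col] = aug[i] >> ncols & 1
    return x
```

For example, the system x0 + x1 = 1, x1 + x2 = 0, x0 = 1 is `solve_gf2([0b011, 0b110, 0b001], [1, 0, 1], 3)`, which returns `[1, 0, 0]`.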

dinkleberg over 1 year ago

This is great, nice work!

I just recently started going through a performance programming course (https://computerenhance.com/) and have learned about SIMD and other techniques, so it is awesome to see them used out in the wild.

redskyluan over 1 year ago

Just stumbled upon this blog, and it's absolutely intriguing! As a Python enthusiast, it's like finding a hidden treasure that challenges the usual assumptions about what Python can do. I'm thinking of writing pure-Python implementations of some ML algorithms as a way to learn SIMD.

jvans over 1 year ago

Exercises like this really make you a better programmer. Someone should collect these types of examples in a git repo somewhere.

jxy over 1 year ago

tl;dr:

> The general term for this concept is SWAR, which stands for SIMD Within A Register. But here, rather than using a machine register, we're using an arbitrarily long Python integer. I'm calling this variant SWAB: SIMD Within A Bigint.

Thanks to Peano and Gödel, it's safe to say we can encode any computation as operations on natural numbers. So if anything is slow in Python for you, you can always encode it in a bigint and hope for the best.
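A minimal SWAB sketch, assuming unsigned 8-bit lanes; the lane width and the names `pack`, `unpack` and `add_lanes_wrapping` are illustrative, not from the article. A plain bigint addition adds every lane at once but lets a carry spill into the neighboring lane. The classic SWAR fix clears each lane's top bit before adding and restores it with XOR, which gives lane-wise addition modulo 2**8.

```python
LANE_BITS = 8  # assumed lane width

def pack(values, width=LANE_BITS):
    """Pack non-negative ints (each < 2**width) into one bigint, element 0 in the lowest lane."""
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return packed

def unpack(packed, count, width=LANE_BITS):
    """Extract `count` lanes from a packed bigint."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]

def add_lanes_wrapping(x, y, count, width=LANE_BITS):
    """Lane-wise (x + y) mod 2**width for `count` lanes, using one bigint addition."""
    # h has only the top bit of every lane set.
    h = pack([1 << (width - 1)] * count, width)
    # With top bits cleared, each lane sum fits in `width` bits,
    # so no carry can cross a lane boundary.
    low_sum = (x & ~h) + (y & ~h)
    # Restore each top bit: top(x) XOR top(y) XOR carry out of the low bits.
    return low_sum ^ ((x ^ y) & h)

a = pack([200, 2, 3, 4])
b = pack([100, 20, 30, 40])
print(unpack(a + b, 4))                        # [44, 23, 33, 44]: lane 0's carry leaked into lane 1
print(unpack(add_lanes_wrapping(a, b, 4), 4))  # [44, 22, 33, 44]
```

The naive `a + b` is only safe when every lane sum is known to stay below 2**8; the masked version trades two ANDs and two XORs for that guarantee.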

akasakahakada over 1 year ago

Put them in np.array([ BigInt list ], dtype=object), and then you can apply this bit parallelism to a tensor.
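A small sketch of that suggestion, assuming NumPy is installed. With `dtype=object` the elements stay arbitrary-precision Python ints, and elementwise operators such as `<<`, `&` and `^` are applied by calling the Python int operator on each element, so every element keeps its own bigint-wide bit parallelism. The values are arbitrary examples.

```python
import numpy as np

# Each element is a Python int (dtype=object), e.g. one row of cells packed
# one bit per cell; the third row is deliberately wider than 64 bits.
rows = np.array([0b1011, 0b0110, (1 << 100) | 1], dtype=object)

# Each operator is applied elementwise by calling the Python int operator,
# so every element is still processed as a whole bigint.
shifted = rows << 1          # shift every packed row left by one cell
masked = rows & 0b0011       # keep only the two lowest cells of every row
toggled = rows ^ rows[::-1]  # XOR each row with the row in the mirrored position

print(list(shifted))
print(list(masked))
print(list(toggled))
```

Because NumPy dispatches to a Python-level operator for every element, this likely pays off only when each bigint is large enough to amortize that per-element overhead.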

SIMD in Pure Python
