TechEcho

SIMD in Pure Python

256 points by dmarto over 1 year ago

8 comments

ashvardanian over 1 year ago
This is a nice exercise!
There is also a very different "write SIMD assembly in Python" approach available through the PeachPy library, one of the least-known gems between the Python and HPC worlds: https://github.com/Maratyszcza/PeachPy
This is what a dot-product would look like in PeachPy: https://unum-cloud.github.io/usearch/python/index.html#id4
PS: Cppyy and Numba are also fun to use in such projects :)
justinl33 over 1 year ago
Genius!
> the state of the cells is being stored in a big array, accessed via the get_cell and set_cell helper functions. What if instead of using an array, we stored the whole state in one very long integer, and used SWAB arithmetic to process the whole thing at once?
I'm really curious as to how the unpacking of this long integer into pixels on the screen doesn't add more overhead than it saves. I guess I'll have to wait for your next one on the compressed gzip stream hack.
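For anyone wondering what the quoted idea looks like in practice, here is a minimal sketch. It is not the article's code: it assumes a one-dimensional automaton (Rule 90, where each cell becomes the XOR of its two neighbours) instead of the article's grid, and the names `pack`, `unpack` and `rule90_step` are made up for illustration.

```python
def pack(cells):
    """Pack a list of 0/1 cells into one int; cell i lives in bit i."""
    state = 0
    for i, cell in enumerate(cells):
        state |= (cell & 1) << i
    return state


def unpack(state, width):
    """Expand the packed int back into a list of 0/1 cells."""
    return [(state >> i) & 1 for i in range(width)]


def rule90_step(state, width):
    """Advance every cell at once: new cell = left neighbour XOR right neighbour.

    Cells beyond either edge are treated as 0 (an assumption of this sketch).
    """
    mask = (1 << width) - 1
    # state << 1 lines up each cell's lower-index neighbour with it,
    # state >> 1 lines up its higher-index neighbour.
    return ((state << 1) ^ (state >> 1)) & mask


cells = [0, 0, 0, 1, 0, 0, 0]
state = rule90_step(pack(cells), len(cells))
print(unpack(state, len(cells)))
```

The step itself is two shifts, one XOR and one mask regardless of how many cells there are, while unpacking is only needed once per displayed frame, which is presumably where the saving comes from.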
nneonneo over 1 year ago
Neat trick! I implemented a similar bitpacking approach for solving matrix equations in GF(2), which can be used for things like forging CRC hashes (faster than brute force) and solving certain cryptography problems. Code is here: https://github.com/nneonneo/pwn-stuff/blob/master/math/gf2.py
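To make the bitpacking idea concrete, here is a small sketch of Gaussian elimination over GF(2) with each matrix row stored as one Python int. It is written for illustration and is not taken from the linked gf2.py; the function name `solve_gf2` and its argument layout are assumptions.

```python
def solve_gf2(rows, rhs, ncols):
    """Solve A x = b over GF(2).

    rows[i] is an int whose bit j is A[i][j]; rhs[i] is 0 or 1.
    Returns one solution as an int (bit j is x_j, free variables set to 0),
    or None if the system is inconsistent.
    """
    # Store the right-hand side in bit `ncols` of each row (augmented matrix).
    aug = [row | (bit << ncols) for row, bit in zip(rows, rhs)]
    pivot_cols = []
    rank = 0
    for col in range(ncols):
        pivot = next(
            (i for i in range(rank, len(aug)) if (aug[i] >> col) & 1), None
        )
        if pivot is None:
            continue  # no pivot in this column: it is a free variable
        aug[rank], aug[pivot] = aug[pivot], aug[rank]
        for i in range(len(aug)):
            if i != rank and (aug[i] >> col) & 1:
                aug[i] ^= aug[rank]  # one XOR adds an entire row
        pivot_cols.append(col)
        rank += 1
    # Remaining rows have all-zero coefficients; a 1 on the right means 0 = 1.
    if any((row >> ncols) & 1 for row in aug[rank:]):
        return None
    x = 0
    for i, col in enumerate(pivot_cols):
        if (aug[i] >> ncols) & 1:
            x |= 1 << col
    return x


# A = [[1,1,0],[0,1,1],[0,0,1]], b = [1,1,1]
print(bin(solve_gf2([0b011, 0b110, 0b100], [1, 1, 1], 3)))
```

Row addition in GF(2) is XOR, so each elimination step costs one bigint operation per row no matter how wide the matrix is.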
dinkleberg over 1 year ago
This is great, nice work!
I just recently started going through a performance programming course (https://computerenhance.com/) and have learned about SIMD and other techniques and it is awesome to see something out in the wild.
redskyluan over 1 year ago
Just stumbled upon this blog; it's absolutely intriguing! As a Python enthusiast, it's like finding a hidden treasure that challenges the usual norms of Python's capabilities. I'm thinking of writing a pure-Python implementation of some ML algorithms while learning SIMD.
jvans over 1 year ago
Exercises like this really make you a better programmer. Someone should collect these types of examples in a git repo somewhere.
jxy over 1 year ago
tl;dr
> The general term for this concept is SWAR, which stands for SIMD Within A Register. But here, rather than using a machine register, we're using an arbitrarily long Python integer. I'm calling this variant SWAB: SIMD Within A Bigint.
Thanks to Peano and Gödel, it's safe to say we may encode any computation with operations on natural numbers. So if anything is slow in Python for you, you may always encode it in a bigint and hope for the best.
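As a rough illustration of the SWAR/SWAB idea in the quote, the sketch below adds many 8-bit lanes packed into one Python int with a single bigint addition. The lane width and the helper names (`pack_lanes`, `unpack_lanes`, `add_lanes`) are choices made for this example, not something from the article; the carry-isolation trick itself is the standard SWAR one.

```python
LANE_BITS = 8  # each lane holds one unsigned 8-bit value (an assumption)


def pack_lanes(values):
    """Pack 8-bit values into one int; value i occupies bits 8*i .. 8*i+7."""
    packed = 0
    for i, value in enumerate(values):
        packed |= (value & 0xFF) << (i * LANE_BITS)
    return packed


def unpack_lanes(packed, count):
    """Split the packed int back into `count` 8-bit values."""
    return [(packed >> (i * LANE_BITS)) & 0xFF for i in range(count)]


def add_lanes(a, b, count):
    """Add lanes pairwise modulo 256 without carries leaking between lanes."""
    ones = int.from_bytes(b"\x01" * count, "little")  # 0x0101...01
    low = ones * 0x7F   # low 7 bits of every lane
    high = ones * 0x80  # top bit of every lane
    # Add the low 7 bits (the carry stops at bit 7 of each lane),
    # then fix up the top bit with XOR so nothing spills into the next lane.
    return ((a & low) + (b & low)) ^ ((a ^ b) & high)


a = pack_lanes([1, 200, 255, 127])
b = pack_lanes([2, 100, 1, 1])
print(unpack_lanes(add_lanes(a, b, 4), 4))
```

The same three-operation pattern works for any number of lanes, since Python ints grow as needed.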
akasakahakada over 1 year ago
np.array([ BigInt list ], dtype=object)
then you can apply this bit parallelism to a tensor.
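A minimal sketch of what that could look like, assuming NumPy is installed. With `dtype=object` the array holds ordinary Python ints, and the shift and bitwise operators are applied element by element using each int's own operators. The Rule 90 update and the variable names are only an example chosen here.

```python
import numpy as np

WIDTH = 100                # bits per packed row (example value)
MASK = (1 << WIDTH) - 1

# Each element is an arbitrarily long Python int holding one packed row.
rows = np.array([1 << 50, (1 << 10) | (1 << 90)], dtype=object)

# One expression updates every bit of every row:
# each cell becomes the XOR of its two neighbours.
stepped = ((rows << 1) ^ (rows >> 1)) & MASK

for value in stepped:
    print(bin(value).count("1"), "cells set")
```

Object arrays do not get NumPy's native vectorised loops, so this is mainly a convenience for shaping and indexing many bigints at once; the speed still comes from the bigint arithmetic inside each element.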