Apparently, someone solved it and achieved a 1187:1 compression ratio. These are the results:

All recordings were successfully compressed.
Original size (bytes): 146,800,526
Compressed size (bytes): 123,624
Compression ratio: 1187.47

The eval.sh script was downloaded, and the files were encoded and decoded without loss, as verified with diff.

What do you think? Is this true?

https://www.linkedin.com/pulse/neuralink-compression-challenge-cspiral-31pae/
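For anyone who wants to check claims like this themselves, here is a minimal round-trip sketch in Python. It is not the challenge's eval.sh, just a stand-in that assumes hypothetical ./compress and ./decompress executables and a data/ directory of recordings:

    # Rough stand-in for the kind of check eval.sh performs (NOT the official
    # script): round-trip every recording through a hypothetical ./compress and
    # ./decompress pair, confirm byte-for-byte equality, and report the ratio.
    import filecmp
    import glob
    import os
    import subprocess

    original_bytes = 0
    compressed_bytes = 0

    for wav in glob.glob("data/*.wav"):          # recording path is assumed
        subprocess.run(["./compress", wav, wav + ".z"], check=True)
        subprocess.run(["./decompress", wav + ".z", wav + ".out"], check=True)
        if not filecmp.cmp(wav, wav + ".out", shallow=False):
            raise SystemExit(f"lossless check failed for {wav}")
        original_bytes += os.path.getsize(wav)
        compressed_bytes += os.path.getsize(wav + ".z")

    print(f"Original size (bytes):   {original_bytes}")
    print(f"Compressed size (bytes): {compressed_bytes}")
    print(f"Compression ratio:       {original_bytes / compressed_bytes:.2f}")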
Context: https://www.youtube.com/watch?v=X5hsQ6zbKIo
Analyzing the data, it becomes clear that the A/D used by Neuralink is defective, i.e. it has very poor accuracy. The A/D introduces a huge amount of distortion, which in practice manifests as noise.

Until this A/D linearity problem is fixed, there is no point in pursuing compression schemes. The data is so badly mangled that it is nearly impossible to find patterns.
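One quick way to probe that claim yourself is to histogram the raw sample codes and look for missing or badly skewed codes, a classic signature of ADC nonlinearity. A minimal sketch, assuming the recordings are 16-bit mono WAV files; the filename is a placeholder:

    # Sanity check on the A/D claim (my own sketch, not the commenter's
    # analysis): count how often each raw sample code occurs in one recording.
    # Missing or wildly over/under-represented codes suggest ADC nonlinearity.
    import collections
    import wave

    import numpy as np

    with wave.open("data/recording_000.wav", "rb") as f:   # filename assumed
        raw = f.readframes(f.getnframes())

    samples = np.frombuffer(raw, dtype=np.int16)
    counts = collections.Counter(samples.tolist())

    used = len(counts)
    span = int(samples.max()) - int(samples.min()) + 1
    print(f"distinct codes used: {used} out of {span} in range")
    print("most common codes:", counts.most_common(5))
    print("codes never hit:", span - used)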
They're looking for a compressor that can handle more than 200 Mb/s on a 10 mW power budget (that's including the radio, so it has to run on a CPU clocked like an original 8086) and yield a 200x size reduction. Speaking as a data compression person, this is completely unrealistic. The best statistical models I have on hand yield a ~7x compression ratio after some tweaking, and they won't run under these constraints.
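For scale, here is the back-of-the-envelope arithmetic using the channel count, sample rate, and bit depth quoted elsewhere in this thread (the real device parameters may differ slightly):

    # Back-of-the-envelope numbers behind the 200x target, using figures
    # quoted in this thread: 1024 channels, 20 kHz, 10-bit samples.
    channels = 1024
    sample_rate_hz = 20_000
    bits_per_sample = 10
    target_ratio = 200

    raw_bits_per_s = channels * sample_rate_hz * bits_per_sample
    print(f"raw data rate: {raw_bits_per_s / 1e6:.1f} Mb/s "
          f"({raw_bits_per_s / 8 / 1e6:.1f} MB/s)")
    print(f"implied radio budget at {target_ratio}x: "
          f"{raw_bits_per_s / target_ratio / 1e6:.2f} Mb/s")

So the raw stream is roughly 205 Mb/s (about 26 MB/s), and a 200x ratio amounts to squeezing it into roughly 1 Mb/s of radio budget.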
200x is possible.

The sample data compresses poorly, getting down to 4.5 bits per sample easily with very simple first-order difference encoding and a decent Huffman coder (a minimal sketch of that baseline follows below).

However, let's assume there is massive cross-correlation between the 1024 channels. In the extreme case they are all identical, meaning if we encode 1 channel we get the other 1023 for free. That gives a lower limit of 4.5/1024 ≈ 0.0044 bits per sample, or a compression ratio of about 2275. Voilà!

If data patterns exist and can be found, then more complicated coding algorithms could achieve better compression, or tolerate more variation (i.e. less cross-correlation) between channels.

We may never know unless Neuralink releases a full data set, i.e. 1024 channels at 20 kHz and 10 bits for 1 hour. That's a lot of data, but if they want serious analysis they should release serious data.

Finally, there is no apparent reason to enforce the requirement for lossless compression. The end result -- correct data to control the cursor and so on -- is what matters. Neuralink should allow challengers to submit DATA to a test engine that compares cursor output for noiseless data against the output for the submitted data, and reports the match score, and maybe a graph or something. That sort of feedback might let participants create a satisfactory lossy compression scheme.
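Going back to the delta + Huffman baseline mentioned at the top, here is a minimal sketch used only to estimate bits per sample (it builds the code but does not emit an actual bitstream). It assumes a 16-bit mono WAV recording and a placeholder filename:

    # First-order difference encoding plus a Huffman code over the difference
    # symbols, used to estimate achievable bits per sample on one recording.
    import heapq
    from collections import Counter
    import wave

    import numpy as np

    def huffman_code_lengths(freqs):
        """Return {symbol: code_length} for a Huffman code over `freqs`."""
        if len(freqs) == 1:                      # degenerate case: one symbol
            return {s: 1 for s in freqs}
        heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            f1, _, d1 = heapq.heappop(heap)
            f2, _, d2 = heapq.heappop(heap)
            merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
            heapq.heappush(heap, (f1 + f2, tie, merged))
            tie += 1
        return heap[0][2]

    with wave.open("data/recording_000.wav", "rb") as f:   # filename assumed
        samples = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)

    diffs = np.diff(samples.astype(np.int32))               # first-order differences
    freqs = Counter(diffs.tolist())
    lengths = huffman_code_lengths(freqs)

    total_bits = sum(freqs[s] * lengths[s] for s in freqs)
    print(f"estimated bits/sample: {total_bits / len(diffs):.2f}")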
This reminds me a lot of the Hutter Prize [1]. Funnily enough, the Hutter Prize shifted my thinking 180 degrees towards intelligence ~= compression, because to truly compress information well you *must* understand its nuances.

[1] http://prize.hutter1.net/
200x compression on this dataset is mathematically impossible. The noise on the amplifier and digitizer limits the max compression to 5.3x.

Here's why: https://x.com/raffi_hotter/status/1795910298936705098
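A rough way to see the shape of that argument yourself (my own sketch, not the linked analysis): estimate the empirical entropy of the residual after a trivial predictor. If that residual is mostly amplifier and quantizer noise, a coder working sample-by-sample cannot do much better than (ADC bits) / (residual entropy). Assumes 16-bit mono WAVs, a placeholder filename, and the 10-bit ADC resolution quoted in this thread:

    # Estimate the Shannon entropy of the prediction residual and the ceiling
    # it implies for any per-sample coder on this recording.
    from collections import Counter
    import wave

    import numpy as np

    with wave.open("data/recording_000.wav", "rb") as f:   # filename assumed
        samples = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)

    residual = np.diff(samples.astype(np.int32))            # predictor: previous sample
    counts = np.array(list(Counter(residual.tolist()).values()), dtype=np.float64)
    p = counts / counts.sum()
    entropy_bits = -(p * np.log2(p)).sum()

    bits_per_sample = 10                                    # ADC resolution per this thread
    print(f"residual entropy: {entropy_bits:.2f} bits/sample")
    print(f"implied ceiling for a per-sample coder: {bits_per_sample / entropy_bits:.1f}x")

This only bounds coders that ignore longer-range and cross-channel structure, which is exactly the loophole the 200x optimists are counting on.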
Check out this link for background info:

https://mikaelhaji.medium.com/a-technical-deep-dive-on-elon-musks-neuralink-in-40-mins-71e1100f54d4#43cf
"aside from everything else, it seems like it's really, really late in the game to suddenly realize 'oh we need magical compression technology to make this work don't we'"<p><a href="https://x.com/JohnSmi48253239/status/1794328213923188949?t=_8K1rncHLesiy46IqaIMbA&s=19" rel="nofollow">https://x.com/JohnSmi48253239/status/1794328213923188949?t=_...</a>