Blosc – A high performance compressor optimized for binary data

175 points by tonteldoos almost 5 years ago

12 comments

buybackoff almost 5 years ago
It's basically a byte or bit shuffling filter (very fast, SIMD optimized) in front of several modern compressors (lz4, zstd, their own) with a self-describing header. So if you have an array of 100 8-byte values, the result of shuffling is the 100 1st bytes, followed by the 100 2nd bytes, and so on.

It shines when values are of fixed size with lots of similar bits, e.g. positive integers of the same magnitude. It's not so good for doubles, where bits change a lot. Also, if storing diffs, it helps to take the diff from the initial value in a chunk, not from the previous value, so that deltas change sign less often (a sign change flips most of the bits).

In my own use case, for the same data, a C# decimal (a 16-byte struct) compresses much better than doubles (in final absolute blob size), even though decimal takes 2x more memory uncompressed.

If data items have few similar bits/bytes, then it's the underlying compressor that matters.
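The byte shuffle described above can be sketched in a few lines of pure Python. This is only an illustration of the idea, not Blosc's implementation: the helper names `byte_shuffle` and `byte_unshuffle` are made up here, the sample values are arbitrary, and `zlib` stands in for the compressors Blosc actually wraps.

```python
import struct
import zlib


def byte_shuffle(data: bytes, itemsize: int) -> bytes:
    """Group all 1st bytes, then all 2nd bytes, and so on (hypothetical helper)."""
    if len(data) % itemsize != 0:
        raise ValueError("data length must be a multiple of itemsize")
    return b"".join(data[i::itemsize] for i in range(itemsize))


def byte_unshuffle(data: bytes, itemsize: int) -> bytes:
    """Inverse of byte_shuffle: interleave the byte planes back into items."""
    n = len(data) // itemsize
    out = bytearray(len(data))
    for i in range(itemsize):
        out[i::itemsize] = data[i * n:(i + 1) * n]
    return bytes(out)


# 100 little-endian 8-byte positive integers of the same magnitude (arbitrary data).
values = [1_000_000 + 37 * i for i in range(100)]
raw = struct.pack("<100q", *values)

shuffled = byte_shuffle(raw, 8)
assert byte_unshuffle(shuffled, 8) == raw  # the filter is lossless

# zlib is only a stand-in for the real back ends; compare the two sizes yourself.
plain_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(shuffled))
print(plain_size, shuffled_size)
```

After the shuffle, the high-order bytes of these integers form long runs of zeros, which is the kind of input a general-purpose compressor handles well.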
Xcelerate almost 5 years ago
Back when I did HPC work, I used Blosc to compress information about atoms for molecular dynamics simulations before transferring this data over the InfiniBand interconnects. Despite the high speed of the interconnects, it was actually faster to compress, transmit, and decompress using Blosc than to transmit only the raw data.
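A rough way to see when compress, transmit, decompress beats sending raw data is to compare the two total times. The sketch below assumes the three stages run one after another with no overlap, and every number in it (link speed, compression ratio, codec throughput) is hypothetical, not a measurement from the setup described above.

```python
def raw_time(size_bytes: float, link_bps: float) -> float:
    """Seconds to send the uncompressed payload."""
    return size_bytes / link_bps


def compressed_time(size_bytes: float, link_bps: float, ratio: float,
                    comp_bps: float, decomp_bps: float) -> float:
    """Seconds to compress, send the smaller payload, then decompress (no overlap)."""
    return (size_bytes / comp_bps
            + (size_bytes / ratio) / link_bps
            + size_bytes / decomp_bps)


# All values below are assumptions chosen only for illustration.
size = 1e9      # 1 GB payload
link = 5e9      # 5 GB/s interconnect
ratio = 4.0     # compression ratio
comp = 20e9     # 20 GB/s multithreaded compression
decomp = 40e9   # 40 GB/s multithreaded decompression

t_raw = raw_time(size, link)
t_comp = compressed_time(size, link, ratio, comp, decomp)
print(f"raw: {t_raw:.3f} s, compressed: {t_comp:.3f} s")
```

With these assumed figures the compressed path wins; with a lower ratio or a slower codec the inequality flips, so the comparison has to be made on real data.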
lrm242 almost 5 years ago
Blosc is an outstanding project. I have used it with great success in finance and general data science in production with very large total datasets (one custom binary format and one leveraging protobufs).

It really shines first and foremost as a meta compressor, giving the developer a clean block-based API. Once integrated (which really is quite easy), you can easily experiment with different compressors and preconditioners to see what works best with your dataset. These things can be changed at runtime and give you great flexibility.

Francesc has been advancing Blosc consistently with a steady vision for years and years. It is one of the most underrated tools around IMO.
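The "meta compressor" idea, one block API in front of several interchangeable codecs selected at runtime, can be sketched with the Python standard library. This is not Blosc's API or header format: the `CODECS` table, the one-byte codec id, and the `pack`/`unpack` names are all hypothetical, and `zlib`, `bz2`, and `lzma` merely stand in for Blosc's back ends.

```python
import bz2
import lzma
import zlib

# Hypothetical codec table: name -> (id byte, compress, decompress).
CODECS = {
    "zlib": (1, zlib.compress, zlib.decompress),
    "bz2": (2, bz2.compress, bz2.decompress),
    "lzma": (3, lzma.compress, lzma.decompress),
}
DECODERS = {cid: decompress for cid, _, decompress in CODECS.values()}


def pack(block: bytes, codec: str) -> bytes:
    """Compress one block and prefix it with a one-byte codec id."""
    cid, compress, _ = CODECS[codec]
    return bytes([cid]) + compress(block)


def unpack(blob: bytes) -> bytes:
    """Read the codec id from the header and dispatch to the right decompressor."""
    return DECODERS[blob[0]](blob[1:])


# Try every codec on the same block at runtime and keep the smallest result.
block = bytes(range(256)) * 64
sizes = {name: len(pack(block, name)) for name in CODECS}
best = min(sizes, key=sizes.get)
print(sizes, best)
```

Because the header records which codec was used, a reader can decode blocks without knowing in advance how they were written, which is what makes swapping codecs per dataset cheap.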
devit almost 5 years ago
Apparently they have several benchmarks where they claim that decompression is faster than memcpy (!).

However, this is only the case because on several Intel x86_64 benchmarks they report memcpy performance between 5 and 10 GB/s, while even a basic DDR3 dual-channel architecture has 20 GB/s of memory bandwidth and a modern quad-channel DDR4 system can have 76.8 GB/s. And of course there is no reason for memcpy to be substantially slower than memory bandwidth, assuming it's properly implemented (AVX can separately read two and write one 256-bit value per cycle, which gives 128 GB/s memcpy at 4 GHz).

Am I missing something, or is this another case of "implausible claims = they screwed up the benchmark = they are incompetent/malicious"?
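The bandwidth figures quoted above follow from simple multiplication. The sketch below reproduces them under stated assumptions: the comment does not name memory speed grades, so DDR3-1333 and DDR4-2400 are guesses chosen because they land near the quoted 20 GB/s and exactly on 76.8 GB/s, and each channel is assumed to be 64 bits (8 bytes) wide.

```python
def peak_mb_per_s(mega_transfers_per_s: int, bytes_per_transfer: int,
                  channels: int) -> int:
    """Theoretical peak DRAM bandwidth in MB/s."""
    return mega_transfers_per_s * bytes_per_transfer * channels


# Assumed speed grades (not stated in the comment), 8-byte-wide channels.
ddr3_dual = peak_mb_per_s(1333, 8, 2)   # 21328 MB/s, about 21.3 GB/s
ddr4_quad = peak_mb_per_s(2400, 8, 4)   # 76800 MB/s, i.e. 76.8 GB/s

# One 256-bit (32-byte) store per cycle at 4 GHz (4000 MHz).
avx_store = 32 * 4000                   # 128000 MB/s, i.e. 128 GB/s

print(ddr3_dual, ddr4_quad, avx_store)
```

These are theoretical peaks; they say nothing about what a particular benchmark machine or memcpy implementation actually achieves.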
xiaodai almost 5 years ago
It's very good! I have used Blosc in developing JDF.jl, a serialization format for dataframes.

https://github.com/xiaodaigh/JDF.jl
gigatex almost 5 years ago
Would be cool to see this in ZFS to make compressing binaries even more efficient
nisa almost 5 years ago
Might the shuffle techniques applied before compression be useful for squashfs? We play around with a mesh network (freifunk.net), and there are tons of cheap 4 MB flash devices that need every KB of storage :)
axegon_ almost 5 years ago
Blosc is an excellent choice if speed is what you are after. About 5 years ago I had to use compression to transport a lot of data over zmq, and Blosc ran circles around all the other compressors.
requin246 almost 5 years ago
Can someone with Blosc 2 experience tell me what the proper conditions are for using superchunks or frames? When does it become advantageous to use one over the other?

This is a really interesting library.
js8 almost 5 years ago
This would be an excellent candidate to put on an FPGA directly next to the CPU. (Assuming such a thing existed and were open enough to be usable by the general public.)
waatels almost 5 years ago
This looks amazing. The applications look so diverse! Does anyone know if it can be applied to msgpack?
any1 almost 5 years ago
Can Blosc be used to compress/decompress regular zlib streams?