Random access string compression with FSST and Rust

87 points by aduffy, 8 months ago

6 comments

judofyr, 8 months ago
I implemented this in Zig earlier: https://github.com/judofyr/minz

It’s quite a neat algorithm. I saw compression ratios in the 2-3x range. However, I remember that the algorithm for finding the dictionary was a bit unclear. I wasn’t convinced that what was explained in the paper found the “optimal” dictionary. With some slight tweaks I got widely different results. I wonder if this implementation improves on this.
Epicism, 8 months ago
Super interesting! I’m curious how this differs from InfluxDB’s German strings implementation: https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
jcgrillo, 8 months ago
I really like the look of vortex[1]! One of my industry pet peeves is all the useless UTF-8 server log bytes. I'd like to log data in a sane, schemaful, binary format, and this looks like it could be a good way to do that. Bonus points if we can wire this up as a physical layer for e.g. datafusion[2] so I can analyze my logs with the dataframe abstraction.

EDIT: Question about FSST: let's say I build a strings table like:

    struct Strings {
        compressor: fsst::Compressor,
        compressed: Vec<Vec<u8>>,
    }

Is there some optimal length for compressed given the 255 symbols limit?

[1] https://github.com/spiraldb/vortex
[2] https://github.com/apache/datafusion
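For readers unfamiliar with where the 255-symbol limit comes from, here is a minimal sketch of a table shaped like the struct above. It assumes the scheme described in the FSST paper: one-byte codes, symbols of 1 to 8 bytes, and code 255 reserved as an escape for a literal byte. `ToyCompressor`, `ToyStrings`, `build`, and `get` are hypothetical names for illustration and are not the `fsst` crate's API. The symbols are hand-picked rather than trained, and matching is a naive longest-prefix scan.

```rust
/// Code reserved as the escape marker: the byte after it is a literal.
const ESCAPE: u8 = 255;

/// Toy stand-in for a trained symbol table (hypothetical, not fsst::Compressor).
struct ToyCompressor {
    /// At most 255 symbols of 1 to 8 bytes; the index is the one-byte code.
    symbols: Vec<Vec<u8>>,
}

impl ToyCompressor {
    fn new(symbols: &[&str]) -> Self {
        assert!(symbols.len() <= 255, "code 255 is reserved for the escape");
        assert!(symbols.iter().all(|s| (1..=8).contains(&s.len())));
        Self {
            symbols: symbols.iter().map(|s| s.as_bytes().to_vec()).collect(),
        }
    }

    /// Greedy longest-match encoding; bytes with no matching symbol are escaped.
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut pos = 0;
        while pos < input.len() {
            let rest = &input[pos..];
            // Pick the longest symbol that is a prefix of the remaining input.
            let best = self
                .symbols
                .iter()
                .enumerate()
                .filter(|(_, sym)| rest.starts_with(sym.as_slice()))
                .max_by_key(|(_, sym)| sym.len());
            match best {
                Some((code, sym)) => {
                    out.push(code as u8);
                    pos += sym.len();
                }
                None => {
                    // An escaped byte costs two output bytes.
                    out.extend_from_slice(&[ESCAPE, input[pos]]);
                    pos += 1;
                }
            }
        }
        out
    }

    /// Decoding is a table lookup per code. Assumes well-formed input
    /// (every escape is followed by a literal byte).
    fn decompress(&self, codes: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut i = 0;
        while i < codes.len() {
            if codes[i] == ESCAPE {
                out.push(codes[i + 1]);
                i += 2;
            } else {
                out.extend_from_slice(&self.symbols[codes[i] as usize]);
                i += 1;
            }
        }
        out
    }
}

/// Same shape as the struct in the comment: one shared table, many strings.
struct ToyStrings {
    compressor: ToyCompressor,
    compressed: Vec<Vec<u8>>,
}

impl ToyStrings {
    fn build(symbols: &[&str], strings: &[&str]) -> Self {
        let compressor = ToyCompressor::new(symbols);
        let compressed = strings
            .iter()
            .map(|s| compressor.compress(s.as_bytes()))
            .collect();
        Self { compressor, compressed }
    }

    /// Random access: decode only the requested string.
    fn get(&self, index: usize) -> String {
        let bytes = self.compressor.decompress(&self.compressed[index]);
        String::from_utf8(bytes).expect("input strings were valid UTF-8")
    }
}

fn main() {
    let table = ToyStrings::build(
        &["http://", "www.", ".com", "example"],
        &["http://www.example.com", "http://rust-lang.org"],
    );
    // Entry 1 is decoded without touching entry 0.
    assert_eq!(table.get(1), "http://rust-lang.org");
    assert_eq!(table.get(0), "http://www.example.com");
    println!("entry 0 uses {} code bytes", table.compressed[0].len());
}
```

In this sketch the 255 limit bounds the size of the shared symbol table, not the number of entries in `compressed`. Each entry is decoded independently, so adding strings does not change the cost of `get`. Strings that the table covers poorly can grow, because every escaped byte takes two bytes.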
aidenn0, 8 months ago
What is the meaning of "Arrow" in this context?
chgo1, 8 months ago
A question regarding the second generation in the example: Why is the symbol "um" (0) only counted once?
scotty79, 8 months ago
So this lets you compress a collection of strings and cheaply decompress any of them individually?