Fast columnar JSON decoding with arrow-rs

56 points by necubi about 2 months ago

4 comments

jdf about 2 months ago
It would be great if someone could implement the schema discovery algorithm from the DB research GOAT, Thomas Neumann, and add it to Apache Arrow: https://db.in.tum.de/~durner/papers/json-tiles-sigmod21.pdf
vjerancrnjak about 2 months ago
Given that the schema is known, it should be possible to avoid general JSON parsing. That would be much faster.
at0mic22 about 2 months ago
How does it compare with serde, which AFAIK uses the same approach?
atombender about 2 months ago
The benchmark section ("But is it fast?") contains a common error when trying to represent ratios as percentages.

For the "Tweets" case, it reports a speedup of 229%. The old value is 11.73 and the new is 5.108. That is a speedup of 2.293 (i.e. the new measurement is 2.293 times faster), but that is a difference of -56%, *not* 229%, so it's 129% faster, if you really want to use a comparative percentage.

Because using percentages to express ratio of change can be confusing or misleading, I always recommend using speedup instead, which is a simple ratio. A speedup of 2 is twice as fast. A speedup of 1 is the same. 0.5 is half as fast.

Formulas:

  speedup(old, new) = old / new
  relativePercent(old, new) = ((new / old) - 1) * 100
  differenceInPercent(old, new) = (new - old) / old * 100
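For anyone who wants to check the arithmetic, here is a minimal Python sketch of the three formulas, transcribed as written and assuming the two quoted figures are times (lower is better). Two things stand out when running it. First, `relativePercent` and `differenceInPercent` as written are algebraically the same expression, so both give about -56%. Second, the "129% faster" figure corresponds to `(speedup - 1) * 100`, which is added here as a hypothetical helper named `percent_faster`; that name is not from the comment. The quoted inputs give a speedup of roughly 2.30 rather than exactly 2.293, presumably because the published numbers are rounded.

```python
def speedup(old: float, new: float) -> float:
    """Ratio of old time to new time: 2.0 is twice as fast, 0.5 is half as fast."""
    return old / new


def relative_percent(old: float, new: float) -> float:
    """Transcribed as written in the comment: ((new / old) - 1) * 100."""
    return ((new / old) - 1) * 100


def difference_in_percent(old: float, new: float) -> float:
    """Transcribed as written in the comment: (new - old) / old * 100."""
    return (new - old) / old * 100


def percent_faster(old: float, new: float) -> float:
    """Hypothetical helper (not in the comment): speedup expressed as a percentage gain."""
    return (speedup(old, new) - 1) * 100


if __name__ == "__main__":
    # Figures quoted for the "Tweets" case; assumed to be times, units unspecified.
    old_time, new_time = 11.73, 5.108
    print(f"speedup:             {speedup(old_time, new_time):.2f}x")
    print(f"relativePercent:     {relative_percent(old_time, new_time):.1f}%")
    print(f"differenceInPercent: {difference_in_percent(old_time, new_time):.1f}%")
    print(f"percent faster:      {percent_faster(old_time, new_time):.1f}%")
```

Reporting the plain `speedup` ratio avoids having to say which percentage convention is meant.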