Show HN: Bistro – A light-weight column-oriented data processing engine

95 points by asavinov, over 7 years ago

9 comments

agibsonccc, over 7 years ago
The core looks close enough to dataframes that I'd be curious to know how you compare to tablesaw: https://github.com/jtablesaw/tablesaw

This looks neat, but I'm not sure why I would care about it. There are already a ton of solutions in the ecosystem with a columnar-like interface.

Granted, we wrote our own as well [1]. It uses the builder pattern to produce a series of transforms that you then toss to an executor (our main backend for this is Spark). One reason we wrote it is persistence: being able to encode and persist a series of transforms that you can then load remotely has been very helpful for us in machine learning.

We've since migrated this project to the Eclipse Foundation and intend to rewrite the interface, as well as integrate our baked-in tensor library [2] into certain parts of the pipeline for speed and for handling things like computer vision workloads.

In general, I always like seeing new takes on the columnar processing approach, but I'm just not seeing anything novel here. Clarification of intent would be great!

[1]: https://github.com/deeplearning4j/DataVec
[2]: https://github.com/deeplearning4j/nd4j
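
(Editorial sketch, not DataVec's actual API.) The idea described above, building a series of transforms, persisting it, and handing it to an executor elsewhere, might look roughly like this in plain Python. Every name here (`OPERATIONS`, `PipelineBuilder`, `execute`, and the three operations) is hypothetical, and a table is assumed to be a dict mapping column names to lists.

```python
import json

# Hypothetical registry of named column operations. Each takes a table
# (dict of column name -> list of values) and returns a new table.
OPERATIONS = {
    "rename": lambda table, old, new: {(new if k == old else k): v for k, v in table.items()},
    "scale": lambda table, column, factor: {**table, column: [x * factor for x in table[column]]},
    "drop": lambda table, column: {k: v for k, v in table.items() if k != column},
}


class PipelineBuilder:
    """Collects transform steps as plain data so they can be persisted."""

    def __init__(self):
        self._steps = []

    def add(self, op, **params):
        if op not in OPERATIONS:
            raise ValueError(f"unknown operation: {op}")
        self._steps.append({"op": op, "params": params})
        return self  # returning self allows chained calls (builder pattern)

    def to_json(self):
        # The pipeline is only names and parameters, so it serialises cleanly.
        return json.dumps(self._steps)


def execute(pipeline_json, table):
    """A trivial local executor: replays the persisted steps in order."""
    for step in json.loads(pipeline_json):
        table = OPERATIONS[step["op"]](table, **step["params"])
    return table


spec = (
    PipelineBuilder()
    .add("scale", column="price", factor=2)
    .add("rename", old="price", new="price_doubled")
    .add("drop", column="sku")
    .to_json()
)
result = execute(spec, {"sku": ["a", "b"], "price": [1.5, 2.0]})
print(result)
```

Because `spec` is just a JSON string, it could be stored or sent to another machine, and a different executor could interpret the same steps there.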
buremba, over 7 years ago
Is it in-memory? Does it support replication or sharding? What's the main use case? How does it differ from ORC, Parquet or Arrow? The repository doesn't have any information.
dgudkov, over 7 years ago
Interesting idea. Columnar ETL can be quite efficient in some scenarios because an ETL transformation (e.g. calculating a new column) frequently modifies an existing table rather than creating a new one. This allows calculating only the delta instead of rebuilding a new table from scratch. That helps optimize performance and lets calculations run in memory without slow disk I/O.

Another advantage is that it allows performing many transformations (e.g. filtering) directly on dictionary-compressed data, without decompressing it. This works well in Vertica [1] (based on the C-Store DB [2]), which was our inspiration for building a lightweight ETL for business users that also uses a columnar in-memory data transformation engine [3].

[1] https://www.vertica.com/

[2] http://db.csail.mit.edu/projects/cstore/

[3] http://easymorph.com/in-memory-engine.html
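
(Editorial sketch.) To make the point about filtering on dictionary-compressed data concrete, here is a minimal illustration in plain Python. It is not how Vertica, C-Store, or EasyMorph implement it; the `DictColumn` class and its methods are hypothetical names, and the column is assumed to fit in memory.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DictColumn:
    """A dictionary-encoded column: distinct values plus one integer code per row."""

    dictionary: List[str]  # distinct values; list index is the code
    codes: List[int]       # one code per row

    @classmethod
    def encode(cls, values: List[str]) -> "DictColumn":
        dictionary: List[str] = []
        index = {}
        codes: List[int] = []
        for v in values:
            if v not in index:
                index[v] = len(dictionary)
                dictionary.append(v)
            codes.append(index[v])
        return cls(dictionary, codes)

    def filter_rows(self, predicate: Callable[[str], bool]) -> List[int]:
        """Return the row indices whose value satisfies the predicate.

        The predicate is evaluated once per distinct value, not once per row.
        Rows are then selected by comparing integer codes, so the column is
        never expanded back into its original values.
        """
        matching = {code for code, value in enumerate(self.dictionary) if predicate(value)}
        return [row for row, code in enumerate(self.codes) if code in matching]


countries = DictColumn.encode(["DE", "US", "DE", "FR", "US", "DE"])
german_rows = countries.filter_rows(lambda v: v == "DE")
print(countries.dictionary, countries.codes, german_rows)
```

With few distinct values and many rows, the possibly expensive predicate runs only a handful of times, and the per-row work is a cheap integer membership test.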
krat0sprakhar, over 7 years ago
Sorry for being that guy, but I just clicked into a random file in src to read the code, and found the code style (indentation etc.) to be quite weird: https://github.com/asavinov/bistro/blob/master/core/src/main/java/org/conceptoriented/bistro/core/ColumnData.java

Might I suggest using https://github.com/google/google-java-format for formatting?
jitl, over 7 years ago
An example would be great. Can you show how to do a given task with SQL, map/reduce, and your framework?

Because right now I have no idea why I’d choose to learn this new stuff over using google-able tools I already know.

Make your value proposition *really* clear.
jnordwick, over 7 years ago
Might be a cool idea, but it's not nearly fleshed out enough. I think a larger example, instead of just individual lines of code, would be useful. Show a toy widget sales spreadsheet.

What is the use case? Does it support time series? How would you do a moving average or a pivot table?
nickpeterson, over 7 years ago
I skimmed the readme but didn't see the answer to what I regard as a basic question. How is this different from a view? I can easily make derived columns based on functions and reference those in other views (performance issues aside).
julienfr112, over 7 years ago
How does that compare to SAS software (https://en.wikipedia.org/wiki/SAS_(software))? Particularly the "DATA" steps.
KasianFranks, over 7 years ago
This is neat. Vector-space-based AI calculations will benefit from this approach. Great work!