Building an open data pipeline in 2024

103 points by dangoldin, about 1 year ago

5 comments

RadiozRadioz, about 1 year ago
> And if you’re dealing with truly massive datasets you can take advantage of GPUs for your data jobs.

I don't think scale is the key deciding factor for whether GPUs are applicable for a given dataset.

I don't think this is a particularly insightful article. Read the first paragraph of the "Cost" section.
amadio, about 1 year ago
I take issue with this part of the article:

> In general, managed tools will give you stronger governance and access controls compared to open source solutions. For businesses dealing with sensitive data that requires a robust security model, commercial solutions may be worth investing in, as they can provide an added layer of reassurance and a stronger audit trail.

There are definitely open source solutions capable of managing vast amounts of data securely. The storage group at CERN develops EOS (a distributed filesystem based on the XRootD framework), and CERNBox, which puts a nice web interface on top. See https://github.com/xrootd/xrootd and https://github.com/cern-eos/eos for more information. See also https://techweekstorage.web.cern.ch, a recent event we had along with CS3 at CERN.
victor106, about 1 year ago
> Cloudflare R2 (better than AWS S3).

The article links to [1].

Is R2 really better than S3?

[1] https://dansdatathoughts.substack.com/p/from-s3-to-r2-an-economic-opportunity
esafak, about 1 year ago
Can someone explain this "semantic layer" business (cube.dev)? Is it just a signal registry that helps you keep track of and query your ETL pipeline outputs?
Phlogi, about 1 year ago
Why do I need sqlmesh if I use dbt/snowflake?