Faster PostgreSQL to BigQuery Transfers

118 points by fhk, over 2 years ago

9 comments

vidarh, over 2 years ago
Back when I was working with shapefiles, this was the type of thing that tended to be far more convenient to process in-process using something like GDAL [1] (which can operate directly on an in-memory copy, gzip files, SQLite databases and far more) and to query with GDAL's SQL support, especially when built with SpatiaLite [2], rather than loading it into a separate database. It would have been interesting if the author had talked about what's stopping him from taking that approach, given he's clearly aware of GDAL and given that 130M records and a few tens of GB isn't a particularly big GIS dataset.

[1] https://gdal.org

[2] https://www.gaia-gis.it/fossil/libspatialite/index
ctc24, over 2 years ago
Very cool to see a walkthrough with actual benchmarks. Not entirely surprised that Parquet shines here. Another big advantage of Parquet over CSV is that you don't have to worry about data integrity. Perhaps less relevant for GIS data, but not having to think about things like string escaping is rather nice.

"It would be great to see data vendors deliver data straight into the Cloud Databases of their customers. It would save a lot of client time that's spent converting and uploading files."

Hear, hear! Shameless plug: this is exactly what we enable at prequel.co. If there are any data vendors reading this, or anyone who wants easier access to data from their vendor, we're here to help.

edit: quote fmt
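
The string-escaping point can be illustrated with a minimal sketch using only Python's standard `csv` module. The record below is made up for illustration and is not taken from the article. It shows one field that contains a comma, double quotes, and a newline, which a CSV writer has to quote and a reader has to un-quote, whereas a naive comma split miscounts the fields.

```python
import csv
import io

# Hypothetical record: the address field contains a comma, quotes, and a newline.
row = ["42", 'Unit 3, "The Mews"\nLondon', "POINT(-0.12 51.5)"]

# Serialize with the csv module, which quotes the field and doubles the inner quotes.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()

# A naive reader that just splits on commas breaks the record into the wrong pieces.
naive_fields = encoded.rstrip("\r\n").split(",")

# A CSV-aware reader recovers the original three fields.
parsed = next(csv.reader(io.StringIO(encoded, newline="")))

print(len(row), len(naive_fields), parsed == row)
```

A columnar format such as Parquet stores each value in a typed column rather than in delimited text, so this class of quoting problem does not come up.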
legedemon, over 2 years ago
For someone whose interaction with spatial data is very limited, I found the article to be a treasure trove of information.

Also, thanks for sharing S2! It'll be nice to look at.
pantsforbirds, over 2 years ago
I really love working with Parquet and the general Arrow ecosystem. The performance and cost ratios you can get out of it are really insane. AWS S3 + Parquet + Athena is one of the best and cheapest databases I've ever used.
fhk, over 2 years ago
Great to share the combination of tech here and very interested to see how others are ingesting spatial data at scale
api, over 2 years ago
Anything to go the other way? I'd like to use BQ to warehouse and examine the data, but PG for heavy analytics, because of the cost once you really start running many repeated queries.

I guess I could just dump directly to CSVs and download, but BQ is a nice, convenient, bottomless data bucket.
boundlessdreamz, over 2 years ago
Does anyone know what tools can be used to stream the result of a MongoDB query into BigQuery?
yolo3000, over 2 years ago
Nice to read this. I had a similar assignment 15 years ago: visualizing the rollout of the fiber optic network across the city. But we had a lot less data to deal with.
michaericalribo, over 2 years ago
Is there a reason not to use federated queries to hit Postgres directly?
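
For readers unfamiliar with the term, a federated query in BigQuery is written with the `EXTERNAL_QUERY` table function, which sends a statement to an external database through a preconfigured BigQuery connection resource (for example one pointing at Cloud SQL for PostgreSQL). The sketch below only builds such a statement as a string. The connection ID, table name, and helper function are hypothetical, and it assumes such a connection already exists.

```python
def external_query(connection_id: str, postgres_sql: str) -> str:
    """Wrap a PostgreSQL statement in a BigQuery EXTERNAL_QUERY call.

    Hypothetical helper for illustration; it only builds the SQL text.
    """

    def literal(value: str) -> str:
        # Escape the value for a single-quoted BigQuery string literal.
        escaped = (
            value.replace("\\", "\\\\")
            .replace("'", "\\'")
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return f"'{escaped}'"

    return (
        "SELECT * FROM EXTERNAL_QUERY("
        f"{literal(connection_id)}, {literal(postgres_sql)})"
    )


sql = external_query(
    "my-project.us.my-postgres-connection",  # hypothetical connection ID
    "SELECT id, ST_AsText(geom) AS wkt FROM buildings",  # hypothetical table
)
print(sql)
```

The inner statement runs on the Postgres side, so geometry is converted to text there (here with PostGIS `ST_AsText`) before the rows are returned to BigQuery.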