TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Ask HN: Storing and Processing Large Amounts of Temporal-Spatial Data

2 点作者 khandelwal大约 14 年前
As part of our research group, we're collecting large amounts of location data. Our data essentially looks like (user id, lat/long co-ordinates, timestamp). There's other metadata involved too, but that's not relevant here.<p>We're collecting about 2-3 million records a week, and expect to collect about a year's worth of data in due time.<p>I'd really like some advice on techniques on storing and processing this data. We'd like to be able to answer queries similar to:<p>(1) For a given location, who was near that location (within a specified distance) over a specified period of time?<p>(2) Which locations are near each other?<p>That's the general idea. We don't need a real-time response, but what are good databases (or other data storage software)? I've come across people talking about k-d trees, does that work at this scale? What kind of hardware do I need? I'm hoping to get pointers towards general strategies. How do we store this data? Does it even make sense to store it all in a database? Which data/software/packages lend themselves well to distance/radius calculations?<p>We're most familiar with Python/Linux, would prefer to stay away from Java and prefer open source/free software. We're new to all this, pointers to books and papers would also be useful. All and any advice would be greatly useful.

2 条评论

notaddicted大约 14 年前
Both MySQL and Postgres support some GIS features, I would definitely recommend taking a look. I bookmarked this book about postGIS but I haven't read it: <a href="http://www.postgis.us/" rel="nofollow">http://www.postgis.us/</a> .<p>If the databases can handle your queries efficiently then it could save you a lot of work and you won't have to do anything nasty.<p>From a quick back-of-the-envelope calculation, you should have under 20GB of data, so if you need to perform an inefficient computation with low latency storing it all in RAM isn't out of the question.
评论 #2337230 未加载
bartonfink大约 14 年前
I had an interview with Boeing, and found out that they did work with Sybase to develop some DB extensions for just that purpose. I think they license them out, but that's not going to work well with your FOSS desires. Postgres seems to have extensions to allow temporal work - check here. <a href="http://pgfoundry.org/projects/temporal/" rel="nofollow">http://pgfoundry.org/projects/temporal/</a>