Ask HN: Software for exploring billion row sized datasets

8 points by tomx over 12 years ago

Given billions of rows of data across a few tables, how do you best make sense of it?

My current method involves thinking up interesting questions and writing database queries. I then plug the resulting data into gnuplot to examine it.

Is there generally a better way? I am kind of hoping that a Mathematica/Matlab-type shell, or something similar, exists for databases or other data sources. Just type a query and view a graph. Even better: type queries and output the graphs to a web page.

Or is the usual method to hire a data scientist to build specialised reports?

I'm agnostic about the data format; I'm interested in how this works across all ecosystems.

5 comments

lutusp over 12 years ago
> Given billions of rows of data across a few tables, how do you best make sense of it?

Your inquiry won't go anywhere until you describe the problem you're trying to solve. Be specific, if only for a single example problem.

I say this because there's no generic solution to accessing a large database -- the solution depends on the goal.
lukev over 12 years ago
Using a language that supports an interactive development library might speed up the process for you.

I use Clojure, and like Incanter for this kind of work. I also use Datomic as my data store when I can, which makes it quite easy to perform ad-hoc queries.

Of course, the fact that your data is too large to effectively fit in memory means that, whatever you're graphing, you're going to have to aggregate it a bit before you can visualize it. That's really the hardest part of what you're asking, and how you do that efficiently depends entirely on what your query is and what kind of data store you're using.

I'm not aware of any off-the-shelf software that does what you're talking about, unless it fits into an OLAP-type schema (http://en.wikipedia.org/wiki/OLAP_cube), for which there are several products available.
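As a rough illustration of the "aggregate first, then visualize" point, here is a minimal sketch that lets the database do the bucketing so only a handful of rows ever reach the plotting tool. It uses Python's built-in sqlite3 module purely as a stand-in; the `events` table, the `latency_ms` column, and the `histogram` helper are hypothetical names, not anything from the thread.

```python
import sqlite3


def histogram(conn, table, column, bucket_width):
    """Return [(bucket_start, row_count), ...], computed inside the database.

    `table` and `column` are assumed to be trusted identifiers (they are
    interpolated into the SQL); only `bucket_width` is a bound parameter.
    """
    sql = (
        f"SELECT CAST({column} / ? AS INTEGER) * ? AS bucket, COUNT(*) "
        f"FROM {table} GROUP BY bucket ORDER BY bucket"
    )
    return conn.execute(sql, (bucket_width, bucket_width)).fetchall()


# Hypothetical demo data: 10,000 rows standing in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (latency_ms INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?)", [(i % 100,) for i in range(10_000)]
)

# Only one row per bucket comes back, however large the table is.
for bucket_start, row_count in histogram(conn, "events", "latency_ms", 25):
    print(bucket_start, row_count)
```

The printed pairs can be fed straight to gnuplot or any other plotting tool. The same GROUP BY idea should carry over to other SQL stores, though the bucketing expression may need adjusting per dialect.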
runarb over 12 years ago
When you mention billion-row datasets, MapReduce and Apache Hadoop come to mind, but that requires that you are able to do some computer programming.

There may also be a lot of existing solutions to present/summarize/graph your data, depending on what it contains and which program created it. Can you give us some more insight into what kind of data you have?
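For readers unfamiliar with the MapReduce model mentioned above, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) in plain Python. It is not the Hadoop API; the function names and the `country` field are made up for illustration, and a real cluster would distribute each phase across machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical job: count rows per country.
def count_by_country(record):
    yield record["country"], 1


def total(key, values):
    return sum(values)


records = [{"country": "NO"}, {"country": "US"}, {"country": "NO"}]
result = reduce_phase(shuffle(map_phase(records, count_by_country)), total)
print(result)
```

The programming effort lies in writing the mapper and reducer for each question you want to ask; the framework takes care of the grouping and distribution.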
teyc over 12 years ago
SQL Server and Excel PivotTables use VertiPaq. The main idea is that data along a column tends not to change very much. Therefore, one is able to compress the data in memory column by column, achieving a very high degree of compression.

Perhaps you can roll something like this as well.
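To make the column-compression idea concrete, here is a minimal sketch using run-length encoding, one simple technique that exploits values repeating along a column. This is only an illustration of the general principle, not a description of how VertiPaq itself works; the helper names and the sample column are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical column, stored contiguously and sorted so that repeats cluster.
country = ["SE"] * 4 + ["NO"] * 3 + ["SE"] * 2

runs = rle_encode(country)
print(runs)                      # 3 runs instead of 9 values
assert rle_decode(runs) == country
```

The gain depends heavily on how the rows are ordered: a column with long runs of identical values shrinks dramatically, while a column of unique values does not shrink at all.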
jamessb over 12 years ago
There are GUI analysis tools that produce graphs directly from databases, e.g. Tableau: http://www.tableausoftware.com/solutions/big-data-analysis