Given billions of rows of data across a few tables, how do you best make sense of it?<p>My current method involves thinking up interesting questions and writing database queries. I then plug the resulting data into gnuplot to examine it.<p>Is there generally a better way? I am kinda hoping a Mathematica/Matlab type shell or similar for databases or other data sources exists. Just type a query and view a graph. Even better, type queries, output graphs into a web page.<p>Or is the method to hire a data scientist to build specialised reports?<p>I'm agnostic about the data format; I'm interested in how this works across all ecosystems.
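<p>For concreteness, here is a toy sketch of the loop I mean (the table, columns, and data are all invented stand-ins):

```python
import sqlite3

# Toy stand-in for the real database: one table of bucketed values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day INTEGER, value REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(d % 7, float(d)) for d in range(1000)])

# The "interesting question" step: an aggregating query.
rows = conn.execute(
    "SELECT day, COUNT(*), AVG(value) FROM events GROUP BY day ORDER BY day"
).fetchall()

# Dump tab-separated output for gnuplot...
with open("daily.dat", "w") as f:
    for day, n, avg in rows:
        f.write(f"{day}\t{n}\t{avg}\n")

# ...then plot it by hand, e.g.:
#   gnuplot -e "set terminal png; set output 'daily.png'; \
#               plot 'daily.dat' using 1:3 with lines"
```

What I'd like is a shell that collapses the query/dump/plot steps into one.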
> Given billions of rows of data across a few tables, how do you best make sense of it?<p>Your inquiry won't go anywhere until you describe the problem you're trying to solve. Be specific, if only for a single example problem.<p>I say this because there's no generic solution to accessing a large database -- the solution depends on the goal.
Using a language that supports an interactive development library might speed up the process for you.<p>I use Clojure, and like Incanter for this kind of work. I also use Datomic as my data store, when I can, which makes it quite easy to perform ad-hoc queries.<p>Of course, the fact that your data is too large to effectively fit in memory means that, whatever you're graphing, you're going to have to aggregate it before you can visualize it. That's really the hardest part of what you're asking, and how you do that efficiently depends entirely on what your query is and what kind of data store you're using.<p>I'm not aware of any off-the-shelf software that does what you're talking about, unless it fits into an OLAP-type schema (<a href="http://en.wikipedia.org/wiki/OLAP_cube" rel="nofollow">http://en.wikipedia.org/wiki/OLAP_cube</a>) for which there are several products available.
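<p>To make the aggregate-first point concrete (in Python rather than Incanter, purely for illustration; the data is synthetic):

```python
import random
from collections import defaultdict

random.seed(0)

# Pretend this is a huge table streaming out of the data store;
# here it's just a million synthetic (bucket, value) pairs.
def raw_rows(n=1_000_000):
    for _ in range(n):
        yield random.randrange(10), random.random()

# Aggregate incrementally -- never hold all the raw rows in memory.
sums = defaultdict(float)
counts = defaultdict(int)
for bucket, value in raw_rows():
    sums[bucket] += value
    counts[bucket] += 1

# A million rows collapse to ten (bucket, mean) points, which any
# plotting tool can handle trivially.
means = {b: sums[b] / counts[b] for b in sorted(counts)}
```

The hard part at real scale is pushing that loop down into the data store (or a cluster) instead of streaming every row to the client.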
When you mention billion-row datasets, MapReduce and Apache Hadoop come to mind, but those require that you're able to do some programming.<p>There may also be a lot of existing solutions to present/summarize/graph your data, depending on what it contains and which program created it. Can you give us some more insight into what kind of data you have?
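<p>For a flavor of the programming involved, here is the MapReduce pattern in miniature, in plain Python (Hadoop does the same thing but shards the work across machines):

```python
from collections import defaultdict
from itertools import chain

# Map: turn each record into (key, value) pairs.
def mapper(line):
    for word in line.split():
        yield word.lower(), 1

# Shuffle: group all values by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce: collapse each key's values to a single result.
def reducer(key, values):
    return key, sum(values)

lines = ["big data big tables", "big queries"]
pairs = chain.from_iterable(mapper(l) for l in lines)
counts = dict(reducer(k, v) for k, v in shuffle(pairs).items())
# counts == {'big': 3, 'data': 1, 'tables': 1, 'queries': 1}
```

You write the mapper and reducer; the framework handles distribution, fault tolerance, and the shuffle between them.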
SQL Server and Excel PivotTables use VertiPaq. The main idea is that data within a column tends not to change very much, so you can compress it in memory column by column, achieving a very high degree of compression.<p>Perhaps you could roll something like this yourself.
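<p>A toy version of the column-compression idea, using simple run-length encoding (VertiPaq's actual scheme is more sophisticated, this is just to show why columns compress so well):

```python
# Run-length encode one column: adjacent repeated values collapse
# to (value, count) pairs. Low-cardinality sorted columns shrink
# dramatically.
def rle_encode(column):
    runs = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

# A "country" column with a million rows but only three distinct,
# sorted values stores as just three runs.
column = ["DE"] * 400_000 + ["UK"] * 300_000 + ["US"] * 300_000
runs = rle_encode(column)
assert rle_decode(runs) == column
# runs == [['DE', 400000], ['UK', 300000], ['US', 300000]]
```

Row-oriented storage can't do this, because adjacent values in a row come from different columns and rarely repeat.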
There are GUI analysis tools that produce graphs directly from databases, e.g. Tableau: <a href="http://www.tableausoftware.com/solutions/big-data-analysis" rel="nofollow">http://www.tableausoftware.com/solutions/big-data-analysis</a>