A few years ago I was toying with querying JSON databases without loading them, and came up with the idea to *cache* the parse tree of the documents after the first query, so that subsequent queries would run much faster. I called the technique *semi-indexing* and wrote a paper [1] on it; on my synthetic tests the speedups were significant (sometimes as much as 10x), but I never got the chance to test it on real workloads.

I wonder if it would be useful to integrate the semi-index code [2] into this json_fdw; thanks for sharing the code, I'll try to see how feasible this is.

[1] http://www.di.unipi.it/~ottavian/files/semi_index_cikm.pdf

[2] https://github.com/ot/semi_index
I remember when it first caught my eye as Postgres95, and I've always preferred it, even in the face of the huge surge toward MySQL, which left PostgreSQL looking like an also-ran for quite a while.

The reality is that it is one of the most amazing open source projects I know of, enough to rival Linux itself in my eyes.

These little snippets of PostgreSQL features and facilities that keep being posted give just a tiny hint of the truly enterprise-grade capabilities it has. I just wish I still had as much time to keep on top of it as I once did, but it remains my go-to database for project development. I've even dropped it in to replace full-blown Oracle systems for serious cost savings.

The developers should be receiving far more kudos than they do.
I don't make a habit of spamming discussion threads, but... my product QueryTree (http://querytreeapp.com) will load JSON files even if they have lists at the root; you can use the Append tool to bring multiple files together, and then use Filter, Sort, Join and Group without ever having to define a schema.

Admittedly, it's aimed at novice users, so if you're comfortable with SQL it may feel limiting.
I am mystified as to when this tool would be useful (can anybody else think of a practical use case?).

You can run SQL queries on individual JSON files... but you have to have PostgreSQL installed already.

And you can't run it against multiple JSON files, only one, with a single JSON object at the root.
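For anyone who hasn't tried it, here's roughly what the single-file case looks like. The server name, table name, file path, and columns below are all invented, and the column-to-key mapping is my reading of the json_fdw README rather than anything I've verified, so treat this as a sketch:

    -- load the wrapper and point a foreign table at one JSON file
    CREATE EXTENSION json_fdw;
    CREATE SERVER json_server FOREIGN DATA WRAPPER json_fdw;

    -- hypothetical file whose objects carry "product_id" and "rating" keys
    CREATE FOREIGN TABLE reviews (
        product_id text,
        rating     int
    ) SERVER json_server
      OPTIONS (filename '/tmp/reviews.json');

    SELECT rating, count(*) FROM reviews GROUP BY rating;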
With the upcoming release of Pg 9.3, I've been reading a lot more posts and articles about its FDW (foreign data wrapper) capabilities. I had not considered all the things FDW can do, but I'm getting more and more excited the further I get into it.

For instance, at $work we have a bunch of tables which are archives of raw click data. They are taking up quite a bit of space on disk, which we'd like to reclaim. Keeping the raw data around is good, since I want to create a summary table and you can't recreate raw data from summaries. The original plan was to export the table data as CSV, and then reload it whenever I had time to mess around with summaries. With FDW I can leave the data as CSV and read it just as if it were in a proper DB table (a sketch below). Win!

Or even better: again at $work we use MySQL, but I hate it, because SQL strict mode isn't enabled and MySQL does really stupid things with my data in "normal" mode. I can't safely turn strict mode on at this point because I don't have the time to test everything. I also really like Pg and would love to switch, but again, I don't have the time to do it. What I've been thinking about is using the FDW capabilities (in particular the upcoming writable FDWs in 9.3) as a way to safely and slowly migrate DB platforms. It's only an idea in my head right now, but it's an intriguing one.
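The CSV part of that works today with the file_fdw contrib module. A minimal sketch, with placeholder paths and columns standing in for the real click data:

    -- file_fdw ships with Postgres as a contrib module
    CREATE EXTENSION file_fdw;
    CREATE SERVER csv_server FOREIGN DATA WRAPPER file_fdw;

    -- archived raw click data, exported as CSV with a header row
    CREATE FOREIGN TABLE clicks_2012 (
        clicked_at timestamptz,
        user_id    bigint,
        url        text
    ) SERVER csv_server
      OPTIONS (filename '/archive/clicks_2012.csv', format 'csv', header 'true');

    -- build the summary without ever reloading the raw data
    CREATE TABLE clicks_2012_daily AS
        SELECT date_trunc('day', clicked_at) AS day, count(*) AS clicks
        FROM clicks_2012
        GROUP BY 1;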
I kind of think the more common use case is accessing JSON data _without_ defining a schema. In fact I wouldn't mind loading the data so long as it doesn't persist for too long. If only there was a schemaless database that allowed you to set a time to live on the JSON objects you load into it... that would be humongous.
How is this "without any data loads"? You have to map each file to a database table and then execute SQL against that database. Isn't that exactly what "loading into a database" means?
Actually, I might use this thread to ask a noob question. I've been trying to fiddle around with some data that is relatively schemaless (i.e. I have a list of medical research subjects, and each of them has varying numbers of prior medical conditions, medication allergies, and lists of current and past medicines).

Naturally, I thought JSON was a better fit than a relational DB for this, but I wasn't sure how best to store the data. The relational data that I do have is stashed in .csv files that I'm transitioning over to sqlite, so I was wondering if there was a sqlite-esque DB for JSON; I was assuming something like Couch or Redis. But am I missing something, and JSON itself IS the database?
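Since this thread is about Postgres anyway: one lightweight option is a regular table with a json column per subject (the json type landed in 9.2, and 9.3 adds operators for pulling fields back out). The field names below are invented for illustration:

    CREATE TABLE subjects (
        id     serial PRIMARY KEY,
        record json NOT NULL
    );

    INSERT INTO subjects (record) VALUES
        ('{"name": "subject-001",
           "conditions": ["hypertension"],
           "allergies": ["penicillin"],
           "medications": {"current": ["lisinopril"], "past": []}}');

    -- 9.3-style extraction: ->> returns a field as text
    SELECT record->>'name' FROM subjects;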
You can also use Ojota (a flat-file database with an ORM: https://bitbucket.org/msa_team/ojota/overview) for something like this.
You can use YQL to do this, and even join with other data sources.

http://developer.yahoo.com/yql/console/
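In the console, the stock json table makes this roughly a one-liner; the URL and itemPath below are placeholders, and as far as I recall itemPath is how you drill into nested structures:

    select * from json
    where url = "http://example.com/subjects.json"
      and itemPath = "json.subjects"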