TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Run SQL on JSON files without any data loads

108 pointsby miharalmost 12 years ago

14 comments

otalmost 12 years ago
A few years ago I was toying with JSON databases without loading, and came up with an idea to <i>cache</i> the parse tree of the documents after the first query, so that subsequent queries would run much faster. I called the technique <i>semi-indexing</i> and wrote a paper [1] on that; on my synthetic tests the speedups were significant (even 10x), but I never got the chance to test it on real workloads.<p>I wonder if would be useful to integrate the semi-index code [2] in this json_fdw; thanks for sharing the code, I'll try to see how feasible this is.<p>[1] <a href="http://www.di.unipi.it/~ottavian/files/semi_index_cikm.pdf" rel="nofollow">http://www.di.unipi.it/~ottavian/files/semi_index_cikm.pdf</a><p>[2] <a href="https://github.com/ot/semi_index" rel="nofollow">https://github.com/ot/semi_index</a>
评论 #5801131 未加载
评论 #5801130 未加载
mortovalmost 12 years ago
I remember when it caught my eye as Postgres95 and I've always preferred it, even in the face of the huge surge for MySQL which left PostgreSQL looking like an also-ran for quite while.<p>The reality is it is one of the most amazing open source projects I know - enough to rival Linux itself for me.<p>These little snippets of PostgreSQL features and facilities which keep being posted give just a tiny hint of the truly enterprise grade features and facilities it has - I just wish I still had as much time to keep on top of it as I once did but I still keep it as my go-to database for project development. I've even dropped it in to replace full blown Oracle systems for serious cost savings.<p>The developers should be receiving far more kudos than they do.
d4ntalmost 12 years ago
I don't make a habit of spamming discussion threads but... my product QueryTree (<a href="http://querytreeapp.com" rel="nofollow">http://querytreeapp.com</a>) will load JSON files even if they have lists at the root, you can use the Append tool to bring multiple files together, and you can then use Filter, Sort, Join and Group without ever having to define a schema.<p>Admittedly, it's aimed at novice users so if you're comfortable with SQL it may feel limiting.
评论 #5801561 未加载
knowtheoryalmost 12 years ago
I am mystified as to when this tool would be useful (can anybody else think of a practical use case?).<p>You can run SQL queries on individual JSON files... but you have to have PostgreSQL installed already.<p>And you can't run it against multiple JSON files, only one, with a single JSON object at the root.
评论 #5799849 未加载
评论 #5799801 未加载
评论 #5800585 未加载
评论 #5800705 未加载
评论 #5799762 未加载
ajtayloralmost 12 years ago
With the upcoming release of Pg 9.3, I've been reading a lot more posts and articles about the FDW (foreign data wrapper) capabilities. I had not considered all the things FDW can do, but I'm getting more and more excited the further I get into it.<p>For instance, at $work we have a bunch of tables which are archives of raw click data. They are taking up quite a bit of space on disk, which we'd like to reclaim. Keeping the raw data around is good, since I want to create a summary table and you can't recreate raw data from summaries. The idea was to export the table data as CSV, and then when I had the time to for messing around with summaries I'd reload the data. With FDW I can leave the data as CSV and read it just like it was in a proper DB table. Win!<p>Or even better, again at $work we use Mysql. But I hate it because SQL strict mode isn't enabled and mysql does really stupid things with my data in "normal" mode. I can't safely turn on strict at this point because I don't have the time for testing everything. I also really like Pg and would love to switch. But again, I don't have the time to do it. What I've been thinking about is using the FDW capabilities (in particular the upcoming writable FDW in 9.3) as a way to safely and slowly migrate DB platforms. It's only an idea in my head right now, but it's an intriguing one.
languagehackeralmost 12 years ago
I kind of think the more common use case is accessing JSON data _without_ defining a schema. In fact I wouldn't mind loading the data so long as it doesn't persist for too long. If only there was a schemaless database that allowed you to set a time to live on the JSON objects you load into it... that would be humongous.
tantaloralmost 12 years ago
How is this "without any data loads"? You have to map each file to a database table and then execute SQL against that database. Isn't that exactly what "loading into a database" means?
评论 #5800830 未加载
评论 #5801887 未加载
caycepalmost 12 years ago
actually, I might use this thread to ask a noob question - I've been trying to fiddle around with some data that is relatively schemaless (i.e. I have a list of medical research subjects, and each of them have varying numbers of prior medical conditions, medication allergies, and list of current and past medicines).<p>Naturally, I thought JSON was better than relational DB's for this, but I wasn't sure how best to store this data. The relation data that I do have is stashed in .csv files that I'm transitioning over to sqlite, so I was wondering if there was a sqlite-esque DB for JSON - I was assuming something like Couch or Redis. But am I missing something, and that JSON itself IS the database?
评论 #5800901 未加载
fmariluisalmost 12 years ago
You can also use Ojota (a flat file database with ORM - <a href="https://bitbucket.org/msa_team/ojota/overview" rel="nofollow">https://bitbucket.org/msa_team/ojota/overview</a>) for something like this.
spullaraalmost 12 years ago
You can use YQL to do this — even join with other data sources.<p><a href="http://developer.yahoo.com/yql/console/" rel="nofollow">http://developer.yahoo.com/yql/console/</a>
rjurneyalmost 12 years ago
Apache Pig and Hive are other systems that can do this, minus the one file limitation.
评论 #5800975 未加载
kashnikovalmost 12 years ago
I'd rather see the code they used to crawl Amazon reviews...
评论 #5800998 未加载
binarysoloalmost 12 years ago
Commenting to save thread -- looks great!
dksidanaalmost 12 years ago
Interesting