Ask HN: Best Tools for Data Analysis with Unstructured JSON Documents?

1 point by zschuessler, over 4 years ago
Hi HN,

In 2021 I'm working to open source a behemoth project I've poured over 1,500 hours into. It relates to US Congress bill discovery and analysis (similar to, but different from, GovTrack).

My next major step is to write a data dictionary to bring organization to the undefined/unstructured chaos. The goal is that anyone can quickly start hacking on their own applications with the data and conduct their own analyses, without needing a political science degree to do so. I'd be thrilled if a high school student could pick the data up and start hacking.

Here is an example schema: https://i.imgur.com/Qsoa1aj.png

Currently I use a relational database, and although JSON querying does work fine, it isn't exactly easy to build statistical analyses on the fly. Here are some questions I can answer, but not quickly:

1. What's the entire list of unique bill attributes that have ever existed in the dataset? What about only for 2019?

2. How many times was attribute X used in 2019? What was every possible value for it?

3. For all bills and all actions ever recorded, what is the total number of unique *types* of actions that have been recorded? (e.g. tabling a bill, holding a vote, passing to committee, etc.)

4. Which bill was most "popular" (most referenced by other bills) in 2020?

I have experience with Elasticsearch, MongoDB, et al. and am intrigued by Typesense. But as I don't work with statistical analysis often, I humbly ask the community if there are tools I should be considering to answer the above questions (quickly!).

Cheers!
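To make the four questions concrete, here is a minimal sketch of how they could be answered over in-memory JSON documents using only the Python standard library. Everything about the data shape is an assumption for illustration: the field names `year`, `attributes`, `actions`, `type`, and `references`, and the sample bills themselves, are hypothetical and are not taken from the real schema. Question 4 is interpreted here as "references made by bills dated in the given year", which may not match the intended definition.

```python
from collections import Counter

# Hypothetical sample documents. The field names and values are assumptions
# made for this sketch, not the project's actual schema.
BILLS = [
    {"id": "hr1", "year": 2019,
     "attributes": {"sponsor": "A", "subject": "tax"},
     "actions": [{"type": "referred_to_committee"}, {"type": "vote"}],
     "references": []},
    {"id": "hr2", "year": 2019,
     "attributes": {"sponsor": "B"},
     "actions": [{"type": "tabled"}],
     "references": ["hr1"]},
    {"id": "s3", "year": 2020,
     "attributes": {"sponsor": "A", "cosponsors": 4},
     "actions": [{"type": "vote"}],
     "references": ["hr1", "hr2"]},
    {"id": "s4", "year": 2020,
     "attributes": {}, "actions": [],
     "references": ["hr2"]},
]


def in_year(bills, year=None):
    """Yield bills from the given year, or all bills when year is None."""
    for bill in bills:
        if year is None or bill.get("year") == year:
            yield bill


def unique_attributes(bills, year=None):
    """Question 1: every attribute name that appears on any bill."""
    return {name for bill in in_year(bills, year)
            for name in bill.get("attributes", {})}


def attribute_usage(bills, name, year=None):
    """Question 2: how often an attribute is used, and its distinct values.

    Assumes attribute values are hashable scalars (strings, numbers).
    """
    values = [bill["attributes"][name] for bill in in_year(bills, year)
              if name in bill.get("attributes", {})]
    return len(values), set(values)


def action_types(bills):
    """Question 3: count of each action type across all bills."""
    return Counter(action["type"] for bill in bills
                   for action in bill.get("actions", []))


def most_referenced(bills, year):
    """Question 4: the bill id most referenced by bills dated in `year`."""
    counts = Counter(ref for bill in in_year(bills, year)
                     for ref in bill.get("references", []))
    return counts.most_common(1)[0] if counts else None


print(sorted(unique_attributes(BILLS, 2019)))  # ['sponsor', 'subject']
print(attribute_usage(BILLS, "sponsor", 2019)[0])  # 2
print(len(action_types(BILLS)))  # 3
print(most_referenced(BILLS, 2020))  # ('hr2', 2)
```

A single pass with `Counter` like this is fine for exploring a sample of the data; for the full dataset, the same aggregations would more likely be pushed down into whichever database or search engine ends up holding the documents.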

No comments yet.
