Ask HN: SQL or NoSQL?

10 points by steinsgate, almost 9 years ago
I need to analyze some football match data using a neural net. The data is in XML, one XML file per match, and the XML files are really dirty. Does it make sense to extract the data from the XML and structure it in a SQL DB first? Or is it advisable to work directly with the XML files? If anyone has experience with this, I would really like to hear about it.

6 comments

rubyfan, almost 9 years ago
This is less SQL vs. NoSQL and more that you just need to extract a usable data set.

Raw XML is not directly suitable for modeling or analysis. NoSQL solutions that query XML or otherwise make it accessible are probably not worth the administration overhead they add.

Extracting XML into tabular data sets that you can feed into a model, database, data frame, or similar is simple.
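
To illustrate the extraction step described above, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`match`, `home`, `away`, `home_goals`, `away_goals`) and the sample document are assumptions made up for illustration; the real files will have their own structure and their own kinds of dirt.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file: the tag names are assumed, not taken from the real data.
SAMPLE_XML = """
<match id="m1">
  <home> Arsenal </home>
  <away>Chelsea</away>
  <home_goals>2</home_goals>
  <away_goals> n/a </away_goals>
</match>
"""

def clean_int(text):
    """Return an int, or None when the field is missing or not numeric."""
    try:
        return int(text.strip())
    except (AttributeError, ValueError):
        return None

def match_to_row(xml_text):
    """Flatten one match document into a single flat row (a dict)."""
    root = ET.fromstring(xml_text.strip())
    return {
        "match_id": root.get("id"),
        "home": (root.findtext("home") or "").strip(),
        "away": (root.findtext("away") or "").strip(),
        "home_goals": clean_int(root.findtext("home_goals")),
        "away_goals": clean_int(root.findtext("away_goals")),
    }

row = match_to_row(SAMPLE_XML)
```

A list of such rows, one per file, can then go into a database table, a data frame, or straight into the model's input pipeline.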
geophile, almost 9 years ago
A lot depends on what kind of cleanup you need to do. If each cleanup step can be done in the context of a single XML file, then your simplest approach is probably to skip databases completely and just process one file at a time in your favorite language.

If you need set-oriented operations, it's hard to imagine you can do better than SQL, although that presumes you have normalized the XML into SQL, which may or may not be trivial. It depends on the structure of your XML docs.

After the cleanup, it's hard to say what's best; it depends on the structure of the data, what you want to do with it, and how much of it there is. To me, SQL is nearly always the tool of choice, unless you have data volume or data structuring requirements that are incompatible with it. On the latter point (data structuring), since you have XML, I would guess that it would not be difficult to define a SQL schema. That is, the schemaless aspect of NoSQL systems might not be important for you.
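
As one example of a set-oriented cleanup step that spans files, the sketch below finds matches that were loaded from more than one XML file. It uses Python's built-in `sqlite3` with an in-memory database; the `matches` table, its columns, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per match, tagged with the file it came from.
conn.execute(
    "CREATE TABLE matches (source_file TEXT, match_date TEXT, home TEXT, away TEXT)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [
        ("a.xml", "2016-08-01", "Arsenal", "Chelsea"),
        ("b.xml", "2016-08-01", "Arsenal", "Chelsea"),  # same match, second file
        ("c.xml", "2016-08-02", "Everton", "Spurs"),
    ],
)

# Group on the natural key and keep only the groups with more than one copy.
duplicates = conn.execute(
    """
    SELECT match_date, home, away, COUNT(*) AS copies
    FROM matches
    GROUP BY match_date, home, away
    HAVING COUNT(*) > 1
    """
).fetchall()
```

A check like this is awkward when processing one file at a time, which is where the SQL route starts to pay off.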
saluki, almost 9 years ago
I'm working on a fantasy football (NFL) app.

You might need to write an importer that parses your XML and cleans it up.

I just set up an importer for weekly game results. I copy the data from the website that hosts our league, paste it into a form, select the season and week, and then click import. It cleans up the text, sets the teams, gets the points, etc., and inserts it into MySQL.

I then basically display the data, run some calculations, etc.

This app is for a history of our league, so we still have our results if we change providers, etc.

As for the best fit for you: what is your goal? Do you want to access this data, run reports, and show historical data? If so, I would extract it into MySQL.

Or, if you're just analyzing the match, maybe import the XML file and show the results as a one-time process.

Good luck.
systems, almost 9 years ago
You need to ask yourself this:

 * Do I want to create a database of the match data?
 * Am I more comfortable with SQL than with other languages?
 * Am I more comfortable with NoSQL?
 * Do I have technical (i.e., performance) constraints or requirements?
 * Will others need to access and analyze this data?
 * What reporting tools do I plan to use?
 * What reporting tools are my users used to using?

After you start asking yourself the right questions (i.e., analyzing), the answer should present itself to you.
einhverfr, almost 9 years ago
It depends on what "analyze data" means. Relational math is wonderful for many kinds of data analysis, so my first inclination is to assume that SQL would be preferable.

But on the other hand, it sounds like this is training data for a neural net? In that case, maybe working with the XML directly makes more sense?

In the end, I would probably go with SQL to start with, just because the analysis features of relational databases would provide some really nice ways to check results. That may be secondary, but it is significant.
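
One way to read "check results" is to validate the training data itself before it reaches the network. The sketch below is only an assumption about what such a check might look like: it uses Python's built-in `sqlite3`, and the `matches` table, the `label` column, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: raw scores plus the label intended as the training target.
conn.execute(
    "CREATE TABLE matches ("
    "match_id TEXT PRIMARY KEY, home_goals INTEGER, away_goals INTEGER, label TEXT)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [
        ("m1", 2, 1, "home"),
        ("m2", 0, 0, "draw"),
        ("m3", 1, 3, "home"),  # deliberately wrong label
    ],
)

# Recompute the label from the scores and list the rows where it disagrees.
inconsistent = conn.execute(
    """
    SELECT match_id
    FROM matches
    WHERE label != CASE
        WHEN home_goals > away_goals THEN 'home'
        WHEN home_goals < away_goals THEN 'away'
        ELSE 'draw'
    END
    ORDER BY match_id
    """
).fetchall()
```

Rows returned by a query like this point at cleanup bugs that would otherwise silently end up in the training set.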
cauterized, almost 9 years ago
You should definitely extract the data from the XML. Whether to extract it into a SQL or a NoSQL DB is a separate choice. You're likely to want a DB regardless, for the efficiency you get from indexing (compared to XML), even with a document store, and for the flexibility you get from databases compared to custom native classes and data structures (though you may also want to build those on top of the DB layer).
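
To make the indexing point concrete, here is a small sketch with Python's built-in `sqlite3`, swapped in only because it needs no server. The table, the column names, and the index name `idx_matches_home` are assumptions for illustration. It asks SQLite for its query plan to see whether a lookup by home team would go through the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for extracted match data.
conn.execute("CREATE TABLE matches (match_id TEXT, home TEXT, away TEXT)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [("m1", "Arsenal", "Chelsea"), ("m2", "Everton", "Spurs")],
)
conn.execute("CREATE INDEX idx_matches_home ON matches (home)")

# EXPLAIN QUERY PLAN describes how SQLite intends to run the query;
# the human-readable detail is the last column of each returned row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT match_id FROM matches WHERE home = ?",
    ("Arsenal",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
uses_index = "idx_matches_home" in plan_text
```

Answering the same question from a directory of XML files would mean parsing every file on every lookup.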