Ask HN: Linux rig for data mining and machine learning

9 points by big_data over 14 years ago
Here's the scenario: if you were asked to build out three Linux machines that would be used together in a cluster to perform data mining and machine learning tasks, with the occasional MapReduce job thrown in, how would you spec the machines out? What distro would you use? Any must-have software installs?

With regard to the hardware, what is your preference for manufacturer? How much would you expect to pay per machine?

Your thoughts and suggestions are appreciated!

4 comments

burgerbrain over 14 years ago
I hate to say it, but I'm not sure you have the skills required to do what you're looking to do if these are the kinds of questions you're asking. A better question might be "what are good resources to read to get into data mining and machine learning?"
turbojerry over 14 years ago
You have a requirement; now you need a specification. Until you can specify the needs accurately, it is impossible to design a solution. So you need to ask questions about the algorithms that will be used: what hardware can they run on, CPUs or GPUs? How large are the datasets? What sort of speed is needed? What constraints are there, such as cost? As for hardware manufacturers, you might look at Supermicro and Appro; it really depends on your needs.
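To show how one of those questions (dataset size) turns into a hardware number, here is a minimal disk-sizing sketch. It is not from the thread. It assumes the data is stored in HDFS with replication factor 3 (the HDFS default), is spread evenly across nodes, and that an arbitrary 25% of each disk is kept free for intermediate MapReduce output; `disk_per_node_gb` is a hypothetical helper name.

```python
def disk_per_node_gb(dataset_gb: float, nodes: int, replication: int = 3,
                     headroom: float = 0.25) -> float:
    """Raw disk (GB) each node needs to hold `dataset_gb` of input data.

    Assumptions (illustrative only): data is spread evenly across nodes,
    every block is stored `replication` times (3 is the HDFS default), and
    a `headroom` fraction of each disk stays free for intermediate output.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    if replication > nodes:
        raise ValueError("cannot place more replicas than there are nodes")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    stored_gb = dataset_gb * replication   # total data on disk across the cluster
    per_node_gb = stored_gb / nodes        # assume an even spread over the nodes
    return per_node_gb / (1.0 - headroom)  # inflate so the headroom stays free
```

For a hypothetical 600 GB dataset on the three machines from the question, `disk_per_node_gb(600, nodes=3)` gives 800.0 GB per node. With replication 3 on only 3 nodes, every node ends up holding a full copy of the data, which is one concrete reason to pin down dataset size before buying disks.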
bobf over 14 years ago
Use AWS until you have a reasonable grasp of your dataset and real requirements. Then buy whatever servers give you the best bang for your buck. That will probably mean getting six mid-range servers rather than three servers with the absolute fastest CPUs and the most memory available. Use either Red Hat (CentOS) or Debian, and you'll almost certainly be using Hadoop. Dell servers are fine, although you can sometimes save significantly by going with something like Supermicro servers from Newegg. To keep costs down, order the bulk of your servers' memory from a third party rather than having it included in the build.
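The "bang for your buck" comparison can be made concrete with a cost-per-core calculation. This is a minimal sketch, not something from the thread: `ClusterOption` and `best_value` are hypothetical names, cost per aggregate core is only one possible metric (it ignores RAM, disk, power, and rack space), and the prices below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterOption:
    """One way to spend the budget: `count` identical servers (figures hypothetical)."""
    name: str
    count: int
    cores_per_server: int
    price_per_server: float  # dollars per server, excluding separately bought RAM

    @property
    def total_cost(self) -> float:
        return self.count * self.price_per_server

    @property
    def total_cores(self) -> int:
        return self.count * self.cores_per_server

    def cost_per_core(self) -> float:
        """Dollars spent per core across the whole cluster."""
        return self.total_cost / self.total_cores


def best_value(options: list[ClusterOption]) -> ClusterOption:
    """Return the option with the lowest cost per aggregate core."""
    return min(options, key=lambda o: o.cost_per_core())


# Invented prices, purely to show the comparison:
high_end = ClusterOption("3 top-end", count=3, cores_per_server=16, price_per_server=8000.0)
mid_range = ClusterOption("6 mid-range", count=6, cores_per_server=8, price_per_server=3000.0)
print(best_value([high_end, mid_range]).name)
```

With these made-up numbers both clusters have 48 cores, but the six mid-range servers cost $375 per core against $500 for the three top-end ones, so `best_value` picks the mid-range option. Real quotes, plus memory priced separately as suggested above, would go into the same comparison.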
bayareaguy over 14 years ago
A former employer of mine in the financial sector used Scalable Informatics[1] and Dell[2] servers for that sort of thing.

[1] http://scalableinformatics.com/
[2] http://www.dell.com/us/business/p/poweredge-cloud-servers