My current project is to try to classify proteins by one of their properties, and I'm currently exploring which ML algorithm has the best accuracy on the dataset. I'm trying out 4-5 classifiers from Python's sklearn library, and I want to run them in parallel on a cloud platform. The reason for running them in the cloud is that each one takes 6GB of memory, so I don't want to run them all on my own machine.

My current solution is the following script: http://pastebin.com/6X6JsXzH but I'm wondering if there's a better/easier way to do this.
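For reference, each per-machine job boils down to roughly the following (a simplified sketch with stand-in data, not my exact pastebin script; the classifier names are just illustrative, and it assumes a recent scikit-learn):

    # One worker's job: train the classifier named on the command line,
    # report its test-set accuracy. Run one of these per cloud machine.
    import sys
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    CLASSIFIERS = {
        "rf": RandomForestClassifier(n_estimators=100),
        "gbm": GradientBoostingClassifier(),
        "logreg": LogisticRegression(max_iter=1000),
        "svm": SVC(),
    }

    # Stand-in data; the real script loads the protein feature matrix instead.
    X, y = make_classification(n_samples=1000, n_features=50, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    name = sys.argv[1]                       # e.g. "rf" -- one classifier per machine
    clf = CLASSIFIERS[name].fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))  # mean accuracy on held-out data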
Seems like a perfectly good way to solve your problem. If you wanted something a little more "conventional" to act as your provisioner and coordinator, you could use Ansible for what you're doing here.

You might find that it gives you a helpful model/template for setting up a group of machines: ensuring they're configured as you need, deploying your artifacts to them consistently, running your jobs as you intended, and collecting the results and pushing them someplace.

I suggest Ansible, rather than other similar tools, entirely because it's so simple in scope of responsibility. It's a pretty straightforward DSL for orchestrating servers (it just pushes commands to them to execute) with no dependencies beyond SSH and Python. Since that seems to be all you really need for this task, there's not much reason to reach for bigger frameworks/platforms.

Or... just keep using your shell scripts. :-)
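To make the Ansible suggestion concrete, a playbook for your setup could look roughly like the sketch below. Everything here is a placeholder you'd define yourself: "workers" is an inventory group listing your cloud machines, classify.py stands in for your script, and "model" is a per-host inventory variable naming which classifier each machine runs.

    # run_models.yml -- rough sketch, untested against your setup
    - hosts: workers
      tasks:
        - name: Install scikit-learn on each machine
          pip:
            name: scikit-learn

        - name: Push the training script out to every machine
          copy:
            src: classify.py
            dest: /home/ubuntu/classify.py

        - name: Run one classifier per machine (model is set per-host in inventory)
          command: python /home/ubuntu/classify.py {{ model }}

        - name: Pull each machine's results back to the control machine
          fetch:
            src: /home/ubuntu/results.txt
            dest: results/

You'd kick it off with "ansible-playbook -i hosts run_models.yml", and the fetch task drops each host's output under results/<hostname>/ so the comparison ends up on your own machine.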