"<i>how hard would it be to assemble a million people to contribute a fraction of their compute time?</i>"<p>The BOINC project's done it, they've seen 1million+ computers. And they even have an installation barrier which is different than what you are suggesting (their software is robust and easy to install but you still have to do it).<p>One thing BOINC and BOINC projects do well is establish non-monetary incentives, whether it be competitions, fancy graphs, etc. That's something to solve, not sure enlisting just your social network (manually, with an URL) is going to be enough to cut it if you want thousands of participants (unless you are particularly "influential" I guess).<p>Or maybe this is something a legion of mechanical turkers would be interested in?
I knew I'd seen something like this before...<p><a href="http://www.pluraprocessing.com" rel="nofollow">http://www.pluraprocessing.com</a><p>Launched on HN (where else) a few months ago:<p><a href="http://news.ycombinator.com/item?id=347359" rel="nofollow">http://news.ycombinator.com/item?id=347359</a>
Here are some more "business ideas" for your enjoyment:<p>- Buy tons of those fancy interactive visual advertisements, embed the worker into them and perform mapreduce jobs in the browsers of unsuspecting users<p>- Run some of the analytic/batch processing related to a popular social network on CPUs of your customers.<p>- Have a popular site? Sell CPUs of its audience just like one sells impressions via AdSense.
<i>Google's server farm is rumored to be over six digits (and growing fast), which is an astounding number of machines, but how hard would it be to assemble a million people to contribute a fraction of their compute time?</i> - maybe Google can put an optional thingy into Chrome so that users' computers can be part of their server farm?
I realize the author wasn't proposing that something like this could be a business, but humor me:<p>I had this idea a few years back, with a business model that paid publishers for CPU cycles gathered from a JavaScript or Flash widget. We hoped to then sell this service to data-intensive industries. We decided it wasn't feasible.<p>You need to weigh the CPU cycles gained from this scheme against the bandwidth and CPU cycles spent on the hundreds of web, queue, and data servers needed to run it. IMO it is unlikely that this model will pay off once you consider things like network latency, and trade-offs like job size (larger jobs are better for efficiency) vs. job completion probability (smaller jobs are more likely to finish).<p>Even if the potential for viability were there, it isn't clear that there is a market for something like this. Large-scale computing challenges obviously exist, and a lot of people are making money with solutions like cloud computing, but these problems typically involve proprietary data sets and proprietary or industry-standard software (good ol' apps like MySQL). Chopping up your sensitive data and sending it en masse to the public to be processed in JavaScript instead of C++ doesn't exactly fit client needs.
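To put rough numbers on the job size vs. completion probability trade-off, here is a toy model in Python. It is only a sketch under assumptions that are mine, not from the comment above: visitor dwell time is exponentially distributed with mean `mean_dwell_s`, each job pays a fixed `overhead_s` for network round trips, and a job that is cut short returns nothing. All function and parameter names are hypothetical.

```python
import math


def completion_probability(job_s, overhead_s, mean_dwell_s):
    """Chance the visitor stays long enough to finish the job.

    Assumes (hypothetically) an exponential dwell-time distribution.
    """
    return math.exp(-(job_s + overhead_s) / mean_dwell_s)


def useful_seconds_per_dispatch(job_s, overhead_s, mean_dwell_s):
    """Expected compute seconds actually returned per job handed out."""
    return job_s * completion_probability(job_s, overhead_s, mean_dwell_s)


def best_job_size(candidates, overhead_s, mean_dwell_s):
    """Pick the candidate job length with the highest expected return."""
    return max(
        candidates,
        key=lambda j: useful_seconds_per_dispatch(j, overhead_s, mean_dwell_s),
    )


if __name__ == "__main__":
    # Illustrative numbers only: 2 s of overhead, 30 s average visit.
    for size in (1, 5, 10, 30, 60, 120, 300):
        print(size, round(useful_seconds_per_dispatch(size, 2.0, 30.0), 2))
```

Under this particular model the expected return peaks when the job length is about equal to the mean dwell time, and falls off on both sides: tiny jobs are dominated by overhead on the server side, and big jobs rarely finish. A real deployment would need measured dwell times rather than an assumed distribution.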
Have fun making sure clients don't send you invalid data. You'll have to have some sort of voting system where several clients compute the same piece, and make sure they all match up. Even then, you can't be 100% sure of the results.
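A voting scheme like the one described could look roughly like this. This is a minimal sketch, not how any particular project does it: `accept_result`, the `quorum` of 3, and the strict-majority rule are all assumptions chosen for illustration.

```python
from collections import Counter


def accept_result(submissions, quorum=3):
    """Return the agreed result for one work unit, or None if undecided.

    submissions: hashable results reported by different clients for the
    same piece of work. The quorum size and the strict-majority rule are
    illustrative assumptions.
    """
    if len(submissions) < quorum:
        return None  # not enough independent answers yet
    value, votes = Counter(submissions).most_common(1)[0]
    if votes * 2 > len(submissions):
        return value  # a strict majority agrees
    return None  # no majority: hand the piece out again


if __name__ == "__main__":
    print(accept_result([42, 42, 7]))
    print(accept_result([1, 2, 3]))
```

Note that this only raises the cost of cheating. If a majority of the clients assigned to a piece are wrong in the same way (a shared bug, or colluding clients), the wrong answer is still accepted, which is why you can't be 100% sure. It also multiplies the compute you burn by at least the quorum size.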
How is this related to map-reduce besides the method names? It has a single point of failure (the server), the nodes have no logic to split the job further, and on top of that it runs on a painfully slow JavaScript engine...
Sorry, but this is basically grid computing with a slightly different client. As pointed out many times before, most interesting problems right now are IO bound. It turns out that data locality is the most important thing in processing extremely large datasets. That is the key insight in the map-reduce paper and the linchpin to the success or failure of all the distributed map-reduce frameworks that have sprung from it.<p>Most startups and small-scale companies that would see the value in leveraging a system like this simply don't have the right processing profile to make something like this worth their while. I'm sure if you graphed CPU time per byte of data you'd find a sweet spot where a service like this would speed up jobs rather than slowing them down.<p>As it happens, most companies that have a high CPU-time-per-byte ratio are either financial firms or pharma. Most of them not only have their own infrastructure, but would rather close up shop than see their proprietary code out in the wild for competitors to analyze.<p>And there are already plenty of clients out there for running Fourier transforms on possible SETI signals.
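The CPU-time-per-byte sweet spot can be estimated with a back-of-the-envelope model. This is a sketch under assumptions of my own: one machine does the work locally, or the data is pushed out through a single uplink of `bytes_per_s` to `workers` browsers that are each `slowdown` times slower than the local machine. Result upload, latency, and redundant computation are ignored, and all names are hypothetical.

```python
def local_time_s(n_bytes, cpu_s_per_byte):
    """Time to process everything on one local machine."""
    return n_bytes * cpu_s_per_byte


def farmed_time_s(n_bytes, cpu_s_per_byte, workers, bytes_per_s, slowdown=1.0):
    """Time to ship the data out and process it on remote workers.

    Assumes the uplink is the only transfer bottleneck and the work
    splits perfectly across workers (both are simplifications).
    """
    transfer = n_bytes / bytes_per_s
    compute = n_bytes * cpu_s_per_byte * slowdown / workers
    return transfer + compute


def break_even_cpu_s_per_byte(workers, bytes_per_s, slowdown=1.0):
    """CPU seconds per byte above which farming out beats running locally.

    Returns None when the workers are collectively no faster than the
    local machine, so farming out never wins.
    """
    gain = 1.0 - slowdown / workers
    if gain <= 0:
        return None
    return 1.0 / (bytes_per_s * gain)


if __name__ == "__main__":
    # Illustrative numbers only: 1000 browsers, 1 MB/s uplink, 10x slower each.
    print(break_even_cpu_s_per_byte(workers=1000, bytes_per_s=1e6, slowdown=10.0))
```

The break-even point comes from setting the two times equal: farming out wins only when the CPU cost per byte exceeds `1 / (bytes_per_s * (1 - slowdown / workers))`. With many workers that is roughly the time it takes to ship one byte, which is the data-locality point in other words: unless each byte needs much more compute than it costs to move, moving it is a loss.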