Algorithm Development is Broken

80 points by platypii, over 11 years ago

16 comments

elliptic, over 11 years ago

The "{area/field/business} is broken" template is broken (well, I don't know if it ever worked). Algorithm development is not broken - it's specialized work and it's largely done by people with specialized backgrounds. Maybe that's not ideal, and maybe you can improve it, but saying "it's broken" is lazy and false.

gfodor, over 11 years ago

What's the monetization strategy here? It would be a major step backwards for the industry if it turns out the plan here is to try to make money off of what is currently a public resource. I'd hate to see researchers having to decide if they want to publish their results or just make some cash by creating a closed source API with your service. You guys should make it clear if source code will always be available or if this is basically going to be a land-rush for people to provide an API for all the standard CS algorithms with you monetizing access to them.

frik, over 11 years ago

I first thought this was a wiki for algorithms with sample code in different languages. Well, this site is apparently a marketplace for algorithms.

If anyone is more interested in open wikis, I found two sites using Google:

* RosettaCode.org (code in many languages): http://rosettacode.org/wiki/Sorting_algorithms/Merge_sort
* Algorithmist.com: http://www.algorithmist.com/index.php/Main_Page

chubot, over 11 years ago

Hm, sounds interesting but vague. I don't really understand how it will work.

How is data for the algorithms provided? A lot of times it is big. And messy, and proprietary. There seems to be an implicit assumption that you can just plug different algorithms into different data sets. But I can't think of a project where that has been the case.

I also doubt that "algorithms" are the bottleneck in a lot of projects. I'm not an expert, but I have some personal experience to back up what people say about "data trumping algorithms" (e.g. Peter Norvig and others have written about this).

I would like to hear about some more concrete examples / success stories. "Algorithms" is just too vague. I think if this becomes successful, it will be by first narrowing it to a particular domain, and then generalizing it again.

dalke, over 11 years ago

I can't figure out how such a system is supposed to work. Anyone have any ideas?

For example, a couple of years ago I worked on an algorithm to find the maximum common subgraph of a set of 2 or more molecular graphs. More specifically, I wanted the largest subgraph in M of N graphs (M<=N), I wanted to define the atom match criteria, and I wanted to require that rings not be broken in the subgraph. (Chemists love rings.)

I did it the old-fashioned way. I read papers, I investigated similar systems, I tried out various implementation details, and I did a lot of testing.

How would I find such an algorithm using this system?

There are only a few people who develop this sort of algorithm. Why might I expect that this system is a better resource than traditional means?

3pt14159, over 11 years ago

Funny he mentions LDA. A company I founded and sold (Algo Anywhere, which started off as a Generalized Algorithms as a Service company) was a recommendation engine as a service business built on top of LDA. The papers out there are dense, but LDA is actually pretty easy to get information on. Check out Gensim in pythonland.

The thing with algorithms is that you really have to think. 200 lines of code might take two months to really grok. Especially because the people in the field make certain assumptions when they start, and sometimes even just one of those assumptions takes two weeks to understand and research.

You can't just jump into PhD level research and expect to understand it right away.

I don't know if Algorithmia will help solve this problem, but I wish them all the best of luck. Getting actual code next to research is super important and useful.

w_t_payne, over 11 years ago

Yes, algorithm development is broken, but not quite in the way that the OP suggests.

People tend to focus on initial algorithm development, because that is the academically prestigious, intellectually stimulating bit, but the job is really about how to turn algorithms into cash; a much broader problem than the narrow slice that people typically obsess over.

A huge (and frequently overlooked) part of that job is the communication and coordination role between business development and algorithm development. The volume of communication and level of detail required cannot be overstated.

Another huge part of the job is actually turning a piece of research into a functioning product. Whilst a large part of the OP's proposition is intended to address this problem (kudos), I think that his solution falls short in a big way: it omits the largest part of the solution, which is where the business learns about the algorithm and how the behaviour and performance characteristics of the algorithm interact with the business' problem domain. I.e. how does the business build sufficient expertise and knowledge of their product to be able to effectively sell it? All of these are human problems, mainly oriented around communications and learning.

Having said all that, I would like to encourage the OP in his efforts. I think that it is something that is worth doing, and I really hope he is able to build a business around this idea. I think that technology can help support all of these activities, and this is actually something that I have wanted to do myself for a very long time, so kudos to the OP for actually taking the chance, going out and doing it!

eliteraspberrie, over 11 years ago

This was (partly) solved in mathematical software with TOMS and Netlib:

http://toms.acm.org/

http://www.netlib.org/

However, if you want to do something non-trivial and you want it done right, hire a computer scientist or mathematician. No amount of crowd will help if no one in the crowd has a clue what they're doing.

morganherlocker, over 11 years ago

1) Will there be limits on the size of data sets / what size data are you optimizing for? Some algos focus on hundreds of records, some on billions, and the ideal systems for each are quite different (small data works great by transferring data back and forth over HTTP; data with hundreds of thousands of records and up... not so much).

2) Same question as above, but for processing times. Some algos are aimed at operating on the fly and might take ~1 second or less. Others (many that I deal with quite often) might run for days, or even weeks. What sort of processes are you optimizing for, and are long-running procs on the radar?

It is an interesting idea, but there is a high bar when competing against my language's package manager, and the 10s of thousands of "algorithms" already out there. Best of luck!

DennisP, over 11 years ago

The API is interesting but what I really want is to understand the algorithms. Clean, clear reference code in multiple languages with good explanations would be hugely helpful. Is that part of the plan?

j2kun, over 11 years ago

Reminds me of the xkcd:

1. There are 14 libraries for technical algorithms.
2. We need a meta-library so that we don't need to worry about having so many libraries!
3. There are 15 libraries for technical algorithms.

joveian, over 11 years ago

How is the DeepMind purchase in any way a "record sum"? The linked article does not seem to make this claim. I didn't even get to your main point and already I don't trust you.

X4, over 11 years ago

Contradiction at its finest: http://i.imgur.com/BwE2jmj.png Good idea, but remove the god damn login wall.

jroesch, over 11 years ago

I think it is a little disingenuous to say that algorithms get buried in academic literature and are impossible to find. By the nature of research, most of these things are made public when published; often there is no implementation, or just a research-quality one (which I can guarantee you is almost never useful for the "real world").

For example, one day I was interested in implementing HyperLogLog (a set cardinality estimator that is useful in data analysis). In about 10 minutes I had all the relevant papers on hand, and after skimming them I had a pretty good sense of how to implement it.

Similarly, if I want to know how to implement a program dependency graph for doing program analysis, I can go read a few pages of a paper and get a good description of the algorithm I would need to construct such a thing. I can believe the argument that some of these things are poorly indexed, but even a bare minimum of Google searching usually turns up useful algorithms. I would argue that many of these research algorithms involve a bunch of different design decisions that are best explored in the academic literature around them, and an implementation and a few notes is not sufficient exploration.

For example, I was recently implementing Paxos, and there were tons of little details to be extracted from the papers around it that had a big impact on the actual implementation we ended up with. The 'Paxos Made Live' paper from Google had many details that were only relevant/true because of engineering decisions made by the team. If you were presented with an implementation derived solely from that paper, there are multiple incorrect assumptions you could derive.

An instance of this is made apparent in Paxos Made Live. Google essentially fixes their proposer because they have used Google-specific details about the number of participants and their availability. The result is that they direct all traffic to a single node, and don't spend much time talking about leader/proposer selection (which could be useful for your needs).

I also don't buy that an important part is getting the algorithms running as a service. Most so-called "algorithms" are nothing more than a subroutine that is needed as a piece of a greater whole. I would venture that most useful "algorithms" are container data structures and the algorithms that operate over them. It seems that these are probably most useful to have as a library. Many libraries have already taken this approach: LLVM (algorithms for code generation, albeit not always everything you want), OpenCV for computer vision routines, BLAS for linear algebra, NLopt for non-linear optimization, and I'm sure one could come up with many more examples of democratized algorithms.
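
As a rough illustration of the HyperLogLog example mentioned in the comment above, here is a minimal Python sketch of the basic estimator. It is not jroesch's implementation: the class name `HyperLogLog`, the precision parameter `p`, and the use of SHA-1 as the hash are all assumptions made for this example, and the refinements from later papers (sparse registers, improved bias correction) are left out.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative distinct-count sketch using 2**p small registers."""

    def __init__(self, p=12):
        self.p = p                      # number of hash bits used as register index
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant from the original paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Hypothetical hashing choice: first 64 bits of SHA-1 of the item's string form.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")
        idx = x >> (64 - self.p)                    # top p bits select a register
        rest = x & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        # Raw estimate: normalized harmonic mean of 2**register over all registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate
```

With `p=12` the sketch keeps 4096 registers, and the standard error quoted in the literature is about 1.04 / sqrt(4096), or roughly 1.6%. Adding the same item twice never changes a register, which is why duplicates do not inflate the estimate.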

gone35, over 11 years ago

Two unsolicited suggestions, if I may:

1. Make open source releasing of algorithms compulsory, not optional.

Think of it from your prospective clients' perspective: the upside of digging out an algorithm from academic journals and implementing it yourself is that you get to fully understand the source code (since you end up writing it yourself); you can attest to its correctness; and you get to stand on the shoulders of giants by tweaking and extending it later, if you wish.

Admittedly this might not be seen as that much of a benefit for business users, but for many actual users of advanced algorithms in the scientific computing community, having to use proprietary algorithms with restrictions on their use has been seen as a significant step backwards, as manifest in the "Numerical Recipes" controversy [1,2,3]. Even if the benefits are more a matter of principle than mere practicality, the palpable distaste for proprietary algorithms in the scientific computing community is something you should at least keep in mind, lest you risk alienating a core user base for your product.

2. Formally verify the correctness of every algorithm submitted.

This is as crucial as large-scale deployment for many scientific computing users, and it is one of the banes of (and reasons for) implementing the algorithm yourself. Here your product could really offer a compelling proposition to these users.

This would also be beneficial for ensuring the reliability of your API, even if you formally waive liability to the algorithm developers (as you surely do). Else you might find yourself on the wrong side of securities regulators for a multi-million dollar trading glitch caused by one of your algorithms [4], or something crazy like that.

[1] In fact, the Wikipedia article for Numerical Recipes claims that one of the motivations for the development of the GNU Scientific Library was precisely to come up with a free alternative to them! See: http://en.wikipedia.org/wiki/Numerical_Recipes

[2] http://aufbix.org/~bolek/download/nr.pdf

[3] http://www.astro.umd.edu/~bjw/software/boycottnr.html

[4] http://www.bloomberg.com/news/2013-10-16/knight-capital-agrees-to-pay-12-million-fine-for-2012-errors.html

mmenafra, over 11 years ago

nice article! looks like a very promising platform. cheers
