The "C is Efficient" Language Fallacy (2006)

173 points by m_for_monkey, over 13 years ago

22 comments

nickolai, over 13 years ago

Umm, yes, C/C++ will not parallelize your code for you, whereas Language XYZ will. I'm not sure how this is an argument against C/C++ being effective. C/C++ will not parallelize your code, by design. It was never meant to. I'd never blame my stick-shift car for not changing gears by itself; that's the very reason I didn't buy an automatic in the first place!

If the performance of your matrix manipulation code becomes an issue, you, the C/C++ coder, will have to parallelize it and take responsibility for whatever assumptions have to be made.

I'm sure that the limit of what one considers acceptable code alteration by the compiler/interpreter depends on the person. Personally, I'd be very wary of a system trying to guess-parallelize my code, especially if I can do it myself whenever I see fit by adding a dozen lines of boilerplate. I'm sure most academics needing a numerical calculus system would be glad to have the system abstract away every possible optimization, so that they can stay as much as possible in the 'abstract math space'.

I am also kind of wary of the "look, I wrote the program in several languages and here is the perf comparison" approach. Our skills with each language vary. What I liked was (I can't seem to find the link) a teacher who asked his class to write a program doing some text manipulation/indexing (IIRC) in whatever language they wanted. The fastest code was in C, yet the worst C implementation was significantly slower than the average Java code. To sum all this up: the speed depends on the skill of the person in the particular language much more than on the language itself.


MichaelSalib, over 13 years ago

Putting aside his general point about C efficiency, I'm curious about his specific claim that Fortran compilers outperform C specifically because aliasing conceals optimization opportunities. John Regehr points [1] to a really interesting paper from 2004 that used a special analysis tool to mark every single pointer in the SPEC benchmark as restrict wherever it safely could. The result was a 1% performance improvement.

That suggests that, at least circa 2004, if aliasing were a serious problem for C compilers, restrict annotations were not the solution, which calls Chu-Carroll's claim into question. But there might be other explanations. Any thoughts?

[1] http://blog.regehr.org/archives/537

ETA: Just to clarify what the 2004 paper involved: the researcher took the SPEC code and ran it through a dynamic analysis tool that identified every single use of a non-restricted pointer that did not alias any other in-scope pointer at the time. He then took all of those pointers and marked them as restricted in the source text. The result was a program in which every pointer that could be marked as restricted was so marked. That's way more than a human annotator will ever do. And all that got him a 1% performance improvement.


unwind, over 13 years ago

This is old (2006), and seems to base its argument on the problems with unrestricted pointers in C, which cause aliasing.

As of C99, of course, C has the "restrict" keyword, which allows pointers to be explicitly declared not to alias, thus enabling all these optimizations in C, too.
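For readers who have not used it, here is a minimal sketch of what a `restrict`-qualified signature looks like. The function name `scale_add` and its parameters are made up for illustration and do not come from the article or the thread. The qualifier is a promise made by the programmer, not something the compiler checks: calling the function with overlapping arrays is undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: out[i] += k * in[i] for i in [0, n).
   The restrict qualifiers promise that out and in do not overlap,
   so a store to out[i] can never change a later in[j]. */
void scale_add(size_t n, double *restrict out, const double *restrict in, double k)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] += k * in[i];
    }
}
```

With that promise in place, a compiler is allowed (though not required) to vectorize the loop without emitting a runtime overlap check.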


chalst, over 13 years ago

> C and C++ suck rocks as languages for numerical computing. They are not the fastest, not by a longshot. In fact, the fundamental design of them makes it pretty much impossible to make really good, efficient code in C/C++.

I dare say this is more or less true of C, but following big improvements in the quality of C++ compilers (changes that happened well before 2006), C++ has proven itself as a language for high-performance scientific computing. Todd Veldhuizen provided a survey back in 1997 of the changing case in favour of C++ to accompany his Blitz C++ library: http://www.autistici.org/sens/inf/scientificcomputingcfortran.pdf

Fortran still has considerable advantages, but it's been a long time since Fortran programmers could regard C++ as offering unserious performance.

I would also hesitate to say that C++ is lower level than Fortran. With a suitable coding style, C++ is quite a high-level language. In fact, it is precisely the abstractions that C++ offers (templates) that allow the optimisations that have delivered these performance improvements. Was the author not aware that templates can be used in this way? The discussion of alias detection suggests so.

The high-level point is right, namely that abstractions make for safer languages and give compilers freedom to make optimisations that apparently more efficient, less safe languages cannot, and so deliver better performance. But it would have been a better article if it had not mentioned C++.

Another conclusion to draw is that benchmarks produced by people who are out to make a point are worthless.


attractivechaos, over 13 years ago

The longest common substring problem (LCS) can be solved in O(n^2) time using dynamic programming, not O(n^3) as stated by the author of that post. If the author is unable to get this basic fact right, I can hardly trust his benchmark. Also, I question the author's skill in C/C++: in my experience, C is consistently faster than Java for such tasks, and C++ is nearly as fast as C as long as we use it the right way. If we look at the Computer Language Benchmarks Game, OCaml never beats C in terms of speed. I doubt the conclusion was much different in 2006. The author should have released the source code; otherwise the benchmark tells us nothing except his lack of programming ability.

EDIT: in his comments to another commenter, the author was saying this: "[The OCaml compiler] could do some dramatic code rewriting that made it possible to merge loops, and hoist some local constants out of the restructured merged loop." A good C programmer should be able to do all of the above simply by instinct. The author was not good enough. I buy the argument that being really good at C/C++ is more difficult than being good at other languages, but this is not the same thing as arguing that C is inefficient.
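As a reference point for the complexity claim, here is a minimal sketch of the textbook dynamic program for longest common substring. It is not the article author's code, which was never published; the function name `lcs_length` is an assumption for illustration. It runs in O(n*m) time for inputs of length n and m and keeps only two rows of the table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the longest common substring of a and b.
   Returns 0 if allocation fails, which is indistinguishable from "no match". */
size_t lcs_length(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t n = strlen(a), m = strlen(b), best = 0;

    /* prev[j] = length of the longest common suffix of a[0..i-1) and b[0..j) */
    size_t *prev = calloc(m + 1, sizeof *prev);
    size_t *cur = calloc(m + 1, sizeof *cur);
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return 0;
    }

    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            /* Extend the diagonal run on a match, reset it on a mismatch. */
            cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
            if (cur[j] > best) {
                best = cur[j];
            }
        }
        /* Swap rows; index 0 stays zero in both because it is never written. */
        size_t *tmp = prev;
        prev = cur;
        cur = tmp;
    }

    free(prev);
    free(cur);
    return best;
}
```

For example, `lcs_length("abcdef", "zcdemf")` finds the shared run "cde". The same two nested loops also give O(n*m) for longest common subsequence, with a different recurrence, so the bound holds under either reading of "LCS".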


zvrba, over 13 years ago

C and C++ are efficient for general-purpose programming, if you know how to use them. C is here to stay because it is the lingua franca of the computing world: OS APIs are defined in terms of C functions, and I know of no libraries in widespread use that do not offer a C or C++ interface.

People otherwise rightfully challenge his conclusions.

There's a funny comment there about MATLAB: "MATLAB struck me as being the wrong tool for every problem."


haberman, over 13 years ago

As someone who doesn't know Fortran, how does Fortran solve the aliasing problem? Even if pointers and arrays are different, how can you ensure two arrays don't alias each other? The only way I can think of to do this is to always copy arrays when they are passed to functions, but this seems expensive. Otherwise I don't see how you can avoid this pseudocode:

<pre><code>
void f(array1, array2) {
    /* somehow guaranteed not to alias? */
}

void g() {
    array my_array[50];
    f(my_array, my_array);
}
</code></pre>
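The question is easier to see with a concrete C case. The sketch below uses a made-up function, `add_twice`, whose result depends on whether its two arguments overlap; that dependence is why a C compiler cannot simplify the body to a single `*a += 2 * *b` without more information. On the Fortran side, the usual understanding is that the language handles the `f(my_array, my_array)` case by rule rather than by copying: such a call is non-conforming if either dummy argument is modified, and compilers may assume it never happens without checking. Treat that as a summary from memory, not a quote from the standard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: adds *b to *a twice.
   No restrict here, so a and b are allowed to point at the same int. */
void add_twice(int *a, const int *b)
{
    assert(a != NULL && b != NULL);
    *a += *b;   /* if a and b alias, this store also changes *b */
    *a += *b;   /* so *b has to be reloaded before the second add */
}
```

Called with two distinct variables that both hold 1, the target ends up as 3. Called with the same variable for both arguments, it doubles twice and ends up as 4.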


jpdoctor, over 13 years ago

> Modern architectures have reached the point where people can't code effectively in assembler anymore

Someone should inform the guys over at ffmpeg that the jig is up.


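Locke1689, over 13 years ago

Arrays and pointers are most certainly not the same thing in C. Check for yourself here:

<pre><code>
char x[100];                            /* array: sizeof gives 100 */
char* y = malloc(100 * sizeof(char));   /* pointer: sizeof gives the pointer size */
/* sizeof yields a size_t, so print it with %zu rather than %ld */
printf("Array: %zu\nPointer: %zu\n", sizeof(x), sizeof(y));
</code></pre>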

emillon, over 13 years ago

> In C and C++, there's no such thing as an array

Yes there is. `int a[10]` allocates 10 consecutive `int`s on the stack, and it's the only language construct to express that (along with `alloca`, but that's a builtin function).


cks, over 13 years ago

I was actually under the impression that Fortran was used simply because the experts (in this case in fluid dynamics) were familiar with Fortran. It was the language they learned and used back at university. At least this is the impression I got from working in the field.

I never heard anyone suggest we should use Fortran for performance reasons. Instead there was an ongoing movement to evolve the code, moving it to the OpenFOAM solver, which is written in C++.


jheriko, over 13 years ago

Okay, so there are reasons why C is /difficult/ to make very efficient for numerical computations. Since C99, aliasing is not one of them for a sufficiently knowledgeable programmer, thanks to the restrict keyword.

The library is the real problem.

code.google.com/p/fridgescript - faster than C in some cases, only because it doesn't use the math library, but uses the hardware directly, without indirect calls and without caring about obscure edge cases.

stephencanon, over 13 years ago

I write high-performance numerical software for a living. There are a lot of baseless claims in this post. You can write high-performance software in C, C++, Fortran, Assembly, or a whole host of other languages. There are syntactic reasons to prefer one or another, but you should not choose among them for performance reasons.

I choose to write in C and Assembly, for example, and much of the code I write is provably as fast as possible on the targeted architecture. It is literally impossible that it would go faster if I wrote it in Fortran instead. All of these languages are just tools, and if you know your tool, you can do great things with it. The specifics of which tool you choose are often unimportant.

There are some syntactic niceties in Fortran which make it more comfortable for people who don't want to think about certain low-level details. However, you cannot write software that runs as fast as possible without considering those details, so a programmer with that goal is forced to think about them no matter what tool he or she chooses.

Fortran does have a (slightly) more relaxed numerics model than standard C, which allows a compiler to make some optimizations that a C or C++ program would need to explicitly license. However, these optimizations are disallowed in standard C and C++ because they are unsafe. The fact that Fortran enables them does not make Fortran a better language for numerical computation (from my perspective as a low-level library writer, they make it worse). Performance without correctness is absolutely meaningless.

Write software in the language that is comfortable for you. Use libraries written by experts for performance-critical operations. Use a profiler to identify operations that are hotspots in your code. Don't complain about your (or someone else's) tools.
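One commonly cited example of an optimization that changes numerical results is reassociating floating-point additions; whether that is the specific transformation the commenter has in mind is an assumption. The sketch below, with made-up function names, shows that the two groupings can give different answers in IEEE 754 double arithmetic, which is why a C compiler will not swap one for the other unless told it may (for instance with a fast-math style option).

```c
#include <assert.h>

/* Sum in source order: (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* The same sum, reassociated: a + (b + c). */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}

/* Hypothetical self-check; assumes IEEE 754 doubles and no fast-math flags. */
void check_reassociation(void)
{
    /* 1e30 and -1e30 cancel first, so the 1.0 survives. */
    assert(sum_left(1e30, -1e30, 1.0) == 1.0);
    /* 1.0 is absorbed into -1e30 first, so the sum cancels to zero. */
    assert(sum_right(1e30, -1e30, 1.0) == 0.0);
}
```

Both groupings are mathematically equal, so a language that lets the compiler choose between them trades reproducibility for scheduling freedom.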

yummyfajitas, over 13 years ago

The author's point is well taken, but his example leaves a lot to be desired:

> If you look at that loop, it can be parallelized or vectorized without any problem if and only if the array pointed to by x and the array pointed to by y are completely distinct with no overlap. But there's no way to write code in C or C++ that guarantees that.

<pre><code>
double** doMath(double** y) {
    /* x is freshly allocated here, so it cannot overlap y */
    double** x = allocateNew2DArray(20000, 20000);
    /* start at 2 and stop one short so i-2, i+1, j-2 and j+1 stay in bounds */
    for (int i = 2; i < 20000 - 1; i++) {
        for (int j = 2; j < 20000 - 1; j++) {
            x[i][j] = y[i-2][j+1] * y[i+1][j-2];
        }
    }
    return x;
}
</code></pre>

All you need to do for this example is replace the Fortran coding style doMath(double** x, double** y) with the C coding style doMath(double** y).

I don't think any C compilers actually do parallelize code like this, or at least they don't do much beyond using SIMD. But in principle they could.


antirez, over 13 years ago

C does not offer parallelization automatically, but you can pick a model, and a library, that is a good fit for turning your app into a parallel one. In this regard the following slides are interesting: http://swtch.com/~rsc/talks/threads07/

The model proposed there may not be the right one for your application, so you can design your own (multiple processes exchanging messages, for instance, or the usual threading with locks, and so forth; it's up to you).

This requires effort, but to be honest, there is currently no language that is a good fit for system programming and that is able to parallelize your code magically and automatically. Such a language would give a strong competitive advantage to programmers using it, as C did in the past over other languages, so it would become mainstream sooner or later, or its ideas would be reincarnated in some other "better C" language. If this is not happening, IMHO there is something wrong with the languages that currently are able to do more than C in this regard.

In programming, ideas tend to take years to be accepted, but there is a very clear trend over decades: something that is really better (as in code that is faster, or simpler to write (very useful abstraction XYZ), or easier to debug, or with higher-quality libraries, or even much simpler to deploy (PHP, I'm looking at you)) eventually becomes mainstream.

gsg, over 13 years ago

No mention of representation issues? Memory is a big optimisation target these days, and a weak point of languages like Java and OCaml.

dwc, over 13 years ago

The missing meta lessons: don't fall for overly simple characterizations, and don't be a language bigot.

"C is Efficient" is largely true, but it's certainly not always true, and there are problem domains where it's seldom true. If you don't know that this is the case with any language, then you're not qualified to be picking an implementation language anyway.

growingconcern, over 13 years ago

What a dolt. It's possible to tell the compiler that two pointers aren't aliases: the "restrict" keyword. Unless I'm mistaken that is really his sole argument against "pointer-based languages".

duaneb, over 13 years ago

Well argued... until I realized that his argument hinges on C/C++ compilers not being able to guarantee non-aliasing. That's why they introduced the `restrict` keyword in C99...

akg, over 13 years ago

Every language has its pitfalls. There isn't one single greatest language for everything. Within an application domain, one needs to consider the tradeoff between machine time and human development time and determine what the best tool for the job is.

A 5-minute run time in Python might be acceptable if it takes you 1 hour to write the program and you are only going to use it once, whereas you may not even know how to program in OCaml even though it offers the best run time.

jensnockert, over 13 years ago

While a lot of scientific computing is done in FORTRAN, much of the lower-level 'plumbing' is written in assembly, C or C++.

C isn't an efficient language; C is just a thin layer on top of assembly. You can write shit code in C, I know that from experience, but you can also write really fast code in C.


gizzlon, over 13 years ago

For me, the blinking and moving ads completely invalidate this site.
