TechEcho

4 comments

PaulHoule, about 3 years ago

That's a log-log chart, so deviations from the line are larger than they appear.

Eyeballing it, popular "functional" languages like Haskell, Scala and Clojure fall low relative to the line and compared to Java, Python, etc. It looks like around a factor of 2 or 3.

I'd be inclined to chalk it up to systematic errors: for instance, how people count "bugs", whether people in various communities decide it is worth fixing bugs (some don't!), etc.
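To make the log-log point concrete, here is a minimal sketch, assuming the chart uses base-10 axes (the function names are made up for illustration). It converts between a vertical gap on the chart and the multiplicative factor that gap represents.

```python
import math


def factor_to_log_gap(factor: float) -> float:
    """Vertical distance, in decades, that a multiplicative factor occupies on a log10 axis."""
    return math.log10(factor)


def log_gap_to_factor(gap_in_decades: float) -> float:
    """Multiplicative factor represented by a vertical gap on a log10 axis."""
    return 10 ** gap_in_decades


# A factor of 2 or 3 takes up less than half a decade on the axis,
# which is why it is easy to underestimate by eye.
for factor in (2, 3, 10):
    print(f"factor {factor}: {factor_to_log_gap(factor):.2f} decades")
```

Under that assumption, a point sitting a third to a half of a decade below the line is already off by the factor of 2 or 3 mentioned above.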


mc4ndr3, about 3 years ago

False.

* Manual memory management errors constitute 40% of all security bugs (C/C++).
* Null pointer errors constitute a significant cause of bugs (C/C++/Java/etc., anything but Haskell).
* Incredibly verbose languages require more lines, and line count has been shown to correlate with bug count (C/C++).
* Strong, statically typed languages with arity validation reject entire categories of bugs, which languages like Perl, JavaScript, and Ruby are famous for mucking up. Twenty thousand unit tests can save you from having to write one type signature. You can have variable name typo errors in Python, but not in Java, for example.
* The vast majority of shell programs neglect to even enable the available `set` robustness checks, and therefore contain control flow and input validation problems.
* Early versions of many languages lack an unindexed `foreach` iterator, which leads to many unnecessary off-by-one errors. C++17 and C++20 are distinct languages with distinct bugs. Same for ECMAScript 3, ECMAScript 5 and ECMAScript 6.
* New programmers in most languages will mistake zero-based indices for one-based indices. Old programmers will mistake one-based indices for zero-based indices (Lua, LISP).
* Languages with immutable-by-default data structures (Clojure) rule out entire categories of bugs that other languages encourage.
* Shell and GUI languages are rarely unit or integration tested. Any script written in sh or Cucumber etc. is very likely to contain bugs compared to more library-oriented languages. The same applies to other commonly untested code (in the fuzzer-complete sense): client-side JavaScript snippets, Emacs and Vim configuration, SQL queries, text editor plugins, and other spare tidbits of code.
* Programming languages with very small userbases are unlikely to mature enough to manage bugs, such as the custom language used for Fallout 3 crafting scripts.
* Fewer off-by-one errors in JMP instructions when the language emphasizes high-level control flow structures such as `if`. Shrug.
* Fewer collection implementation bugs in languages that provide them in the standard library. What's that, POSIX sh, you barely have arrays (`set`) and hashmaps (`env`)? Or when a language has an array type but no mathematical set type other than a hashmap. Yeah, you're gonna have a bad time.
* Lambdas. In JavaScript and Go, you will accidentally enclose parameters by value rather than by reference. In C++, you can accidentally specify by value or by reference. In FreePascal, you don't even have lambdas.
* Show me a segfault in pure, non-FFI, non-unsafe Rust.

A low-level, verbose language like C is going to contain many more bugs than a higher-level language like C++, which automates away and guards against more kinds of bugs, for the same given level of application complexity.

One could try to twist the natural correlation between programming language and bugs into an argument that says, "A programming language has an associated maximum line count above which at least one bug is 50% or more likely to be present." But in practice, programs never get shorter; they only gain lines and, therefore, bugs. And so, starting with an appropriately expressive language for a given project will lead to more capabilities with fewer bugs. There's a reason we don't write everything in assembler.

One could make an argument that each language encourages different kinds of bugs, and that it may be difficult to estimate the relative frequency of, say, mutation bugs in Go versus memory leaks in Rust. Fortunately, we have security incidents as a historical body of evidence. And the evidence points to manual memory management errors as a top problem in security bugs. Less critical bugs get less attention, though any program not written in a proof assistant language (Coq et al.) will have bugs.
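As a rough illustration of the variable name typo point in the list above, here is a small Python sketch (the function and the discount rule are invented for the example). The misspelled name sits on a branch that only runs for large totals, so nothing complains until that branch is actually reached. A compiler for a statically typed language such as Java would reject the equivalent code before it ran.

```python
def total_price(prices):
    """Sum the prices, applying a 10% discount above 100 (hypothetical rule)."""
    total = 0
    for price in prices:
        total += price
    if total > 100:
        # Typo: "totl" instead of "total". Python only notices when this line runs.
        return totl * 0.9
    return total


# The small order works, so a quick manual check would not reveal the problem.
print(total_price([10, 20]))

# The large order reaches the typo and fails at runtime.
try:
    total_price([60, 70])
except NameError as exc:
    print(f"runtime failure: {exc}")
```

Linters that check for undefined names can catch this class of mistake in Python ahead of time, but they are an optional extra step, not part of the language.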


username_my1, about 3 years ago

Purely inflammatory comment: maybe typeless heathens (JavaScript people) don't care enough anyway to push bug fixes.

Joking aside, this is really, really interesting. I would never have guessed.

BeefWellington, about 3 years ago

I think all this chart shows is that Rust is the clear winner; it didn't even make the chart! (sarcasm)

The huge grain of salt to take with this reanalysis is that it covered 423 projects in total [1]. On top of that, the reanalysis tells you nothing about more modern languages that promise "fewer bugs", because the original study data ends in 2014. The original paper featured 850 projects [2].

The original paper indicates they don't see any real potential threats to the accuracy of their information, but I think it does include a heavy bias towards newer languages (which may be dragging them up) for a few reasons:

1. It completely ignores improvements in code editors, code completion, linters, and other pre-commit auditing tools. What in 1996 might have been "fixing RuntimeException because I called the wrong function name" is not going to be as common in modern times. This should tend to bias the results upwards as you go back in time. A counter-argument here is that more people are writing software than ever before, which dilutes the understanding required to write code properly and could be offsetting these improvements.

2. It includes the languages' times from before their popularity, e.g. JavaScript "starts" in 1999. While this is no doubt true, it could hardly be seen as a true "programming" language comparable to the others when it lived only within the browser sandbox.

3. Tying in with the above, they deal with multiple languages in use in a single repository by examining each repository's commit history for a minimum cutoff of commits in that language. It's not clear to me whether that connects to #2 in terms of how they process the "start date" of the languages.

4. They did not rely solely upon GitHub-reported bugs. This could be good because it will catch things that are not logged as official bugs. However, the caveat is that ongoing dev work can get counted as a bug. Consider a situation where a developer stubs out several functions and has them return null to intentionally cause a crash/exception/etc. when called. In this analysis, that will get counted as a bug and inflate the numbers if I then write a commit adding the missing function and include the language "Writing doAction() so we no longer crash at that step". I don't know how often this happens, but if you aren't squashing your commits (which many teams weren't in 2014) it's highly possible.

The general point of the statement made, that languages aren't responsible for bugs, is one I happen to believe is generally true, but I don't think these kinds of studies should be treated as gospel.

[1]: (PDF) http://janvitek.org/pubs/toplas19.pdf
[2]: (PDF) https://web.cs.ucdavis.edu/~filkov/papers/lang_github.pdf
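To show how the stub scenario in point 4 could slip through, here is a minimal sketch of a keyword-based commit classifier. The keyword list and the function name `looks_like_bug_fix` are assumptions made up for this illustration; this is not the classifier or the keyword list used in either paper.

```python
import re

# Hypothetical keyword list for illustration only; not taken from either paper.
BUG_KEYWORDS = ("bug", "fix", "error", "crash", "fault", "defect")

# Match any keyword at the start of a word, case-insensitively.
_PATTERN = re.compile(r"\b(" + "|".join(BUG_KEYWORDS) + r")", re.IGNORECASE)


def looks_like_bug_fix(message: str) -> bool:
    """Return True if the commit message contains any bug-related keyword."""
    return _PATTERN.search(message) is not None


messages = [
    "Fix null dereference in parser",                         # a real bug fix
    "Add CSV export",                                         # plain feature work
    "Writing doAction() so we no longer crash at that step",  # planned work, not a fix
]
for message in messages:
    print(looks_like_bug_fix(message), "-", message)
```

With this naive matcher, the third message is labelled a bug fix even though it only fills in a deliberate stub. Because the pattern matches prefixes, a word like "fixture" would be flagged as well, which is the same kind of false positive.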

There is no correlation between programming language and defect injection rate
