The most copied StackOverflow snippet of all time is flawed (2019)

368 points by Decabytes, over 1 year ago

32 comments

I find it interesting that the answers using hardcoded values and if statements (or while loops) all do up to five comparisons.

It goes B, KiB, MiB, GiB, TiB, EiB and no more than that (in all the answers), so that can be solved with three if statements at most, not five.

I mean: if it's greater than or equal to GiB, you know it won't be B, KiB or MiB. Dichotomy search for the win!

Not a single one of the hardcoded solutions does it that way.

Now let's go up to ZiB and YiB: still only three if statements at most, vs. up to seven for the hardcoded solutions.

I mention it because I'd personally definitely not go for the whole log/pow/floating-point approach if I had to write a solution myself (precisely because I know all too well the SNAFU potential).

I'd hardcode if statements... but while doing a dichotomy search. I must be an oddball.

P.S.: no horse in this race, no hill to die on, and all the usual disclaimers.

roryokane, over 1 year ago

(2019)

Past discussions:
https://news.ycombinator.com/item?id=21693431
https://news.ycombinator.com/item?id=21698619
https://news.ycombinator.com/item?id=27533684

throwaway9870, over 1 year ago

I don't understand. There are 7 suffixes, can't you pick the right one with binary search? That would be 3 comparisons. Or just do it the dumb way and have 6 comparisons. How are two log() calls, one pow() call and ceil() better than just doing it the dumb way? The bug being described is a perfect example of trying to be too clever.
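A minimal sketch of the three-comparison idea, assuming the seven binary suffixes B through EiB and a non-negative `long` byte count. The class and method names (`UnitPicker`, `unitFor`) are made up for illustration and do not come from the article or any answer:

```java
// Hypothetical helper: picks a binary unit with nested comparisons
// arranged as a binary search, so no path needs more than three.
class UnitPicker {
    private static final long KIB = 1L << 10;
    private static final long MIB = 1L << 20;
    private static final long GIB = 1L << 30;
    private static final long TIB = 1L << 40;
    private static final long PIB = 1L << 50;
    private static final long EIB = 1L << 60;

    static String unitFor(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("negative size: " + bytes);
        }
        if (bytes >= GIB) {                        // comparison 1
            if (bytes >= PIB) {                    // comparison 2
                return bytes >= EIB ? "EiB" : "PiB"; // comparison 3
            }
            return bytes >= TIB ? "TiB" : "GiB";   // comparison 3
        }
        if (bytes >= MIB) {                        // comparison 2
            return "MiB";
        }
        return bytes >= KIB ? "KiB" : "B";         // comparison 3
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1023L, 1024L, 5L << 30, Long.MAX_VALUE};
        for (long s : samples) {
            System.out.println(s + " -> " + unitFor(s));
        }
    }
}
```

This only selects the suffix; formatting the scaled value (and its rounding) is a separate step.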

ComputerGuru, over 1 year ago

Shameless plug: as another option for formatting sizes in a human-readable way quickly and correctly (other than copying from S/O), you can use one of our open source PrettySize libraries, available for Rust [0] and .NET [1]. They also make performing type-safe logical operations on file sizes safe and easy!

The snippet from S/O may be four lines, but these are much more extensive and come with tests, output formatting options, conversion between sizes, and more.

[0]: https://github.com/neosmart/prettysize-rs
[1]: https://github.com/neosmart/PrettySize.net

oooyay, over 1 year ago

Out of curiosity, is there a sizable number of developers that just copy and paste untrusted code from StackOverflow into their applications?

The conjecture that people just copy from StackOverflow is obviously popular, but I always thought this was just conjecture and humor until I saw someone do it. Don't get me wrong, I use StackOverflow to give me a head start on solving a problem in an area I'm not as familiar with yet, but I've never just straight copied code from there. I don't do that because the snippet rarely does exactly and only what I need. It requires me to look at the APIs and form my own solution from the explained approach. StackOverflow has pointed me in the direction of some niche APIs that are useful to me, especially in Python.

marginalia_nu, over 1 year ago

I don't understand why you'd use floating point logarithms if you want log 2?

Unless I'm missing something, this gives you an accurate value of floor(log2(value)) for anything positive less than 2^63 bytes, and it's much faster too:

    Long.bitCount((Long.highestOneBit(value) << 1) - 1) - 1
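As a rough sketch of how that integer log2 could drive unit selection: dividing floor(log2(value)) by 10 gives an index into the binary units, since each unit is 2^10 times the previous one. The class `BinaryUnits` and its method names are hypothetical, and `63 - Long.numberOfLeadingZeros(value)` is shown only as an equivalent standard-library formulation, not as something the comment proposed:

```java
// Hypothetical helper built around the bit trick from the comment above.
class BinaryUnits {
    private static final String[] UNITS =
            {"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};

    // floor(log2(value)) for value > 0, exactly as written in the comment.
    static int floorLog2BitCount(long value) {
        return Long.bitCount((Long.highestOneBit(value) << 1) - 1) - 1;
    }

    // Same result for value > 0, using the leading-zero count instead.
    static int floorLog2(long value) {
        return 63 - Long.numberOfLeadingZeros(value);
    }

    // Each binary unit spans 10 bits, so the unit index is floorLog2 / 10.
    static String unitFor(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("negative size: " + bytes);
        }
        if (bytes == 0) {
            return UNITS[0]; // log2 is undefined for zero
        }
        return UNITS[floorLog2(bytes) / 10];
    }

    public static void main(String[] args) {
        long[] samples = {1L, 1023L, 1024L, 1L << 40, Long.MAX_VALUE};
        for (long s : samples) {
            System.out.println(s + " -> log2 " + floorLog2BitCount(s)
                    + ", unit " + unitFor(s));
        }
    }
}
```

Both variants stay in integer arithmetic, so there is no floating-point rounding when choosing the unit; rounding of the displayed value still has to be handled separately.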

jprete, over 1 year ago

I took one look at the snippet, saw a floating-point log operation and divisions applied to integers, and mentally discarded the entire snippet as too clever by half and inherently bug-prone.

dleeftink, over 1 year ago

Knowledge cascades all the way down; it goes to show how difficult it is to 'holster' even the smallest piece of knowledge once it's drawn.

I wonder, with the rate at which Stack Exchange is losing active contributors, what it would take for 'fastest gun' answers that are later found to be off the mark to be corrected, and what it would mean for our collective knowledge once these 'slightly off' answers are further cemented in our annals of search and, increasingly, LLM history.

dirtyv, over 1 year ago

This reminds me of when I was in basic training. The drill sgts would give us new recruits a task that none of us knew how to do, purposefully without guidance, and then leave. One guy would try and start doing it, always the incorrect way, and everyone else would just copy that person.

koromak, over 1 year ago

In a way, I don't even consider floating point errors to be "flaws" with an algorithm like this. If the code defines a logical, mathematically correct solution, then it's "right". Solving floating point errors is a step above this, and only done in certain circumstances where it actually matters.

You can imagine some perfect future programming language where floating point errors don't exist and don't have to be accounted for. That's the language I'm targeting with 99% of my algorithms.

bloak, over 1 year ago

This reminds me of a weirdness with some sat navs: the distance to your exit/destination is displayed as: 12 ... 11 ... 10 ... 10.0 ... 9.9 ... 9.8 ... with the value 10.0 shown only while the distance is between 9.95 and 10. It's not really a bug but it's strange seeing the display update from 10 to 10.0 as you pass the imaginary ten-mile milestone so perhaps it's a distraction worth avoiding.

envsubst, over 1 year ago

Almost every top Stack Overflow answer is wrong. The correct one is usually at rank 3. The system promotes answers which the public believes to be correct (easy to read, resembles material they are familiar with, follows fads, etc.).

Pay attention to comments and compare a few answers.

crabbone, over 1 year ago

A long time ago, when ActionScript was a thing, there was this one snippet in the ActionScript documentation that illustrated how to deal with event dispatching, handling, etc. To illustrate the concept, the official documentation provided a code snippet that created a dummy object, attached handlers to it, and in those handlers defined some way of processing... I think it was XML loading and parsing, well, something very common.

The example implied that this object would be an instance of a class interested in handling events, but the authors didn't want to blow up the size of the example with not-so-relevant bits of code.

There was a time when I very actively participated in various forums related to ActionScript. And, as you can imagine, loading of XML was paramount to success in that field. Invariably, I'd encounter code that copied the documentation example and had this useless dummy object with handlers defined (and subsequently struggled to extract the information thus loaded).

It was simply amazing how, regardless of the overall skill of the programmer or the purpose of the applet, the same exact useless object would appear in the same situation -- be it an XML socket or XML loaded via HTTP, submitted and parsed by the user... it was always there.

----

Today, I often encounter code like this in unit tests in various languages. Often programmers will copy some boilerplate code from an example in the manual and will create hundreds or even thousands of unit tests, all with some unnecessary code duplication or unnecessary objects. Not sure why in this specific area, but it looks like programmers treat these kinds of tests both as some sort of magic and as unimportant, worthless code that doesn't need attention.

----

Finally, specifically on the subject of human-readable encoding of byte sizes. Do you guys like parted? Because it's so fun to work with because of this very issue! You should try it, if you have some spare time and don't feel misanthropic enough for today.

derstander, over 1 year ago

I feel like there ought to be a software analogue to that aphorism about models (if it doesn't exist already), maybe something like:

All code is wrong, but some is useful.

corbezzoli, over 1 year ago

Why do you need a 4-line dependency?

This is the reason.

Rapzid, over 1 year ago

The most impressive suggestion Copilot has given me was a solution to this that used a loop to divide and index further into an array of units.

It never dawned on me to approach it that way, and I had never seen that solution (not that I ever looked). Not sure where it got that from, but it was pretty cool and... yeah, it gets simple stuff wrong all the time haha.

seeknotfind, over 1 year ago

I was surprised to find log implementations are loopless. Cool.

https://github.com/lattera/glibc/blob/master/sysdeps/ieee754/dbl-64/e_log.c

bradley13, over 1 year ago

When StackOverflow was new, it was an incredible resource. Unfortunately, so much cruft has accumulated that it is now nearly useless. Even if an answer was once correct (and many are not), it is likely years out of date and no longer applicable.

meling, over 1 year ago

While reading, I was wondering why Stack Overflow isn't "mandating" that solutions have tests, so that this problem isn't left to everyone else. See the comment at the end of the article:

Test all edge cases, especially for code copied from Stack Overflow.

nelsonic, over 1 year ago

How does the author determine this is the "most copied snippet" on SO? The question/answer has only been viewed 351k times. There are posts with many millions of views, e.g. https://stackoverflow.com/questions/927358/how-do-i-undo-the-most-recent-local-commits-in-git, which have definitely been copy-pasted more times. Yes, there may be many instances of this Java function on GitHub, but only because the people doing the copying are too lazy to think about how it works, never mind alter the function name. If there's a bug, just update the SO answer and fix the problem. No need to write a lengthy self-promoting post about it.

stmblast, over 1 year ago

Well - I suppose it makes sense. SO isn't built for correctness, it's built for upvotes that just depend on whether the people upvoting like the answer or not (regardless of correctness).

totallywrong, over 1 year ago

Read: The most common answer to that question from LLMs is flawed.

speak_plainly, over 1 year ago

Sounds like someone bumped into Zeno's paradox...

https://www.youtube.com/watch?v=VI6UdOUg0kg

loeg, over 1 year ago

Should have just stuck with the loop. You could change the thresholds to 95% of 10^whatever to accommodate the desired output rounding.
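One possible reading of that suggestion, sketched for SI units with one decimal place: keep the divide-by-1000 loop, but move to the next unit as soon as the value would otherwise print as "1000.0", which for one decimal means a cutoff of 999,950 rather than 1,000,000. The class name `LoopFormat`, the method name, and the exact cutoff are assumptions made for this illustration:

```java
import java.util.Locale;

// Hypothetical loop-based formatter; thresholds account for output rounding.
class LoopFormat {
    private static final String[] UNITS = {"kB", "MB", "GB", "TB", "PB", "EB"};

    static String humanReadableSI(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("negative size: " + bytes);
        }
        if (bytes < 1000) {
            return bytes + " B";
        }
        long scaled = bytes; // always holds thousandths of the current unit
        int unit = 0;
        // 999_950 thousandths would round to "1000.0", so step up a unit instead.
        while (scaled >= 999_950) {
            scaled /= 1000;
            unit++;
        }
        return String.format(Locale.ROOT, "%.1f %s", scaled / 1000.0, UNITS[unit]);
    }

    public static void main(String[] args) {
        long[] samples = {999L, 1000L, 999_949L, 999_950L, Long.MAX_VALUE};
        for (long s : samples) {
            System.out.println(s + " -> " + humanReadableSI(s));
        }
    }
}
```

The integer division truncates digits below the displayed precision, so only the final division goes through floating point.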

ludwigvan, over 1 year ago

Plot twist: they were hired by Oracle since they were the author of the most copied StackOverflow snippet (!)

strangesmells02, over 1 year ago

Just divide by 1000 until x < 1000 and return int(x) plus a lookup, in a map from the number of times divided by 1,000, to the MB, GB, ... string.

It's an O(1) operation because of the limited size allowed for numeric types.

nathan_gold, over 1 year ago

I'm curious what answer GPT will return.

golol, over 1 year ago

Classic off by 1 :)

paulddraper, over 1 year ago

tl;dr When in the 999+ petabyte range, it gives inappropriately rounded results.

And the key takeaway is "Stack Overflow snippets can be buggy, even if they have thousands of upvotes."

I don't disagree, but is this really the example to prove it...

dmccarty, over 1 year ago

Processors are inherently awesome at branching, adding, shifting, etc. And shifting to get powers of 2 (i.e., KB vs. GB) is a superpower of its own. They're a little less awesome when it comes to math.pow(), math.log(), and math.log() / math.log().

Why 300K+ people copied this in the first place shows some basic level of ignorance about what's happening under the hood.[1]

As someone who's been at this for decades now and knows my own failings better than ever, it also shows how developers can be too attracted by shiny things (ooh look, you can solve it with logs instead, how clever!) at the expense of readable, maintainable code.

[1] But hey, maybe that's why we were all on StackOverflow in the first place.

instamail, over 1 year ago

Obligatory, my favourite StackOverflow answer of all time: https://stackoverflow.com/a/1732454

greenhearth, over 1 year ago

Pretty awesome stuff. This is what Hacker News is for!
