How much information is too much information? (2022)

89 points by gus_leonel almost 2 years ago

20 comments

klauserc almost 2 years ago

Interestingly, I had the exact opposite reaction. I found the second function harder to reason about because it heavily relies on mutation and the fact that individual loop iterations commute with respect to one another. (To understand what the final value of `min_value` is, you first have to understand that `<` being transitive means that the iteration order doesn't matter.)

The first function, on the other hand, has only static assignments (one name is only ever bound to a single "definition" at any one time). In terms of working memory, that means that each variable in the first function only occupies a single "slot". In the second function, the mutable `min_value` variable necessarily occupies multiple "slots", one for each point in the program where the variable could change. So you'd have to keep track of a "min_value[0] = 9999", a "min_value[1] = number (if number odd && < min_value)" and _maybe_ a "min_value[2] = min_value (if number even || > min_value)".
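
The article's second function is not reproduced in this thread, so the following is only a reconstruction from the description above (a `min_value` seeded with 9999 and updated in a loop). The function names and the `SENTINEL` constant are hypothetical. The sketch contrasts the mutating loop with a version in which every name is bound exactly once:

```python
from typing import Iterable

SENTINEL = 9999  # starting value mentioned in the comment above


def smallest_odd_mutating(numbers: Iterable[int]) -> int:
    # min_value is rebound inside the loop, so a reader has to track
    # its initial value plus every conditional update.
    min_value = SENTINEL
    for number in numbers:
        if number % 2 != 0 and number < min_value:
            min_value = number
    return min_value


def smallest_odd_single_assignment(numbers: Iterable[int]) -> int:
    # Each name is bound once. The sentinel takes part in the
    # comparison so the result matches the loop version exactly.
    odd_numbers = [n for n in numbers if n % 2 != 0]
    return min([SENTINEL, *odd_numbers])
```

Both versions keep the 9999 starting value on purpose, so the only difference is whether `min_value` changes over time.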

zellyn almost 2 years ago

I agree with all of this, except the part where they assume extracting a function is essentially free.

Our profession seems to consistently underestimate or even ignore the complexity of interactions between many little pieces.

I would rather work on a module with 10 clearly-written 500-line functions than one with 500 clearly-written 10-line functions. Keeping in my mind the complex and variable interactions of a whole constellation of tiny functions is much harder than grinding my way through understanding a few more complex pieces of code. Have you ever worked in a codebase where the template method pattern was combined with too many levels of inheritance and each subsequent method is implemented at a different level of the hierarchy?

Likewise for microservices and lambdas. The organizational complexity of an app atomized into 100 lambdas is mind-boggling, because you have to understand the implicit and emergent orchestration of distributed state as those lambdas transition from one to another.

All that said, I definitely like this metric better than most others for the static complexity of a single function.

UglyToad almost 2 years ago

It feels like their calculation missed 2 crucial confounders: stack depth and type tracking.

1) Stack depth. In their refactored example, if I truly want to understand the code I now need to store everything I was thinking about in the current function and jump to their new `is_paid_today` function, then into each of the 3 dependent functions. As a finger-in-the-air estimate, I'd say each stack frame should add 2 * depth, plus some multiplier for breadth or something. This is why I prefer big functions like the initial state, read from top to bottom with no indirection, abstraction or surprises where reasonable. (Curly braces wouldn't go amiss to give visual indicators of scope...) This is completely counter to 'clean code' dogma but clean code dogma is... bad.

2) Type tracking. To reignite this holy war: if I have to track the type information mentally then I'd say every parameter and variable has an additional +1 working memory overhead. What is employee, what is employee_database, what is `is_end_of_month` and where on earth is it coming from? Try as I might, I just cannot understand people who onboard to big codebases without type hints at least. Y'all are a different breed; you should work as, like, super-rememberers.
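
As a rough illustration of the type-tracking point, here is a sketch of what an annotated `is_paid_today` might look like. The `Employee` fields and the `PaySchedule` enum are assumptions made up for this example; the article's real data model is not shown in the thread. The flags are passed as parameters so that a reader can see where `is_end_of_month` comes from:

```python
from dataclasses import dataclass
from enum import Enum


class PaySchedule(Enum):
    # Hypothetical enum; the article's representation is unknown.
    WEEKLY = "weekly"
    MONTHLY = "monthly"


@dataclass(frozen=True)
class Employee:
    # Hypothetical fields, chosen only to make the example type-check.
    name: str
    salary: float
    schedule: PaySchedule
    passed_probation: bool


def is_paid_today(
    employee: Employee, is_end_of_week: bool, is_end_of_month: bool
) -> bool:
    # Grouping assumes the probation check applies to monthly staff only.
    monthly_due = (
        employee.passed_probation
        and employee.schedule is PaySchedule.MONTHLY
        and is_end_of_month
    )
    weekly_due = employee.schedule is PaySchedule.WEEKLY and is_end_of_week
    return monthly_due or weekly_due
```

With the annotations in place, the questions "what is employee?" and "where does `is_end_of_month` come from?" are answered by the signature rather than by memory.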

Ensorceled almost 2 years ago

I have a hard time getting developers to follow this type of advice, and a lot of developers will "clean up" and "optimize" the example code in introducing_variable_version by eliminating the paid_today variable back into a single if statement.

I recently had a developer "improve" a security-sensitive series of checks (think a series of if statements: if logged in, if not inactive, if in required group, etc.) by merging the statements into a single if statement that, unsurprisingly, broke the check and left a security hole.
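
The comment does not say what the actual bug was, so the following is a purely hypothetical sketch of how such a merge can go wrong. The `User` class, the function names, and the specific slip (an `or` where an `and` was needed) are all invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical user record for this example only.
    logged_in: bool
    inactive: bool
    groups: frozenset = frozenset()


def may_access_stepwise(user: User, required_group: str) -> bool:
    # One check per statement: each failure exits on its own.
    if not user.logged_in:
        return False
    if user.inactive:
        return False
    if required_group not in user.groups:
        return False
    return True


def may_access_merged_buggy(user: User, required_group: str) -> bool:
    # Invented slip: "or" binds more loosely than "and", so this parses as
    # (logged_in and not inactive) or (required_group in groups).
    return user.logged_in and not user.inactive or required_group in user.groups


# A logged-out member of the group slips through the merged version.
logged_out_admin = User(logged_in=False, inactive=False, groups=frozenset({"admin"}))
stepwise_result = may_access_stepwise(logged_out_admin, "admin")
merged_result = may_access_merged_buggy(logged_out_admin, "admin")
```

Here `stepwise_result` is `False` while `merged_result` is `True`, which is the kind of hole a one-line "cleanup" can open.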

bluetomcat almost 2 years ago

The "rules" are overly defined with respect to the static elements of source code. Cognitive load in understanding code is not linearly tied to the number of variables, function calls, conditional checks, etc.

For example, "volume = width * length * height" is less demanding than, say, "employee_qualifies_for_bonus_after_midterm(emp)". I think it comes down to limiting the spatial scope of variables and fragments of code, naming things in a way that makes sense to most people, and containing complexity locally. It's also called "abstraction".

tra3 almost 2 years ago

Complexity related -- "Don't wake up the programmer" (https://alexthunder.livejournal.com/309815.html).

This crystallized for me why you shouldn't interrupt devs when they are working and should keep interruptions to a minimum.

Briefly, the author asserts that writing software is like sleeping. For a lot of people, getting to sleep takes a while, and if you wake them up even for a minute, they have to go through the 'getting to sleep process' again, and it's non-trivial.

Software complexity is the same: it takes a while to go through the cognitive exercise of grasping what the code does. If you interrupt me, I lose the context and have to go through the "code loading" steps again.

I tried to explain this to my wife; unfortunately she's a rare breed that can fall asleep on a dime, so this explanation did not make quite the impact that I was hoping for.

pkkm almost 2 years ago

I don't see any code being improved in this article. The author does the same thing I've noticed Clean Code adherents do: he focuses on the busywork of splitting functions into tiny pieces while missing the actual place that requires the most mental effort to understand and is the most likely to become a source of bugs. In this case, it's the complex boolean expression that relies on Python's operator precedence. It could be rewritten like so:

<pre><code>if (
    (
        has_passed_probation(employee)
        and is_paid_monthly(employee)
        and is_end_of_month
    )
    or (
        is_paid_weekly(employee)
        and is_end_of_week
    )
):
</code></pre>

to make it clearer that the has_passed_probation check only applies in the monthly case.
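
The precedence claim can be checked mechanically. In Python, `and` binds more tightly than `or`, so an unparenthesized chain groups the way the rewrite above spells out. The sketch below assumes the article's condition has the shape `a and b and c or d and e` (the exact original is not quoted in this thread) and uses plain booleans in place of the real function calls:

```python
from itertools import product


def unparenthesized(probation, monthly, end_of_month, weekly, end_of_week):
    # Assumed shape of the article's original condition.
    return probation and monthly and end_of_month or weekly and end_of_week


def explicit_grouping(probation, monthly, end_of_month, weekly, end_of_week):
    # The grouping made explicit in the comment above.
    return (probation and monthly and end_of_month) or (weekly and end_of_week)


def probation_applies_to_both(probation, monthly, end_of_month, weekly, end_of_week):
    # The alternative reading a hurried reader might assume.
    return probation and ((monthly and end_of_month) or (weekly and end_of_week))


cases = list(product([False, True], repeat=5))

# True when Python's own parse agrees with the explicit grouping everywhere.
same_as_explicit = all(unparenthesized(*c) == explicit_grouping(*c) for c in cases)

# Inputs where the two readings disagree: weekly staff on probation at week's end.
differing = [c for c in cases if unparenthesized(*c) != probation_applies_to_both(*c)]
```

The two readings only diverge for weekly employees who have not passed probation, which is exactly the business question the parentheses leave the reader to guess.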

TexanFeller almost 2 years ago

Having low working memory combined with my working memory slots being overwritten by random digressions my brain decides to pursue, AKA ADHD, reducing the amount of state I need to track is one of the most significant factors for productive coding. I guess that's why I find FP style code that eschews mutable variables and collections and anything that breaks referential transparency (AKA the ability to understand code locally, without surrounding context) appealing.

JackFr almost 2 years ago

The big difference is that the first function may be wrong, but the second is quite obviously wrong.

The smallest odd of a list of numbers > 9999 is not 9999.

The smallest odd of an empty list is not 9999.

The smallest odd of a list of even numbers is not 9999.

The first function does seem easier to understand but relies on global state (is_end_of_week, is_end_of_month) and side-effecting functions (run_payroll).

Neither of these functions should survive code review.

drittich almost 2 years ago

I'm still left wondering whether the code is correct, in that an employee that is paid weekly does not require passing probation to get paid. Is that the correct logic, or is it the (very common) error of forgetting an extra set of parentheses when using OR?

Given that there would probably be more complexity in the real code, I would create separate functions expressing the rules for each employee type.

Izkata almost 2 years ago

First one is far far easier for me to understand for two reasons:

* The table-like structure of the conditions instead of nesting like the second one. The two-dimensional relationship makes it trivial to reason about - understanding it works at a higher level than the individual variables.
* The body of the conditional is entirely self-contained and can be understood separately from the conditional instead of having to understand them both at the same time.

The refactor they suggest at the end only works so simply because of my second reason here. It's kind of an improvement (self-contained logic that can be reused), but kind of not (now you have to look in two locations to actually understand what it's doing), but either way isn't really necessary if you read code at a higher level than token-by-token.

heikkilevanto almost 2 years ago

I believe a lot of this is affected by how familiar idioms are used. The second example simplifies to me as something like "Loop to find the smallest", "check parity". The thing that seemed to take most of my cognitive capacity was the unstated assumption that 9999 was clearly bigger than any of the possible numbers.

The first example did have a bit more complex condition, but it was fairly logical and not hard to understand. Again, what bothered me was that there must have been a hidden way for run_payroll(name, employee.salary) to pass its result to write_letter(name).

tialaramex almost 2 years ago

The second function bothers me because it has this arbitrary looking value, presumably as a sentinel, and that's a hidden requirement of the function. This function is actually minimum_odd_number_or_9999 for some reason, and chances are nobody actually wants that. Instead languages should encourage distinguishing None from actual answers - and then sure as an optimisation you'd consider using a sentinel to achieve that mechanically, but it doesn't alter the program meaning.

On the other hand, in this style the first program could have any amount of hidden traps.
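
In Python, one way to express "no answer" without a magic number is to return `None`. This is a minimal sketch; the name `minimum_odd_number` is adapted from the comment above and is not the article's:

```python
from typing import Iterable, Optional


def minimum_odd_number(numbers: Iterable[int]) -> Optional[int]:
    # None means "there was no odd number", which no real answer can
    # be confused with. No assumption about the largest possible input.
    odd_numbers = (n for n in numbers if n % 2 != 0)
    return min(odd_numbers, default=None)
```

The `default` argument of the built-in `min` covers the empty case, so inputs that are empty, all even, or all above 9999 each get a correct result.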

jacknews almost 2 years ago

"We can then separate out that complexity even further if we move the determination of the value of paid_today into its own function."

Ouch, and this immediately makes it less obvious what's going on, because you have to look up the function. In this case, the name is self-documenting, but that's far from always the case. And even then the extracted function seems to conflate 2 concepts, 'pay day', and 'eligible to be paid', and the code even looks suspect/buggy; which is more clearly unambiguous:

<pre><code>(one and two or three)
(one and (two or three))
</code></pre>

This kind of extraction is only useful if you need to re-use the encapsulated calculation IMHO, otherwise just comment each term, or use something more powerful than "if and or then", eg decision tables, etc.
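
A decision table for the payroll condition might look like the sketch below. The row layout, the `"weekly"`/`"monthly"` strings, and the `should_pay` helper are assumptions for illustration, and the rows encode the reading in which probation only matters for monthly staff:

```python
from typing import List, Optional, Tuple

# Each row: (schedule, passed_probation, period_ended) -> pay today?
# None in the probation column means the rule does not consult it.
Row = Tuple[str, Optional[bool], bool, bool]

DECISION_TABLE: List[Row] = [
    ("monthly", True, True, True),
    ("weekly", None, True, True),
]


def should_pay(schedule: str, passed_probation: bool, period_ended: bool) -> bool:
    # First matching row wins; anything not listed means "do not pay".
    for row_schedule, row_probation, row_period_ended, outcome in DECISION_TABLE:
        if (
            row_schedule == schedule
            and row_probation in (None, passed_probation)
            and row_period_ended == period_ended
        ):
            return outcome
    return False
```

Putting the rules in data makes the "does probation apply to weekly staff?" question a visible cell in a table rather than a consequence of operator precedence.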

cborenstein almost 2 years ago

Love the concept of a working memory score for code.

> the number of distinct pieces of program state that a developer needs to keep in their head when analysing a function and the impact this has on the understandability and maintainability of code.

The same concept applies not just to writing code, but to any context that you need to load up in order to do your work. E.g. context when jumping into a meeting or when reviewing a new project spec.

In those cases, one of the key things is being able to offload what you don't need in your working memory right now to some other solution (usually notes). More on that here: https://www.stashpad.com/blog/working-memory

jagged-chisel almost 2 years ago

> the top function is fairly obviously harder to understand

What? It’s one conditional branch, and in that branch there are three “high-level” operations, all named very well (assuming they’re correct names.) What’s harder to understand here?

I spent much longer understanding what was going on in the second function.

Further, the variable name “paid_today” is terrible because the result of the condition does not tell us that the employee was indeed paid, but rather that they should be paid. They’re not paid until after the code in the conditional branch runs.

Extracting this information to its own function is certainly an improvement (even if the function name still isn’t great.)

I would have liked the article to spend more time making the second function take less time to understand.

TRiG_Ireland almost 2 years ago

I've certainly come across the magic number seven plus or minus two before, in study of simultaneous interpreting. The interpreter is usually a sentence or two behind the speaker. The trick, they say, is to increase the size of the chunks, as you can't do much about the number of them. Try to process a sentence in clauses, not in words. (You need to do that anyway, really, as the grammar of the languages may not match up.) I suspect that something similar applies here.

vander_elst almost 2 years ago

Are there any implementations of this? So that it's possible to measure it for a given piece of code?

quantum_state almost 2 years ago

Is there a tool to scan the code and output the working memory metrics for a given cognitive context such as a line, etc.?
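
No such tool is named in this thread. As a very crude starting point, Python's standard `ast` module can count the distinct names each function touches. This is not the article's scoring scheme, only a rough proxy for it, and the helper name `distinct_names_per_function` is made up:

```python
import ast
from typing import Dict


def distinct_names_per_function(source: str) -> Dict[str, int]:
    # Crude proxy, NOT the article's metric: counts every distinct
    # identifier (parameters, locals, globals, called functions) that
    # appears anywhere inside each function definition.
    counts: Dict[str, int] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names = set()
            for child in ast.walk(node):
                if isinstance(child, ast.Name):
                    names.add(child.id)
                elif isinstance(child, ast.arg):
                    names.add(child.arg)
            counts[node.name] = len(names)
    return counts


example = "def volume(width, length, height):\n    return width * length * height\n"
result = distinct_names_per_function(example)
```

For the `volume` example, `result` is `{"volume": 3}`. A real implementation of the article's score would also need to weigh mutation, branching, and scope, which this sketch ignores.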

hmmokidk almost 2 years ago

Functional Programming solves this problem.