What's the source of this idea that "abstraction" means "combining duplicated code"? An abstraction is a transformation between models that strips away information not essential to the target model. This is an idea firmly planted in semantics, not code coincidence.<p>The first example in the article is a nice demonstration of this happening well. You are transitioning from a model where geometry is unknown to one where at least simple geometry is known. It may not be the most compelling model change ever, but it at least captures the key point - there is a semantic operation going on: calculate the volume of a sphere. The actual formula for doing so isn't important at the level you want to think about your code, so replacing the formula with a function call simplifies the model in which you're working.<p>Contrast that with the example of things going wrong in the next part, with the bad "average" function. What unnecessary details are being removed there? Being sure you're calling that function correctly actually takes more work than calculating an average without it. That's not an abstraction, it's an indirection. You still need to track it down and read the code to understand it. That's not something you have to do with sphere_volume in the preceding part.<p>So how do you know whether duplicated code represents an opportunity for abstraction? You start thinking in terms of the semantics you want to be using. Is the code duplicated because it's doing the same thing, semantically, in your target model? Well, then that's an opportunity for abstraction. Or is it just duplicated because it happens to share an implementation? Well, then that's just a coincidence. Don't try to share code.<p>Not to suggest that this is an easy test, of course. It's entirely possible that two things are different instances of a common problem you're unaware of, so you don't realize they share code because they really are the same thing. That's OK; there's always more to learn.
But I think if people put more thought into what abstraction is and why it exists, the questions of when and how to use it fade away.
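A minimal Python sketch of the contrast the parent comment describes - all function names and parameters here are hypothetical, invented for illustration, not taken from the article:

```python
import math

def sphere_volume(radius):
    """A real abstraction: hides a formula callers don't need to think about."""
    return (4 / 3) * math.pi * radius ** 3

def average(values, weights=None, ignore_none=False, as_int=False):
    """An indirection: calling this correctly takes more thought than
    writing the average inline, so nothing has been abstracted away."""
    if ignore_none:
        values = [v for v in values if v is not None]
    if weights is not None:
        result = sum(v * w for v, w in zip(values, weights)) / sum(weights)
    else:
        result = sum(values) / len(values)
    return int(result) if as_int else result
```

Callers of `sphere_volume` never need to open it; callers of `average` must read it to know what the flags do.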
I'm surprised I'm the first to bring this up, but the "over-abstract" example feels more like "over-specified". The two versions work at the same level of abstraction, since they define precisely the same operation. The first has just specified extra tweakable aspects of its execution via the argument list. I'm not saying it's not bad, I just don't think it's because it is "too abstract" compared to the simpler solution.<p>Abstraction in my mind is fundamentally about the "higher-orderness" of a thing. The two average methods are equally abstract, since neither is a higher-order operation relative to the other. I would use the word over-abstract if one were to write a program modeling a dog-walking business (a very specific thing) by writing a system which models actions on entities (a very abstract thing), where walking is an action which may involve one or more entities, and dogs, humans, employees, and customers are all entities. If the core thing you want to do is just a single concretion of the system that you actually built, then you "over-abstracted". I feel like we should not discourage the practice of abstraction, since that's our business. I literally get paid to think about the real world in terms of abstraction and write it up into a computer. Young engineers should not be taught to fear "over-abstraction".
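The dog-walking example above can be sketched in a few lines of Python - a hypothetical illustration (all class and function names invented here), not code from the article:

```python
# Over-abstracted: a generic entity/action system, when all the business
# ever needs is dog walks.
class Entity:
    def __init__(self, name):
        self.name = name

class Action:
    def __init__(self, verb, participants):
        self.verb = verb
        self.participants = participants

    def describe(self):
        names = ", ".join(e.name for e in self.participants)
        return f"{self.verb}: {names}"

# The only concretion of the system we ever actually build:
walk = Action("walk", [Entity("Alice"), Entity("Rex")])

# The concrete version, which is all that was required:
def walk_dog(employee, dog):
    return f"walk: {employee}, {dog}"
```

When the whole generic machinery exists to express one concrete case, the extra higher-orderness bought nothing.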
One of the nightmares I've experienced time and time again in large mature codebases is incomplete abstractions. Some developer gets a great idea of how to abstract something away, defines the plumbing needed for said abstraction, and sets to work going through the codebase bit by bit, moving code over to this new abstraction.<p>But then, they leave. Or they change projects, or they just lose their enthusiasm for this major refactor. And you're left with a half-baked abstraction in the code. Then another developer comes along, and another, and another, and before too long you've got a spaghetti mess of incomprehensible "abstractions".<p>I was told by a friend once that the iOS app for Facebook has a whole bunch of implementations encapsulating what a "form" is in the app. Many developers came and went with their own ideas of what that abstraction should look like, but none became the one abstraction to rule them all.
I've learned the hard way that perfect abstractions don't exist. The mathematician in me wants to find the "most elegant representation" of a given problem, but when I give in to that urge, I often end up with a god function that takes `n` boolean flags that toggle the behavior slightly for different cases.<p>Why does this happen? I think a partial explanation is in this Nietzsche essay [1], where he says, "every concept arises from the equation of unequal things" — in other words, the abstractions of the world were built bottom up in our heads, and the Platonic essence of things is just a fairy tale we tell young programmers so they can sleep easily at night.<p>1. <a href="http://nietzsche.holtof.com/Nietzsche_various/on_truth_and_lies.htm" rel="nofollow">http://nietzsche.holtof.com/Nietzsche_various/on_truth_and_l...</a>
There was a great discussion here on HN a few months ago about DRY [0] and how developers generally get the concept wrong. Specifically, it’s not about removing duplicated code. It’s about ensuring there’s a single source of truth for a given piece of knowledge in your application. For example, if you’re calculating interest on purchases all over the place and hard coding the rate everywhere, it should be unified into a single function or method, so that there’s one source of truth for calculating interest. If you just have some code that looks similar but doesn’t represent “knowledge” that is being duplicated, DRY does not apply. Some folks in the comments mentioned they liked to use SPoT as an acronym that’s a bit clearer, and I’ve been using that since then in code reviews.<p>[0]: <a href="https://news.ycombinator.com/item?id=22329787" rel="nofollow">https://news.ycombinator.com/item?id=22329787</a>
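The interest-rate example above, sketched in Python - a minimal hypothetical illustration (the rate, function names, and formatting are all assumed for the sketch):

```python
# One source of truth for the "how interest is calculated" knowledge.
# Before this, imagine 0.05 hard-coded at every call site, so changing
# the rate means hunting down every copy.
INTEREST_RATE = 0.05

def interest_on(amount):
    return amount * INTEREST_RATE

def purchase_total(price):
    return price + interest_on(price)

def statement_line(price):
    return f"total: {purchase_total(price):.2f}"
```

Now a rate change touches exactly one place, which is the point of DRY; two call sites that merely look similar but encode different knowledge would not be unified this way.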
I think this definition of abstraction is lazy. Abstraction is indeed hard to define precisely, but it may be more accurate to describe it as the process of implementing interfaces that are conceptually familiar to users, often through metaphor, such as the unix "pipe".<p>Creating the "right" abstraction is _not_ the process of bundling up repeated code and only questioning how "abstract" it should be, it is the process of creating an interface that is familiar and conceptually easy to grasp. This is done through naming, comments, the use of metaphors, etc. From this viewpoint there can be many correct levels of abstraction, some more useful than others. We should aim to create good abstractions at all layers of code, even if finding abstraction bliss is unattainable.
Well written article<p>> DON'T BE AFRAID TO DISMANTLE THE WRONG ABSTRACTION<p>Couldn't agree more with the statement, though I don't completely agree with the author's suggestion to copy paste. Duplicating code _is debt_. It may help us go faster now, but it'll almost inevitably come back to bite. It is manageable if 1/2 people do it 1/2 times - definitely not manageable if 5/6 people do it 5/6 times.<p>I believe the general hesitation of not touching a piece of code (or, getting by with that optional param) is due to the fear of fucking things up. Having your code test covered gives an amazing amount of confidence to rip apart old abstractions to yield newer ones that serve the purpose of the _current code_. To me, this route is more preferable to duplicating code.<p>Even with the best of intentions, Hacking an abstraction with that one optional parameter is inevitable. Tests help in our ability to repay that debt faster - on time & in full.<p>Basically they make all abstractions a lot cheaper - easier to write and easier to throw away. Thereby solving the problem of having a 'wrong abstraction' too early.
> Why is it a good idea to abstract the formula for a sphere's volume into its own method? Because if mathematicians ever found out they got the formula wrong, you would want to go through all the places in your code that you used the formula and update it to be correct. That is, we know ahead of time that we want the code to be in lockstep.<p>This is actually not the main reason you'd want to abstract, and I think the whole article kind of gets it wrong. The main reason to abstract is not to keep code DRY, but to "abstract away" things that are not important in a certain context (or layer). You want to put the formula for the volume of a sphere aside when it's not relevant to what the code at hand is doing and would get in the way of trying to change or understand that code. For example, a really strong case for abstracting that formula is if you're writing code that is calculating the volume of several shapes.<p>Yes, code duplication is often a sign that you've screwed up your abstractions (ie they correlate), and creating the right abstraction will often make your code more DRY, but it's the means, not the end. The end is code that is easy to read, understand, and change.<p>My high-level "sniff test" is essentially rubber-ducking (try to explain to yourself, someone else, or a duck) what the code is doing. For instance, you might explain a function called calculate_remaining_space_in_box:<p>1) we get the volume of all shapes (including spheres) in the box<p>2) we get the volume of the entire box<p>3) we calculate the difference between them<p>In that explanation, you realize there's really no extra benefit to a reader of that code at that level to knowing the exact formula for the volume of a sphere (or any shape for that matter).<p>There are, of course, other signals to measure whether you've abstracted correctly beyond just code duplication. 
The Single Responsibility Principle is a good example (<a href="https://blog.cleancoder.com/uncle-bob/2014/05/08/SingleReponsibilityPrinciple.html" rel="nofollow">https://blog.cleancoder.com/uncle-bob/2014/05/08/SingleRepon...</a>): code that changes for the same reason should usually be grouped together, and code that changes for different reasons (or at different frequencies) should be kept apart. But again, this is in service of the end goal: making code easy to read and change.
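The rubber-duck explanation in the comment above maps directly onto code. A minimal sketch, assuming the box is a rectangular prism and the only shapes are spheres (the function names follow the comment's example; everything else is invented for illustration):

```python
import math

def sphere_volume(radius):
    return (4 / 3) * math.pi * radius ** 3

def box_volume(width, height, depth):
    return width * height * depth

def calculate_remaining_space_in_box(box_dimensions, sphere_radii):
    # 1) get the volume of all shapes (here, spheres) in the box
    used = sum(sphere_volume(r) for r in sphere_radii)
    # 2) get the volume of the entire box
    total = box_volume(*box_dimensions)
    # 3) the remaining space is the difference between them
    return total - used
```

At this level the reader follows the three steps without ever needing the 4/3 π r³ formula in view, which is exactly what the abstraction buys.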
One of my favourite abstraction advice comes from this article: <a href="https://blog.carlmjohnson.net/post/2020/go-cli-how-to-and-advice" rel="nofollow">https://blog.carlmjohnson.net/post/2020/go-cli-how-to-and-ad...</a>.<p>"You want one layer to handle user input and get it into a normalized form. You want one layer to do your actual task. And you want one layer to handle formatting and output to the user. Those are the three layers you always need."<p>The "do the task" layer can be abstracted again further. But starting it off as a monolithic layer, separated from input and output, is always the right call.
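The three layers from the quote can be sketched in a few lines of Python - a hypothetical toy CLI (all names assumed), not code from the linked article:

```python
import sys

# Layer 1: handle user input and get it into a normalized form.
def parse_args(argv):
    return [float(a) for a in argv]

# Layer 2: do the actual task.
def average(numbers):
    return sum(numbers) / len(numbers)

# Layer 3: handle formatting and output to the user.
def render(result):
    return f"average: {result:.2f}"

def main(argv):
    return render(average(parse_args(argv)))

if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

The middle layer never touches strings or stdout, so it can be tested, reused, or split further without dragging input parsing and formatting along with it.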
This is the sort of article that I could have really used at about year 1.5 of my programming career. I've learned many of these lessons the hard way and resonate/agree with the examples here. It would have been really nice to have read this years ago and not have had to hit my head on quite so many sharp corners to learn.<p>You invariably have to hit your head sometimes, but I hope clearly written articles with understandable but not completely contrived examples like this one reduce the head knocks for some people.
Nicely put! Most best practice articles end up reading like dogma because they only ever show clear-cut cases where the best practice applies. Augmenting the plain good/bad examples with <i>good/bad depending on the situation</i> examples seems like a great way to avoid that.
IMHO one of the best indicators that I've nailed an abstraction is when I am able to use/re-use it in many places.<p>A concrete and easy example is when you're developing new software and building your common utils/libs. If you've nailed the abstractions, you'll notice that you're able to expand the libs by frequently reusing/leveraging other libs you've built.<p>Abstractions come to exist from the requirements. IMO the key to a good abstraction is being able to dissect a requirement into smaller requirements that you are familiar with, have already solved, and have a solid understanding of.<p>Metaphorically speaking, bad abstractions are ones that convert a requirement into a new polygon shape, and good abstractions are ones that dissect a requirement into one or more shapes that we are all familiar with (circle, square, rectangle, rhombus, etc.).<p>The difference between a polygon and the common shapes is that no two polygons will look alike unless the requirement is exactly the same, and I'd argue that a developer will create a new polygon even if asked to solve the same requirement twice.<p>When a new requirement comes in, it's common to start implementing it right away: create one or more classes with names that map to the requirement, add a few methods, and voila, you have a new polygon shape.<p>The key point here is to dissect the requirement into sub-requirements that look like familiar shapes (problems you've previously solved). Every once in a while you'll still end up creating a polygon here and there for a sub-requirement, which gets refactored over time into a known shape.<p>A good developer can quickly see the familiar shapes that the requirement is hiding behind instead of creating a new polygon shape.
This article is good (although they could work on the examples a bit more). One thing I have really found useful when working with Elixir is that it gives you a way to abstract common patterns or add more use cases by differentiating based on arity and pattern matching.<p>it's easy to write the average function in the article this way (guards can't call Enum functions, so we pattern match on the list head instead):<p><pre><code> def average([first | _] = arg) when is_binary(first) do
   # implementation for strings
 end

 def average([first | _] = arg) when is_integer(first) do
   # implementation for integers
 end
</code></pre>
which would be a more proper abstraction as it hides details of the type of your data.<p>or using pattern matching in the arguments.<p><pre><code> def shape("circle", ...) do
# implementation
end
def shape("square", ...) do
# different implementation
end
</code></pre>
When you have map as an argument, you can do<p><pre><code> def is_good(%{ hn: HN }) do
IO.puts "#{HN.someprop} is good"
end
def is_good(%{reddit: Reddit}) do
IO.puts "#{Reddit.someprop} is bad"
end
def multiply_on_two_numbers_otherwise_square(a), do: a * a
def multiply_on_two_numbers_otherwise_square(a, b), do: a * b
</code></pre>
My examples are trivial but this really gives you some awesome refactoring powers.
I want to share a few things I learned while making <a href="https://github.com/imvetri/ui-editor" rel="nofollow">https://github.com/imvetri/ui-editor</a>.<p>It abstracts component development for the frontend, hiding details about the framework.<p>I applied the DRY principle to the code that we write. Framework syntax is repetitive boilerplate that I tried to abstract away.<p>The Pragmatic Programmer is a book a friend of mine recommended to me, and it definitely works!
Kinda sad that we can't focus on "concrete designs" and instead have to deal with the high costs of rewrites, and therefore with "manual organization" and finding those abstractions. If rewrites weren't costly, it just wouldn't make much sense to compose an architecture manually.<p>Parametricism is slowly taking over other industries like architecture and industrial design. In essence it's automatic rewrites: programs finding the right abstractions/compositions/organizations based on given parameters, where the designer's job moves into the actual problem domain, providing the right parameters to the program and selecting the most promising outcomes of that automated process.<p>The web moving to site generators and serverless is maybe a glimpse of a future of dynamic site generators, where the generators get much smarter and more responsive to input parameters and surrounding contexts.
I'm a designer. Just came here to say, this is so much in line with the "right level of abstraction" in graphic design:<p><a href="https://computersciencewiki.org/index.php/File:Abstract_heart.png" rel="nofollow">https://computersciencewiki.org/index.php/File:Abstract_hear...</a>
I'd argue that your definition of an abstraction resembles wrappers (which IMHO is the weakest form of abstraction) rather than abstraction in general. I think it's better to take it as modelling a complex system in a simple way to solve some specific problem by stripping the unnecessary details. For example, design patterns are abstractions but I don't think they all qualify as simply collecting a larger interface into a smaller one. Similarly, a virtual machine process like JVM is an abstraction for the hardware details which is also a lot more than simply reducing the size of the hardware interface. Still, a useful article. Thanks.
In the notices example, I arrived at number one as my answer because of a simpler evaluation: it's less code.<p>One of the principles I follow is "the best code goes unwritten", so a good rule of thumb for a good abstraction, for me, is whether it noticeably reduces the total amount of code. Conversely, if abstracting just increases the total code, I call it an indirection and avoid it.
One hard-fought lesson I've learned over the years is that copy/paste is often a very good solution. If the downside is that a developer has to manually spray a change to 50 different locations with the same copy/pasted implementation snippet, that's really fine. Even 500 or possibly 5,000 locations is fine, with a good editor or other tooling available. Testing these changes for correctness is easy.<p>Meanwhile the cost of getting an abstraction wrong is often far worse. And it's really easy to get wrong, because abstractions by their nature are always built on yesterday's data plus good intentions. People argue about subjective theories and unmeasured concepts of extensibility, wasting time on the Liskov Substitution Principle, SOLID, dependency injection, type system design patterns, etc., but it's mostly junk that just adds code bloat.<p>Obviously there are other solutions that don't require premature abstraction or copy/pasting 1,000 times (like simple module functions as the core unit of reusability, or a macro system for code injection). But the point is copy/paste gets a bad rap. It's simple, straightforward, easy to automate, easy to test, and adds no extra concepts to the code.<p>"Functional core, imperative shell" is the best advice I've found. Ruthlessly avoid object orientation, and when you need it, stick to extremely shallow inheritance. Make everything a module function, and when you write data structures, don't give them member-function logic for their core operations (like search, sort, add items, remove items); rather, create functions that accept data structure instances as arguments and perform these operations with no class-like internal state.
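The last paragraph's style can be sketched in Python - a hypothetical example (all names invented) of a plain data structure whose core operations live in free functions rather than methods:

```python
from dataclasses import dataclass, field

# Plain data structure: no member-function logic for its core operations.
@dataclass
class Inventory:
    items: list = field(default_factory=list)

# Operations are module-level functions that take the structure as an
# argument and return new values instead of mutating hidden internal state.
def add_item(inv, item):
    return Inventory(inv.items + [item])

def remove_item(inv, item):
    return Inventory([i for i in inv.items if i != item])

def search(inv, predicate):
    return [i for i in inv.items if predicate(i)]
```

Because the functions are pure, they are trivial to test in isolation, and the "functional core" stays free of the class-state coupling the comment warns about.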
My first piece of advice for beginners is not to start new code with classes. Building the data model first lets them see the shared-state patterns, and only then extract those into real abstractions.