In Defense of Copy and Paste

100 points by zacharyvoase over 12 years ago

24 comments

toomim over 12 years ago

I love copy & paste! I also defended it in this scholarly article: http://harmonia.cs.berkeley.edu/papers/toomim-linked-editing.pdf with a video: http://youtu.be/1wo_7MTdWWI

stcredzero over 12 years ago

> This may come across as a straw man argument

Big time. The refactoring in this case was ill advised. When things started getting hairy, it should've been backed out.

Piling too much flexibility in one function is a common mistake. A justification for copy/paste it does not make.

I worked at a shop with this rule: don't try DRY until you've seen at least three repetitions. I think this saves one from premature refactoring.

Another way to put it: refactor when the code speaks to you, that is, when the need is evident. Keep the result only if it's a significant improvement. Avoid refactoring only because you are enamored of refactoring (or enamored of a rule). That goes for any programming technique or tool, really.

bunderbunder over 12 years ago

That example under the When Tools Make It Worse section - uggghhhhh. Why would anyone actually do that? That isn't DRY refactoring, that's cargo cult refactoring.

DRY is not, was never, and should never be about unnecessarily replacing clean, well-factored code with @$2!% shared mutable state. The goal is to normalize your code, not to micro-optimize for keystroke count. No. Nonononononono. Just no.

crazygringo over 12 years ago

Knowing when to refactor is obviously an art, not a science. And with an example as trivial as this, it's not really a very "real-world" example.

A lot of the time, you don't even know if two swaths of code are "coincidentally" identical (don't refactor) or identical in a "deep" way (refactor), even when the program is yours -- you just don't know how the program will evolve.

In the absence of additional information, I usually refactor only when I see three similar code paths, since by that point a project rarely goes back. Over the years, it's turned out to be a surprisingly good rule of thumb.

mwcampbell over 12 years ago

As I ponder this more, I think it's useful to consider the concepts of simplicity and complecting as articulated by Rich Hickey in his talk "Simple Made Easy". As he explains it, to complect is to braid multiple things together, whereas in a simple system, multiple things are composed. He has often pointed out that simplicity does not necessarily mean fewer things; as I understand it, it's not about how many things there are, but how they interact.

In that light, the single flexible tweet list function presented in this post is indeed problematic because it has a few things braided together: a tweet list, a profanity filter, and pagination.

So we should be suspicious of repetition, but at the same time avoid complecting.

pjungwir over 12 years ago

One rule I try to follow is to avoid refactoring when the shared code is "coincidental." Perhaps this is another way of expressing what the author says about business logic.

I've definitely worked on projects where developers created large, unwieldy, hard-to-grok, buggy abstractions in the name of DRYing code. I'm pretty aggressive about making code DRY, but simplicity and readability are more important.

The effort I'll tolerate in pursuit of DRY also varies by language. I've been doing some Android work lately, and I'm finding that things I would have done DRY in Ruby require too much added complexity to make DRY in Java.

BoredAstronaut over 12 years ago

Straw man thinks DRY applies to two-line function. Straw man is a straw man. Also, less code > DRY. In fact, less code -> DRY. If refactoring makes for more code, not really DRY. More like taking a principle to its illogical conclusion. Compression is a process of diminishing returns.

Although there are certainly times when factoring two lines into one line is better. Like when it's self-documenting, or when those lines otherwise add noise to part of another function.

Sometimes a new function is not the right approach to avoiding repetition. If you can't write a function to adhere to DRY, use a macro or equivalent. In C/etc, macros are wonderful if used well.

michaelfeathers over 12 years ago

One of the things that people don't get about refactoring is that it is not just a matter of extracting things or removing duplication. Sometimes you merge things or re-introduce duplication to get someplace better.

When you look at refactoring examples online, they often make that mistake. There's a straight arrow toward a "better solution" but without any backtracking. It's a hobbled view of refactoring.

To bring it home, in the blog example, I think it is perfectly fine to remove duplication in the way listed as "bad", as long as you reintroduce the duplication when you have a bit of trouble. Much of the time, you're lucky and you don't.

taeric over 12 years ago

I really really like this take. Refactoring is usually pitched as something that is completely orthogonal to solving the actual problem you were given. I think too many of us (clearly, I'm projecting) are wary of anyone else going on a refactoring spree because we see it break down things that were just fine separate, often with only "warm fuzzies" being the actual gain. The progression shown in this post is really really good.

tterrace over 12 years ago

I think the first step the author took on the refactoring path was one I wouldn't take. It breaks the "do one thing" rule and the rest of the post is the pain that naturally follows from having an over-generalized method that tries to do too much.

Chaseph over 12 years ago

These articles are a dime a dozen. This popular philosophy isn't right, because look at my poorly coded example of it.

Your DRY code is only DRY in the laziest of senses, and represents a lousy programmer cluttering the system. A really lousy implementation of any of these programming paradigms would make one side look wrong.

In your example, the refactored code would look excellent if it implemented OOP and the Strategy Pattern. The two different feeds can inherit their similarities from the same place, and their differences can be implemented in separate places. Which feed to produce can be chosen dynamically, rather than by one crappy grab-all function.
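
For illustration, a minimal sketch of what such a Strategy-style version might look like. It uses plain Python rather than Django, and `Tweet`, `FeedStrategy`, `PublicFeed`, and `UserFeed` are hypothetical names that do not come from the article:

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical stand-in for the article's Tweet model.
    author: str
    text: str
    profane: bool = False


class FeedStrategy:
    """Shared behaviour lives here; subclasses override only what differs."""

    def select(self, tweets):
        # Default selection: keep everything.
        return list(tweets)

    def render(self, tweets):
        # Stand-in for a template render: one line per selected tweet.
        return [f"{t.author}: {t.text}" for t in self.select(tweets)]


class PublicFeed(FeedStrategy):
    """Hypothetical public timeline: hides profane tweets."""

    def select(self, tweets):
        return [t for t in tweets if not t.profane]


class UserFeed(FeedStrategy):
    """Hypothetical per-user timeline: only one author's tweets."""

    def __init__(self, author):
        self.author = author

    def select(self, tweets):
        return [t for t in tweets if t.author == self.author]


TWEETS = [
    Tweet("ann", "hello"),
    Tweet("bob", "rude words", profane=True),
    Tweet("ann", "bye"),
]
```

The caller picks a feed object at runtime, for example `PublicFeed().render(TWEETS)` or `UserFeed("bob").render(TWEETS)`, so the differences stay in the subclasses instead of in flags on one function.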

adrianhoward over 12 years ago

Not directly related - but it's something I come across so often when mentoring newbie devs that I thought I'd mention it in passing just in case anybody has this problem.

A pattern I sometimes see with newbies who understand the value of DRY is - as soon as they get to the point when they're about to repeat something or about to copy and paste - they stop themselves and start refactoring to remove the duplication they haven't typed into existence yet. They see adding the code that will produce the duplication as bad / waste.

Don't do that.

It's hard - because the code that they've not typed or copy/pasted doesn't exist or work yet. It's still in their head.

Make the duplication explicit first.

Type it out. Copy and paste. Change those two branches so they have exactly the same structure.

When you've done that - and everything is working and all tests pass - then refactor the heck out of it.

Much simpler, faster and less error prone.

Chris_Newton over 12 years ago

I expect most of us would agree that a single function should ideally have exactly one main job and do it well.

Two functions are really doing the same job, and should probably therefore be combined into a single function, not when their behaviour is the same but when it should always be the same. As the article suggests, that determination is generally more about the software design or domain model than the mechanics of the current implementations.

Having said that, there is also a middle ground: create some sort of utility/helper function(s) to contain the code that is the same, coincidentally or otherwise, and then rewrite the two higher-level functions in terms of common helpers for now. If those higher-level functions need to diverge for good reasons later, at least it will be an active decision to separate the behaviours.

IME that sort of breakdown is unlikely to be beneficial with very short functions such as the examples here. There’s not enough commonality to justify the overheads of breaking everything up. However, in more realistic code, if you’ve got, say, 80% common operations between multiple cases, there are often some underlying concepts that can be extracted into their own functions. Those then become informatively named building blocks for the original functions.

Put another way, you might not want to consolidate the functions’ interfaces if they serve logically distinct purposes, but you can still consolidate some of their implementation details.
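
A rough sketch of that middle ground, assuming tweets are plain strings and that the two views are called `public_timeline` and `user_timeline`. Those names, the helpers, and the banned-word rule are all made up for illustration:

```python
def without_profanity(tweets, banned=("darn",)):
    """Helper: drop tweets containing any banned word (hypothetical rule)."""
    return [t for t in tweets if not any(w in t.lower() for w in banned)]


def page(items, number, per_page):
    """Helper: return the 1-indexed page `number` of `items`."""
    start = (number - 1) * per_page
    return items[start:start + per_page]


def public_timeline(tweets):
    # Distinct purpose: filtered, unpaginated.
    return without_profanity(tweets)


def user_timeline(tweets, page_number=1):
    # Distinct purpose: unfiltered, paginated two per page.
    return page(tweets, page_number, per_page=2)
```

The two top-level functions keep separate interfaces and can diverge freely; only the named building blocks are shared.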

jbrains over 12 years ago

I see a lot of comments here of the type "You have to know when to refactor". I don't do it this way. Instead, I rely on a willingness to undo a refactoring when I see that something else might work better -- and even to undo that when I decide that I've got that wrong.

I have no problem extracting as in "WHEN REFACTORING GOES BAD" -- although I might wait for a third copy before removing the duplication, because I want to see whether a useful abstraction would emerge. On the other hand, as soon as I recognise that one of those copies wants to change in a way that the other does not, I'd simply inline the method and let them diverge. I don't consider this a problem.

It seems as though some programmers believe that, once they extract something, it needs to remain extracted. No. It's only "cargo cult refactoring" if you stop thinking.

Most importantly, refactoring is experimentation. It's a kind of Mechanical Turk-based genetic programming-oriented style of designing, except that you have heuristics you can follow. That means that you'll go down the wrong path. THAT'S OK! as long as you allow yourself to backtrack. Remember: refactorings are small, reversible design changes. That means not just that one can undo them, but that one is willing to undo them.

danso over 12 years ago

OK, I'm obviously missing something, and part of the problem is that I'm not a Python programmer so my brain is obviously in "skim-mode".

Couldn't the problematic DRY pattern be alleviated by refactoring the following call:

<pre><code>filter_profanity = kwargs.pop('filter_profanity')
tweets = Tweet.objects.filter(**kwargs)
if filter_profanity:
    tweets = itertools.ifilter(lambda t: not t.is_profane(), tweets)
return render(request, template, {'tweets': tweets})
</code></pre>

Into something like:

<pre><code>def tweet_list(request, **kwargs)
  ...
  tweets = get_filtered_tweets(kwargs)
  ...

def get_filtered_tweets(**args)
  filter_profanity = args.pop('filter_profanity')
  if filter_profanity
    etc....
  end
  return tweets
end
</code></pre>

Why does the logic for the Tweet filtering have to be encapsulated in the rendering function?

// edit: What might help is if the OP showed how the non-refactored code would look with the profanity_filter and pagination features. I agree that his refactored proposal is confusing...I'm just having a hard time imagining how the non-refactored version would be less so.

sha90 over 12 years ago

My only real concern with this essay is that the OP bothered to refactor out the duplication, but didn't bother to refactor his internal refactoring when it got too complicated, instead claiming: "look, now it got messy", threw up his arms, and said there's nothing more that can be done, blaming DRY as the culprit.

Except we CAN do something about it.

It would have been just as easy to continue refactoring the tweet_list() method to pull filtering, pagination, and profanity checking out into sub methods -- at which point you've built a strong reusable component that can support many more combinations of those extra requirements. So by the time you get more feedback saying, "we need a new page that only shows 5 tweets per page and hides profanity, but does not filter", you can now easily take that reusable component, pass in those options and be done rather than starting from the top because you refused to clean up your internals. That's why we strive for reusable components in the first place.

In other words, if the argument is that refactored code is messy, it really means you aren't done refactoring.
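
One possible shape for that, as a sketch only: plain Python with no Django, where the `Tweet` class, the helper names, and the `tweet_list` signature are assumptions and not the article's code.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical stand-in for the article's Tweet model.
    author: str
    text: str
    profane: bool = False


def filter_by(tweets, **criteria):
    """Keep tweets whose attributes equal every given criterion."""
    return [t for t in tweets
            if all(getattr(t, k) == v for k, v in criteria.items())]


def hide_profanity(tweets):
    """Drop tweets flagged as profane."""
    return [t for t in tweets if not t.profane]


def paginate(tweets, page, per_page):
    """Return the 1-indexed `page` of `tweets`."""
    start = (page - 1) * per_page
    return tweets[start:start + per_page]


def tweet_list(tweets, criteria=None, hide_profane=False, page=1, per_page=None):
    """Thin coordinator: each option delegates to one small function."""
    result = filter_by(tweets, **(criteria or {}))
    if hide_profane:
        result = hide_profanity(result)
    if per_page is not None:
        result = paginate(result, page, per_page)
    return result
```

Under these assumptions, the "5 per page, hide profanity, no filter" page becomes `tweet_list(tweets, hide_profane=True, per_page=5)`, and each sub-step can still be called or tested on its own.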

njharman over 12 years ago

Refactoring / DRY to me is not about creating monolithic, generic do-anything functions. It is about decomposing code into layers of abstraction, somewhat like mini-"DSL"s. The top-level functions tie together next-level-"down" helper functions, which may themselves be higher-level tools over something like a DB API. More than 3-4 layers is probably a smell.

mwcampbell over 12 years ago

In defense of the single flexible function, I think the hypothetical business requirements are pathological. Or perhaps the hypothetical developer is taking a pathologically literal interpretation of them. Who would want pagination in one view but not another? As for the profanity filter, that should probably be a preference of the currently logged-in user which is applied to all feeds which that user views. (It should probably be enabled when an anonymous user is viewing any feed.)

I suppose some developers don't have the freedom of suggesting alternative specified behavior that is nicer to implement. In some cases I have not had that freedom. But in this hypothetical case, when pressed, the person setting the requirements ought to value consistency.

My own experience has been that I tend to do copy-and-paste because it's easier, but then regret it later. I don't think I've yet erred too far on the side of trying to follow the DRY principle.

daly over 12 years ago

I did a self-study on a project that lasted several months. I wrote down everything, including mis-typed characters, grammar errors, syntax errors, and semantic errors. I did a root cause analysis of the result. One general result is that I have a 3% error rate, regardless of activity. In fact I find that the delete key is by far the most important key on the keyboard, representing 3% of all of the characters I type.

One observation is that fully 1/2 of ALL the programming errors I made were due to copy/paste. Your mileage may vary but I doubt it.

Copy/paste is evil but it is so "low level", like the delete key, that you probably don't even think about it.

You may find it worthwhile to do a deep analysis of your personal error rate on some project. It is very enlightening. In fact, we ought to fund studies so we can get industry wide statistics.

einhverfr over 12 years ago

The thing is: "If you are using copy and paste while coding you are probably committing a design error" doesn't conflict at all with what he says. The fact is that copy and paste is the point when one looks and says "is refactoring appropriate here?"

One thing I would point out is that premature optimization is the root of all evil. A good rule of thumb: if your refactor adds more lines than it deletes and functionality remains the same, you have added complexity in refactoring, which very likely means that you are doing it wrong. This is particularly true if you can't say it is reducing the number of lines of code generally, or compartmentalizing state changes.

(This leaves aside the fact that the most pernicious use of copy and paste in the world is "sample code.")

darkchasma over 12 years ago

If your code is starting to look ugly, refactor it. If your refactoring is looking ugly, stop, you're doing it wrong. If a test breaks because of your refactoring, stop, you're doing it wrong. I call this the Don't be Stupid principle.

jiggy2011 over 12 years ago

Isn't this what functional programming is for?

You have several pieces of code that follow a very similar structure and logic but perform very different purposes for the program. So you try and generalise the structure of the code?

readme over 12 years ago

Refactor if doing so will give you an advantage.

Copy paste when you aren't sure if the requirements will change. Nothing is worse than building an abstraction only to find out it's useless given this new project requirement and that the two abstractions should really be separate.

darec1 over 12 years ago

I don't remember where I read it, but it's good advice:

Copy the first time, only start refactoring if you need the code a third time.
