TechEcho

9 comments

rathleon over 2 years ago

The problem with this snippet isn't really the chaining; it's all the inlining. All the lists, and the many lambdas used, could be variables. Does this approach make it "professional code"?

The responses seem out of context, too:

> David: What's the elevator pitch for writing pandas code the way that you do?

> Matt: One common thing that you'll see in the data science world is this notion that there's like Untitled1.ipynb and Untitled2.ipynb[...]. My goal is to help with that so (...) you have Analysis_for_ClientA.ipynb and that's the only notebook you have. And you can come back to it tomorrow and pick it up where you left off and you're going to be productive. Your code will be easier to read[...].

This is a tweet. Filenames aren't even argued. This doesn't answer the interviewer's question either. Writing code != naming files.

> David: What is it that separates beginner pandas code from professional pandas code?

> Matt: I would say that if you want to write good pandas code (...) you should know how to write lambdas. You should know how to do list and dictionary comprehensions. Dictionary unpacking (...) is super useful in pandas world.

Absolutely. But professionals use variables, too. Possibly even more so.
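As a rough illustration of the "could be variables" point, here is a minimal sketch. It is not the snippet from the interview; the data, `KEEP_COLUMNS`, and `has_central_air` are hypothetical names chosen only to show an inline list and an inline lambda pulled out and named.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
df = pd.DataFrame({
    "central_air": ["Y", "N", "Y"],
    "ms_subclass": [20, 60, 20],
    "sale_price": [208500, 181500, 223500],
})

# A named list instead of an inline literal in the chain.
KEEP_COLUMNS = ["central_air", "sale_price"]


def has_central_air(frame: pd.DataFrame) -> pd.Series:
    """Map the 'Y'/'N' flag to a boolean column (replaces an inline lambda)."""
    return frame["central_air"] == "Y"


# The chain itself stays short because the details live in named pieces.
result = (
    df
    .loc[:, KEEP_COLUMNS]
    .assign(central_air=has_central_air)
)
```

The chain is still a chain; the difference is that each list and transformation has a name that can be commented, reused, and tested on its own.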

hprotagonist over 2 years ago

> In my 20-plus years of working with data, I have multiple steps and I don't care about the intermediate steps.

Oh boy, do I care about every single intermediate step though!

Especially in pandas, where we play "where's the NaN" all the damn time.
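One way to keep an eye on intermediate steps without abandoning the chain is a pass-through helper used with `DataFrame.pipe`. This is only a sketch under assumptions: `report_nans` and the toy data are hypothetical, and a real pipeline would likely log rather than print.

```python
import numpy as np
import pandas as pd


def report_nans(frame: pd.DataFrame, label: str) -> pd.DataFrame:
    """Print per-column NaN counts, then return the frame unchanged."""
    counts = {col: int(n) for col, n in frame.isna().sum().items()}
    print(f"[{label}] NaN counts: {counts}")
    return frame


# Hypothetical data with one missing value in each column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

result = (
    df
    .pipe(report_nans, "raw")
    .assign(total=lambda d: d["a"] + d["b"])  # NaN propagates into 'total'
    .pipe(report_nans, "after total")
    .dropna()
    .pipe(report_nans, "after dropna")
)
```

Because the helper returns its input untouched, it can be added or removed at any point in the chain without changing the result.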

LarsDu88 over 2 years ago

I've done chaining myself and seen other people do it as well. The folks writing these massive functions may think they are gurus, but it makes functions virtually impossible to debug in prod. It flies in the face of the wisdom of "make your functions small".

I think this is one area where pandas and Polars can be improved. How do you write long chains and slot in breaks and testing?
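One possible answer to the question above, offered as a sketch rather than an established best practice: split the chain into small named functions and compose them with `DataFrame.pipe`, so each step can be unit-tested or stepped through separately. The function names, the `area` column, and the data are hypothetical.

```python
import pandas as pd


def clean_flags(frame: pd.DataFrame) -> pd.DataFrame:
    """Convert the 'Y'/'N' central_air flag to booleans."""
    return frame.assign(central_air=frame["central_air"].eq("Y"))


def add_price_per_sqft(frame: pd.DataFrame) -> pd.DataFrame:
    """Add a derived price-per-area column."""
    return frame.assign(price_per_sqft=frame["sale_price"] / frame["area"])


def prepare(frame: pd.DataFrame) -> pd.DataFrame:
    """The full pipeline, composed from the small steps above."""
    return frame.pipe(clean_flags).pipe(add_price_per_sqft)


# Hypothetical input data.
raw = pd.DataFrame({
    "central_air": ["Y", "N"],
    "sale_price": [200000, 150000],
    "area": [1000, 1500],
})

# Each step can be checked in isolation...
flags_only = clean_flags(raw)
# ...and the composed pipeline can be checked end to end.
prepared = prepare(raw)
```

A breakpoint or test can then target `clean_flags` or `add_price_per_sqft` directly instead of the middle of one long expression.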

_dwt over 2 years ago

I had a whole rant queued up on "Pandas and its consequences have been a disaster for the human race" (well, at least for newbie programmers), but I think instead I want to focus on the damn dictionary splats. I just don't get it - it's pure "clever" code in the pejorative Dijkstra sense. It's hard to edit, it's hard to typecheck. Why not pay the very low whitespace tax to give each key/value pair its own longhand line:

    .astype({
        'central_air': bool,
        'ms_subclass': 'uint8',
        ...
    })

Now if, say, ms_subclass and overall_qual need different types, that's an easy diff to read. Ah, but I suppose that wouldn't be as Twitter-friendly.
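For readers who have not seen the pattern under discussion, the sketch below compares the two styles. The original snippet is not reproduced in this thread, so the "compact" form is an assumption about what a dictionary splat inside `astype` looks like; the data is hypothetical, and only the column names come from the comment above.

```python
import pandas as pd

# Hypothetical data using the column names mentioned in the comment.
df = pd.DataFrame({
    "central_air": [1, 0, 1],
    "ms_subclass": [20, 60, 20],
    "overall_qual": [7, 6, 8],
})

# Compact form (assumed): a dict comprehension unpacked into the mapping.
compact = df.astype({
    "central_air": bool,
    **{col: "uint8" for col in ["ms_subclass", "overall_qual"]},
})

# Longhand form: one key/value pair per line, as suggested above.
longhand = df.astype({
    "central_air": bool,
    "ms_subclass": "uint8",
    "overall_qual": "uint8",
})

# Both spellings produce the same dtypes.
same_dtypes = compact.dtypes.to_dict() == longhand.dtypes.to_dict()
```

Changing `overall_qual` to a different type is a one-line diff in the longhand form, whereas the compact form requires restructuring the comprehension.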

mint2 over 2 years ago

Random lists of strings are hard to decipher. What is that set of values supposed to represent? And it interrupts the flow of figuring out what's going on.

I prefer assigning lists like that to informatively named variables rather than leaving them the subject of speculation. It's easier to add clarifying comments that way too.

In SQL or pandas, long lists of values that are not broken up are hard to read. It's easy to scan down a single value on each row, not random-length values spread randomly across the screen.

Also, that is chaining far too much in a single go.

malshe over 2 years ago

I personally don't like method chaining in Pandas because it makes troubleshooting difficult for me. On the other hand, I love piping functions in tidyverse in R. I think there are a few libraries in Python that bring pipes to Pandas. I haven't used any though, so I can't comment on their usefulness.

Edit: Here is a library that brings pipes to pandas: https://github.com/pwwang/datar

yablak over 2 years ago

I love the function chaining. It's basically functional programming with "immutable" intermediates (yes, I know they're not really immutable, but we don't modify them in place).

Another good example of this style is tf.data pipelines. Also a very nice API.
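A small sketch of what "we don't modify them in place" means in practice, using hypothetical data: each method in the chain returns a new DataFrame, so the starting frame is left as it was.

```python
import pandas as pd

# Hypothetical starting frame.
original = pd.DataFrame({"x": [3, 1, 2]})

# Every call below returns a new object; nothing uses inplace=True.
chained = (
    original
    .sort_values("x")
    .assign(doubled=lambda d: d["x"] * 2)
    .reset_index(drop=True)
)

# The original keeps its order and its single column.
original_untouched = (
    original["x"].tolist() == [3, 1, 2]
    and "doubled" not in original.columns
)
```

The intermediates are ordinary mutable DataFrames, as the comment notes; the style only relies on never mutating them.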

cuteboy19 over 2 years ago

Why does pandas code often feel ugly and clunky compared to the equivalent SQL? Is there no better way to do this?

extasia over 2 years ago

Quite frankly, this is unreadable and unmaintainable code.

He doesn't articulate any of the virtues of it either, aside from some hand-waving about 'memory' that doesn't get fleshed out.

Method Chaining in Pandas: Bad Form or a Recipe for Success?
