So many people don't realize pandas can be horribly slow if you use it "wrong" -- i.e., for computations that don't vectorize the way pandas expects. Working with dataframes that contain millions of rows is like playing Russian roulette -- there are usually many ways to do the same thing in pandas, and if you guess right you'll wait a minute or two until the computation's done; if you guess wrong it'll run out of RAM, segfault, or never finish.<p>For big datasets, I stopped using pandas a few years back for anything other than printing dataframes, datetime-indexed series, quick plots, or tiny/toy datasets -- in favor of numpy structured/record arrays. It's roughly the same thing, without all the groupby/index fluff, but very fast.<p>Just last week I helped a colleague speed up her code (a numerical solver for financial data) by more than 100x; the biggest win was ditching pandas entirely and using numpy.
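For anyone unfamiliar with them, here's a minimal sketch of what a numpy structured array looks like (the field names and data are invented for illustration) -- you get dataframe-style column access with plain ndarray speed:

```python
import numpy as np

# A structured array: named columns like a dataframe, but it's still
# one contiguous numpy array with no pandas machinery on top.
trades = np.zeros(5, dtype=[("price", "f8"), ("qty", "i4")])
trades["price"] = [10.0, 10.5, 9.8, 10.2, 10.1]
trades["qty"] = [100, 50, 200, 75, 125]

# Column math is fully vectorized:
notional = trades["price"] * trades["qty"]
total = notional.sum()  # 5512.5
```

There's no groupby or index here, which is exactly the point -- if your workload is numerical column math, this is often all you need.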
<p><pre><code> But pandas’ magical simplicity makes things like computed columns immediately intuitive:
> data['% of total'] = data.amount / data.amount.sum()
</code></pre>
Is that immediately intuitive? I'm staring at this trying to understand what it's doing. Is the / operator overloaded? Is data.amount one particular amount, and data.amount.sum() the sum of all amounts? Why does the "computed column" go on the same data object as the actual data? Maybe it's immediately intuitive if you've used pandas.
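To unpack it (with made-up sample data): data.amount is the whole "amount" column (a pandas Series), not a single value; .sum() collapses it to a scalar; and / is indeed overloaded, dividing every element by that scalar. Assigning to data['% of total'] adds a new column to the same dataframe:

```python
import pandas as pd

data = pd.DataFrame({"amount": [20.0, 30.0, 50.0]})

# Series / scalar: the division broadcasts over every row.
data["% of total"] = data.amount / data.amount.sum()

# Roughly equivalent plain-Python version:
total = sum(data["amount"])            # 100.0
shares = [x / total for x in data["amount"]]  # [0.2, 0.3, 0.5]
```

So it's shorthand for "divide each amount by the grand total and store the result as a new column alongside the data."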
For installing Jupyter, Anaconda works well across all platforms, even on most slightly older OSes.<p><a href="http://jupyter.readthedocs.io/en/latest/install.html" rel="nofollow">http://jupyter.readthedocs.io/en/latest/install.html</a><p>It works better for people to install Jupyter with Anaconda rather than use virtual environments, because there isn't the overhead of also having to learn about virtual environments. People tend to think of virtual environments as something just associated with the class, and don't use them much for their own work outside of the workshop or course.
I spend about 8 months of the year teaching pandas to journalism students, and it's a wild ride! Despite some of the iffy syntax and pandas' seeming inability to standardize parameter names, the students seem to grok the workflow much more quickly than wrangling lists and dictionaries in the "normal" world of Python.<p>I know everyone loves the reproducibility Notebooks supposedly bring to the table, but without a doubt my favorite part is the ability to export super-unattractive matplotlib charts as PDF, clean them up in Illustrator, and suddenly find yourself with publication-quality graphics. Knowing you're producing something more than just some numbers to toss in a story can be a strong sell to a lot of folks.
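The PDF-export workflow is just matplotlib's savefig with a .pdf extension -- a minimal sketch with invented sample data:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# A deliberately rough chart; the polish happens later in Illustrator.
fig, ax = plt.subplots()
ax.bar(["2016", "2017", "2018"], [120, 95, 140])
ax.set_title("Complaints per year (sample data)")

# Saving as PDF keeps everything as vector shapes and text,
# so each bar and label stays individually editable in Illustrator.
fig.savefig("chart.pdf")
```

Because PDF is a vector format, Illustrator opens the chart as editable objects rather than pixels, which is what makes the cleanup step practical.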
I really like Jupyter, but somehow I'm not in love with it. Like, every time I fire it up to use it for quick data analysis, I seem to inevitably end up back in sublime + bash, sending plots to disk. Am I the odd one out?
It is hard to overstate just how ferociously bad the experience of getting Jupyter from blank computer to the equivalent of "Hello world" actually is.
I've found that most of the queries journalists are trying to run are pretty basic, mostly filtering and histograms. Setting up a virtualenv, dependencies, etc. can be tough, and RTFM isn't sufficient for someone just getting started. I was surprised that nothing existed for this, so I built it.<p>It has the basics of a Jupyter notebook - filter, sum, average, plot. So far it's attracted a pretty interesting audience including journalists, but also lawyers and consultants.<p>www.CSVExplorer.com
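For context, the "basic queries" in question amount to a handful of one-liners in pandas (column names and data invented here for illustration):

```python
import pandas as pd

# Toy data standing in for a typical uploaded CSV.
df = pd.DataFrame({
    "city": ["Austin", "Dallas", "Austin", "Houston"],
    "fine": [250, 100, 400, 150],
})

# Filter: keep rows matching a condition
austin = df[df["city"] == "Austin"]

# Sum and average of a column
total_fines = df["fine"].sum()   # 900
avg_fine = df["fine"].mean()     # 225.0

# Histogram-style counts per category
counts = df["city"].value_counts()
```

Each of these is one line, but the surrounding setup (Python, virtualenv, Jupyter, pandas) is the part that stops non-programmers -- which is the gap a hosted tool fills.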
Side note, I googled "pandas" and get a lot of results related to the python library, and very few related to the large mammal. Bing doesn't give me any related to the python library. Google knows me too well.