Teaching Pandas and Jupyter to Northwestern journalism students

86 points by palewire almost 8 years ago

9 comments

aldanor almost 8 years ago
So many people don't realize pandas can be horribly slow if you use it "wrong", i.e., for computations that don't vectorize in the way that's native to pandas. Also, working with dataframes that contain millions of rows is like playing Russian roulette: there are usually many ways to do the same thing in pandas. If you guess correctly, you'll wait a minute or two until the computation is done; if you guess wrong, it'll run out of RAM, segfault, or never finish.

For big datasets, I stopped using pandas a few years back for anything other than printing dataframes, working with datetime-indexed series, making quick plots, or handling tiny/toy datasets. Instead I use numpy structured/record arrays. It's kind of the same thing, without all the groupby/index fluff, but very fast.

Just last week, I helped a colleague speed up her code (a numerical solver for financial data) by more than 100x; the biggest part of that was ditching pandas entirely and using numpy.
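To make the "vectorize or suffer" point concrete, here is a minimal sketch on hypothetical random data. It compares a row-by-row `DataFrame.apply(axis=1)` call, a single vectorized column multiplication, and the same multiplication on a NumPy record array obtained with `DataFrame.to_records(index=False)`. The column names, sizes, and the price-times-quantity computation are illustrative assumptions, not a reproduction of the commenter's solver; absolute timings depend on the machine and library versions.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: 200,000 rows of prices and quantities.
rng = np.random.default_rng(0)
n = 200_000
df = pd.DataFrame({"price": rng.random(n), "qty": rng.integers(1, 10, n)})

# Slow path: DataFrame.apply(axis=1) calls a Python function once per row.
subset = df.head(20_000)
start = time.perf_counter()
slow = subset.apply(lambda row: row["price"] * row["qty"], axis=1)
print(f"apply, {len(subset):,} rows: {time.perf_counter() - start:.4f}s")

# Fast path: one vectorized multiplication over whole columns.
start = time.perf_counter()
fast = df["price"] * df["qty"]
print(f"vectorized, {n:,} rows: {time.perf_counter() - start:.4f}s")

# The same columns as a NumPy record array: no index, no groupby machinery.
rec = df.to_records(index=False)
start = time.perf_counter()
revenue = rec["price"] * rec["qty"]
print(f"record array, {n:,} rows: {time.perf_counter() - start:.4f}s")

# All three approaches compute the same numbers.
assert np.allclose(slow.to_numpy(), fast.to_numpy()[:20_000])
assert np.allclose(fast.to_numpy(), revenue)
```

The row-wise version is timed on a smaller subset only because it is much slower per row. Converting to a record array copies the data once; after that, field access such as `rec["price"]` returns plain NumPy arrays.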
farnsworth almost 8 years ago

    But pandas' magical simplicity makes things like computed columns immediately intuitive:

    > data['% of total'] = data.amount / data.amount.sum()

Is that immediately intuitive? I'm staring at this trying to understand what it's doing. Is the / operator overloaded? Is data.amount one particular amount, and data.amount.sum() the sum of all amounts? Why does the "computed column" property go on the same data object as the actual data? Maybe it's immediately intuitive if you've used pandas.
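One way to unpack the snippet is to run it on a tiny, hypothetical table. In pandas, `data.amount` is attribute-style access to the whole `amount` column (a `Series`), `/` is overloaded to divide element-wise, and a scalar divisor is broadcast across every row; assigning to `data['% of total']` then stores the result as a new column of the same DataFrame. The payee names and amounts below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one row per payment.
data = pd.DataFrame({"payee": ["A", "B", "C"], "amount": [50.0, 30.0, 20.0]})

# data.amount is the whole "amount" column (a Series), not one value;
# data.amount.sum() collapses that column to a single number (100.0 here).
total = data.amount.sum()

# "/" is overloaded for Series: dividing a Series by a scalar divides
# every element, producing a new Series with the same index.
share = data.amount / total

# Assigning a Series to a new key adds it as a column of the same DataFrame.
data["% of total"] = share

# The same computation spelled out with a plain Python loop, for comparison.
manual = [amount / total for amount in data["amount"]]

print(data)
```

Despite its name, the new column holds fractions (0.5, 0.3, 0.2), not percentages; multiply by 100 if percentages are wanted. Attribute access like `data.amount` only works when the column name is a valid Python identifier, which is why the new column has to be created with bracket syntax.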
tkt almost 8 years ago
For installing Jupyter, Anaconda works well across all platforms, even on most slightly older OSes.

http://jupyter.readthedocs.io/en/latest/install.html

It works better for people to install Jupyter with Anaconda rather than use virtual environments, because there isn't the overhead of also having to learn about virtual environments. People tend to think of virtual environments as something associated only with the class, and don't use them much for their own work outside the workshop or course.
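Whichever install route students take, the interpreter that Jupyter runs can be asked what it actually sees. The sketch below is a hypothetical sanity check, not part of Anaconda or Jupyter: the `environment_report` name and the default package list are assumptions, and detecting conda by looking for a `conda-meta` directory under `sys.prefix` is a heuristic rather than an official API.

```python
import importlib.util
import os
import sys


def environment_report(packages=("pandas", "matplotlib", "notebook")):
    """Hypothetical helper: report what this interpreter can import."""
    return {
        # Version and path of the interpreter running this code.
        "python": sys.version.split()[0],
        "executable": sys.executable,
        # Heuristic: conda installs keep package metadata in <prefix>/conda-meta.
        "conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
        # find_spec returns None when a top-level module cannot be found.
        "packages": {name: importlib.util.find_spec(name) is not None for name in packages},
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Run in a notebook cell, `environment_report()` shows whether the kernel is using the Python installation the student expects, which is often the first thing to check when an import fails.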
flyaway almost 8 years ago
I spend about 8 months of the year teaching pandas to journalism students, and it's a wild ride! Despite some of the iffy syntax and pandas' seeming inability to standardize parameter names, the students seem to grok the workflow much more quickly than wrangling lists and dictionaries in the "normal" world of Python.

I know everyone loves the reproducibility Notebooks supposedly bring to the table, but without a doubt my favorite part is the ability to export super-unattractive matplotlib charts as PDF, clean them up in Illustrator, and suddenly find yourself with publication-quality graphics. Knowing you're producing something more than just some numbers to toss in a story can be a strong sell to a lot of folks.
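A minimal sketch of the export step, assuming hypothetical data. Setting Matplotlib's `pdf.fonttype` rcParam to 42 embeds TrueType fonts instead of the default Type 3 fonts, which generally keeps labels editable as text in Illustrator. The chart, its values, and the `chart.pdf` filename are placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display, e.g. from a script
import matplotlib.pyplot as plt

# Embed TrueType (Type 42) fonts rather than the default Type 3 fonts,
# so text in the PDF stays editable in vector editors.
plt.rcParams["pdf.fonttype"] = 42

# Hypothetical data.
years = [2013, 2014, 2015, 2016]
counts = [12, 18, 9, 21]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(years, counts)
ax.set_xticks(years)
ax.set_title("Hypothetical count by year")

# PDF output is vector-based, so shapes and text can be restyled individually.
fig.savefig("chart.pdf")
plt.close(fig)
```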
thearn4 almost 8 years ago
I really like Jupyter, but somehow I'm not in love with it. Like, every time I fire it up for quick data analysis, I seem to inevitably end up back in Sublime + bash, sending plots to disk. Am I the odd one out?
bsder almost 8 years ago
It is hard to overstate just how ferociously bad the experience of getting Jupyter from a blank computer to the equivalent of "Hello world" actually is.
jastr almost 8 years ago
I've found that most of the queries journalists are trying to run are pretty basic, mostly filtering and histograms. Setting up a virtualenv, dependencies, etc. can be tough, and RTFM isn't sufficient for someone getting started. I was surprised that nothing existed for this, so I built it.

It has the basics of a Jupyter notebook: filter, sum, average, plot. So far it's attracted a pretty interesting audience, including journalists but also lawyers and consultants.

www.CSVExplorer.com
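For comparison, the four basics listed (filter, sum, average, plot) map to a few lines of pandas. This is a sketch on hypothetical contribution data, not how CSVExplorer is implemented; with a real file, the DataFrame would come from `pd.read_csv(path)`, and the `state`/`amount` columns and the output filename are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # no display needed; the plot is written to a file
import pandas as pd

# Hypothetical contribution records.
df = pd.DataFrame({
    "state": ["IL", "IL", "NY", "CA", "IL"],
    "amount": [250, 1000, 500, 75, 40],
})

# Filter: keep only contributions from Illinois.
il = df[df["state"] == "IL"]

# Sum and average of the filtered rows.
total = il["amount"].sum()
average = il["amount"].mean()
print(total, average)

# Plot: a histogram of the filtered amounts, saved to disk.
ax = il["amount"].plot.hist(bins=5)
ax.set_xlabel("amount")
ax.figure.savefig("il_amounts.png")
```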
farnsworth almost 8 years ago
Side note: I googled "pandas" and got a lot of results related to the Python library and very few related to the large mammal. Bing doesn't give me any related to the Python library. Google knows me too well.
koolhead17 almost 8 years ago
Excellent share.