TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.

Lesser-known Pandas tricks (2019)

159 points by HIP_HOP about 5 years ago

7 comments

jpxw about 5 years ago
Something I love about pandas is that you can often pass a URL in place of a file name.

The other day I needed to scrape data from a table on a webpage. Thinking about traversing the DOM and building up an array was already giving me a headache. Thankfully, pandas has the read_html function. Getting a list of DataFrames, one for each table on the page, was as easy as:

    dfs = pd.read_html(url)
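For anyone who wants to try this without hitting a live site, here is a minimal sketch. The HTML snippet and the helper name tables_from_html are made up for illustration; only pd.read_html itself comes from the comment above. It assumes an HTML parser such as lxml (or bs4 plus html5lib) is installed, since read_html depends on one.

```python
from io import StringIO

import pandas as pd

# Made-up page content for illustration; a real call would pass a URL
# or a file path to pd.read_html instead.
SAMPLE_HTML = """
<html><body>
<table>
  <thead>
    <tr><th>name</th><th>stars</th></tr>
  </thead>
  <tbody>
    <tr><td>alpha</td><td>10</td></tr>
    <tr><td>beta</td><td>25</td></tr>
  </tbody>
</table>
</body></html>
"""


def tables_from_html(html: str) -> list:
    """Return one DataFrame per <table> element found in the HTML text."""
    # Wrapping the text in StringIO avoids the deprecation warning that
    # newer pandas versions raise for literal HTML strings.
    return pd.read_html(StringIO(html))
```

Calling tables_from_html(SAMPLE_HTML) should give a one-element list whose DataFrame has the columns name and stars.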
aksakalli about 5 years ago
Medium wants me to upgrade my account to read this article. Please, people, share your posts somewhere else.
andreareina about 5 years ago
Merge with indicator is also useful for doing anti-joins:

    left.merge(right, how="left", indicator=True, ...)[lambda df: df._merge == "left_only"]
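A small runnable sketch of that anti-join, in case it helps. The frames, the key column, and the value columns are made up for illustration; the merge call and the callable filter follow the one-liner above.

```python
import pandas as pd

# Hypothetical data: we want the rows of `left` whose key is absent from `right`.
left = pd.DataFrame({"key": [1, 2, 3, 4], "left_val": ["a", "b", "c", "d"]})
right = pd.DataFrame({"key": [2, 4], "right_val": ["x", "y"]})

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only", or "both".
merged = left.merge(right, on="key", how="left", indicator=True)

# Keep only the unmatched rows, then drop the helper columns.
anti = (
    merged[lambda df: df["_merge"] == "left_only"]
    .drop(columns=["_merge", "right_val"])
    .reset_index(drop=True)
)
```

Here anti ends up holding the rows with key 1 and 3, which are the keys missing from right.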
staticautomatic about 5 years ago
My favorite, most elegant SO answer I've ever gotten was to a question about Pandas.

The question was "How do I create a column where each row's value is the mean of another column's values starting at that row?" The answer was:

    df.loc[::-1, 'col_1'].expanding().mean()[::-1]
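To see why that works, here is a tiny worked sketch. The four sample values and the mean_from_here column name are made up; the expression itself is the one from the answer. Reversing the column turns "mean from this row to the end" into an ordinary expanding mean, and the final reversal restores the original order.

```python
import pandas as pd

# Made-up sample column with the default 0..3 index.
df = pd.DataFrame({"col_1": [1.0, 2.0, 3.0, 4.0]})

# Reverse, take the running mean, then reverse back.
# The index labels travel with the values, so assignment aligns correctly.
df["mean_from_here"] = df.loc[::-1, "col_1"].expanding().mean()[::-1]

# Row 0 averages 1, 2, 3, 4; row 1 averages 2, 3, 4; and so on.
print(df)
```

With these values the new column is 2.5, 3.0, 3.5, 4.0.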
closed about 5 years ago
Note that there is a handy PeriodIndex version of pd.date_range:

    pd.period_range(date_from, date_to, freq="D")

AFAICT, a PeriodIndex and a DatetimeIndex behave mostly the same and share many of the same methods, except:

* DatetimeIndex can't hold dates far in the future
* PeriodIndex can't easily round to the end of a period (e.g. date + 0*MonthEnd() errors)
* PeriodIndex doesn't handle timezones?
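A quick side-by-side sketch of the two calls. The dates are arbitrary, and the year-3000 range is only there to illustrate the point about far-future dates on the PeriodIndex side; it makes no claim about what a DatetimeIndex does with such dates in any particular pandas version.

```python
import pandas as pd

# Arbitrary example dates.
date_from, date_to = "2020-01-01", "2020-01-05"

# Same span, two index types.
days = pd.date_range(date_from, date_to, freq="D")       # DatetimeIndex
periods = pd.period_range(date_from, date_to, freq="D")  # PeriodIndex

# Each Period covers a span of time; start_time gives its first instant.
first_start = periods[0].start_time

# Periods are stored as ordinals, so far-future dates are not a problem here.
far_future = pd.period_range("2999-12-30", "3000-01-02", freq="D")
```

Both indexes above have five daily entries, and far_future has four.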
HIP_HOP about 5 years ago
TL;DR:

5 lesser-known pandas tricks:

1. Date ranges

2. Merge with indicator

3. Nearest merge by timestamp

4. Create an Excel report from pandas

5. Use gzip when saving to CSV
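For items 3 and 5, a rough sketch of what those tricks can look like. This is a guess at the idea rather than the article's own code: it assumes the nearest merge is done with pd.merge_asof and direction="nearest", and that the gzip trick is the compression argument of to_csv. The trades and quotes frames and the file name are made up.

```python
import os
import tempfile

import pandas as pd

# Made-up data: match each trade to the quote closest in time.
trades = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 09:00:01", "2020-01-01 09:00:05"]),
    "qty": [10, 20],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-01-01 09:00:00",
        "2020-01-01 09:00:04",
        "2020-01-01 09:00:07",
    ]),
    "price": [100.0, 101.0, 102.0],
})

# Both frames must be sorted by the "on" column for merge_asof.
nearest = pd.merge_asof(trades, quotes, on="time", direction="nearest")

# Write the result gzip-compressed, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nearest.csv.gz")
    nearest.to_csv(path, index=False, compression="gzip")
    # read_csv infers gzip compression from the .gz extension.
    restored = pd.read_csv(path, parse_dates=["time"])
```

The 09:00:01 trade picks up the 09:00:00 quote and the 09:00:05 trade picks up the 09:00:04 quote, since those are the closest timestamps on either side.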
collyw about 5 years ago
Does anyone want to do a TLDR? I don't especially want to sign into Medium.