Something I love about pandas is that often you can pass a URL in place of a file name.<p>The other day I needed to scrape data from a table on a webpage. Thinking about traversing the DOM and building up an array was already giving me a headache. Thankfully pandas has the “read_html” function. Getting a list of dataframes for each table on the page was as easy as:<p><pre><code> dfs = pd.read_html(url)</code></pre>
Merge with indicator is also useful for doing anti-joins:<p><pre><code> left.merge(right, how="left", indicator=True, ...)
[lambda df: df._merge == "left_only"]</code></pre>
My favorite, most elegant SO answer I've ever gotten was to a question about Pandas.<p>The question was "How do I create a column where each row's value is the mean of another column's values starting at that row?" The answer was:<p><pre><code> df.loc[::-1, 'col_1'].expanding().mean()[::-1]</code></pre>
Note that there is a handy PeriodIndex version of pd.date_range:<p><pre><code> pd.period_range(date_from, date_to, freq = "D")
</code></pre>
AFAICT, a PeriodIndex and DateTimeIndex function mostly the same, and have many of the same methods, except...<p><pre><code> * DateTimeIndex can't hold dates far in the future
* PeriodIndex can't easily round to the end of a period (e.g. date + 0*MonthEnd() errors)
* PeriodIndex doesn't handle timezones?</code></pre>
TLDR;<p>5 lesser-known pandas tricks:<p>1. Date Ranges<p>2. Merge with indicator<p>3. Nearest merge by timestamp<p>4. Create an Excel report from pandas<p>5. Use gzip with when saving to csv