I've had to dive into the pandas code over the last year for a project [0], and my attitude has shifted dramatically from...<p><pre><code> * old attitude: why does pandas have to make things so hard
* new attitude: pandas has a crazy difficult job
</code></pre>
I think this is most apparent in the functions that decide what "[d]type" a Block--the most basic thing that stores data in pandas--should be.<p><a href="https://github.com/pandas-dev/pandas/blob/4edcc5541ff3f6470f5e3c083cb83136119e6f0c/pandas/core/internals/blocks.py#L2973" rel="nofollow">https://github.com/pandas-dev/pandas/blob/4edcc5541ff3f6470f...</a><p>And then, for the ubiquitous object dtype, they often have to figure out which of the many possible more specific types to cast it to.<p>If you think that is easy, ask yourself what this outputs:<p><pre><code> import numpy as np
np.array([np.nan, 'a'])
</code></pre>
Lo and behold--it produces an array where the np.nan has been converted to the string "nan".<p>And yet<p><pre><code> import numpy as np
import pandas as pd
pd.Series([np.nan, "a"])
</code></pre>
Knows this, has your back, and does not stringify it.<p>It also has a pathological fixation on <i>when</i> it tries to convert dtypes, since avoiding all the bad conversion outcomes is a relatively time-intensive process (compared to e.g. creating a numpy array).<p>I realize things could be much easier in pandas' user-facing interface, but I really appreciate the sheer amount of effort that has gone into its dtype wrangling.<p>[0]: <a href="http://github.com/machow/siuba" rel="nofollow">http://github.com/machow/siuba</a>
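<p>If you want to poke at the np.nan example yourself, here is a minimal sketch of the comparison. It only uses public calls (np.array, pd.Series, pd.api.types.infer_dtype) and is not how the Block code linked above does it internally. The exact dtype pandas reports for the Series may differ between pandas versions, so the sketch checks for missingness rather than a specific dtype.

```python
import numpy as np
import pandas as pd

values = [np.nan, "a"]

# NumPy has to pick one dtype for the whole array. For a float mixed with
# a str it settles on a fixed-width unicode dtype, so NaN becomes text.
arr = np.array(values)
print(arr.dtype.kind)        # "U" is NumPy's code for unicode strings
print(arr[0] == "nan")       # the missing value is now the string "nan"

# pandas does not stringify the NaN, so it still registers as missing.
# (The reported dtype depends on the pandas version, so it is not assumed here.)
ser = pd.Series(values)
print(ser.dtype)
print(ser.isna().tolist())

# infer_dtype is the public helper for asking "what is really in here?".
# With skipna=True the NaN is ignored and only the string is considered.
inferred = pd.api.types.infer_dtype(values, skipna=True)
print(inferred)
```

The isna() check is the part that matters in practice: after the NumPy round trip there is no missing value left to detect, only a string that happens to spell "nan".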