Innnnteresting! We've been using Pandas as our slow CPU fallback in the GPU Arrow world b/c of issues like this.<p>Today, we default to using PyData GPU Arrow tools like BlazingSQL or NVIDIA RAPIDS directly. They ~guarantee perf and, subtle yet critical for maintaining our < 100ms SLA, they keep the Arrow format clean. (Ex: we don't want a column schema to get coerced to something less structured.) We use Pandas as a fallback when they lack features or are hard to use.<p>The ideal would be to use Pandas directly. Today it is a bit of a crapshoot whether schemas will break across calls, and the above libraries are really replacements rather than integrated accelerator extensions. So things like this project get us closer to predictable (and GPU-level) performance within Pandas, vs. fully replacing it. So cool!
I can't say this is going to make a big difference in how I use pandas, but I've run into the bizarre "can't have NaNs in an int Series" annoyance in almost every pandas project I've worked on, so good on them for fixing that.
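For anyone who hasn't hit the int/NaN annoyance, here is a minimal sketch of it. It assumes the fix being discussed is along the lines of pandas' nullable `Int64` extension dtype; the thread doesn't say exactly which mechanism is meant, so treat this as an illustration rather than a description of the project's change.

```python
import pandas as pd

# Default behavior: one missing value silently turns an integer column into float64.
classic = pd.Series([1, 2, None])
print(classic.dtype)

# Nullable extension dtype (assumed here to be the relevant fix):
# the values stay integers and the gap is stored as pd.NA.
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype)
print(nullable.isna().tolist())

# Reductions skip the missing value by default.
total = nullable.sum()
print(total)
```

The silent float64 coercion in the first Series is the same kind of schema drift the first comment is worried about when data has to round-trip through Arrow.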