Pandas extension arrays

95 points by ptype over 6 years ago

3 comments

lmeyerov over 6 years ago
Innnnteresting! We've been using Pandas as our slow CPU fallback in the GPU Arrow world b/c of issues like this.

Today, we default to using PyData GPU Arrow tools like BlazingSQL or Nvidia RAPIDS directly. They ~guarantee perf, and, subtle yet critical for maintaining our < 100ms SLA, the Arrow format stays clean. (Ex: we don't want a column schema to get coerced to something less structured.) We'll use Pandas as a fallback for when they lack features or are hard to use.

The ideal would be to use Pandas directly. Today it is a bit of a crapshoot whether schemas will break across calls, and the above libraries are really replacements rather than integrated accelerator extensions. So thinking like this project gets us closer to predictable (and GPU-level) performance within pandas, vs fully replacing it. So cool!
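
To illustrate the kind of schema coercion described above, here is a minimal sketch. It assumes a pandas version that ships the nullable `Int64` extension dtype (0.24 or later). The data and variable names are made up for illustration and do not come from the commenter's pipeline.

```python
import pandas as pd

# Hypothetical data: an integer column backed by a plain NumPy dtype.
plain = pd.Series([10, 20, 30], index=[0, 1, 2], dtype="int64")

# The same values stored in the nullable extension dtype (capital "I").
nullable = plain.astype("Int64")

# Reindexing introduces a row (label 3) that has no value.
plain_after = plain.reindex([0, 1, 2, 3])
nullable_after = nullable.reindex([0, 1, 2, 3])

# The NumPy-backed column is silently coerced to float to hold NaN,
# while the extension-backed column keeps its integer schema.
print(plain_after.dtype)     # float64
print(nullable_after.dtype)  # Int64
```

Whether this is enough to keep a schema stable end to end depends on the other operations in the pipeline; the sketch only covers a single reindex.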
mactrey over 6 years ago
I can't say this is going to make a big difference in how I use pandas, but I've run into the bizarre "can't have nans in an int Series" annoyance in almost every pandas project I've worked on, so good on them for fixing that.
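
For readers who have not hit this, a short sketch of the annoyance and of the nullable integer dtype that addresses it. It assumes pandas 0.24 or later; the sample values are invented.

```python
import numpy as np
import pandas as pd

# A NaN among integers forces a NumPy-backed Series to float64.
as_float = pd.Series([1, 2, np.nan])
print(as_float.dtype)  # float64

# Casting back to a plain integer dtype fails, because NaN has no
# integer representation.
try:
    as_float.astype("int64")
    cast_failed = False
except ValueError:
    cast_failed = True

# The nullable extension dtype keeps integers and tracks missing
# entries in a separate mask.
as_nullable = pd.Series([1, 2, None], dtype="Int64")
total = as_nullable.sum()  # missing values are skipped by default
print(as_nullable.dtype, total)
```

Note the capital "I" in `"Int64"`: the lowercase `"int64"` still refers to the NumPy dtype, which cannot hold missing values.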
bpchaps over 6 years ago
Has anyone done any perf analysis between this and previous versions?