Polars: Fast DataFrame library for Rust and Python

238 points by daureg, over 3 years ago

20 comments

civilized, over 3 years ago
In my world, anything that isn't "identical to R's dplyr API but faster" just isn't quite worth switching for. There's absolutely no contest: dplyr has the most productive API and that matters to me more than anything else. But I'm glad to see Polars moves away from the kludgey sprawl of the Pandas API towards the perfection of dplyr... while also being blazingly fast!

Now just mix in a bit of DSL so people aren't obligated* to write lame boilerplate like "pandas.blahblah" or "polars.blahblah" just to reference a freaking column, and you're there!

*If you like the boilerplate for "production robustness" or whatever, go wild, but analysts and scientists benefit from the *option* to write more concisely.
gpderetta, over 3 years ago
From the Python docs:

    > No Index
    > They are not needed. Not having them makes things easier. Convince me otherwise

Agree completely. First-class indices in pandas just complicate everything by having a specially blessed column that can't be manipulated consistently. Secondary indices should be "just" an optimization, while primary indices are a constraint on the whole table (not a single column).

The library in general seems interesting. I'm not 100% sold on the syntax (as usual, project is called select...), but it is not pandas, which is already a huge plus.
sriku, over 3 years ago
Hmmm .. in the linked benchmarks [1], DataFrames.jl (Julia library) appears to be fairly competitive.

[1] https://h2oai.github.io/db-benchmark/
abeppu, over 3 years ago
There are so many dataframe libraries, many of which have APIs closely following pandas but are not drop-in replacements. I wish we could agree on a standard describing the core parts of what a dataframe must do, such that code depending only on those operations can easily move between dataframes.
vincent-toups, over 3 years ago
God please anything to liberate me from pandas, which has one of the wildest APIs I've ever had to routinely work with.
Dowwie, over 3 years ago
Polars could bring the best of both worlds together if it can codegen python api calls to their Rust equivalent. A user conducts ad-hoc analysis and rapid development with Python. When the work is ready to ship, the user invokes a codegen to transform into Rust-equivalent api calls, resulting in a new rust module.
ahurmazda, over 3 years ago
I’ve been using it for the past quarter. In addition to the speed, I’m very pleased with the pyspark-esque api. This means migrating code from research to production is that much easier.
riskneutral, over 3 years ago
I'm confused. Polars is built on top of the Rust bindings for Apache Arrow. Arrow already has Python bindings. What does this project add by creating a new Python binding on top of the Rust binding?
Fiahil, over 3 years ago
… and it’s using arrow2, not the official, unsafe, arrow crate. Great, it means we can use it!
optimalonpaper, over 3 years ago
I'm reading all these comments and keep asking myself if I'm missing something, because I honestly sort of like pandas' API?

Sure dplyr is nice -- it felt that way on rare occasions that I got to use it, at least -- but you get used to anything.

So since I'm using Python and know it quite well, I'm just more comfortable sticking with Python's pandas framework rather than switching to R for data processing.
jmakov, over 3 years ago
How does it compare to Vaex?
unixhero, over 3 years ago
What makes Pandas so bad and what makes Dplyr so great?

I have used Pandas a lot for data analysis and for data integration duct tape scenarios. For me it has been a low bar for achieving a lot.
the_biot, over 3 years ago
I've never seen the term "dataframe" used as it is on this website, and the commenters here seem to all use it. Judging by the examples it seems to just refer to a "row" from e.g. a CSV or SQL query. So is that all it is, or am I missing something?
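For readers with the same question: in the usual sense a dataframe is the whole table, a set of named columns of equal length, and a row is one slice across those columns. A minimal plain-Python sketch of that idea follows; the dict-of-lists layout and the helper names `row` and `filter_rows` are hypothetical and are not how Polars or pandas store data internally.

```python
# A toy "dataframe": named columns of equal length (hypothetical layout,
# not the internal representation of Polars or pandas).
frame = {
    "city": ["Oslo", "Lima", "Oslo"],
    "temp_c": [3.5, 19.0, 4.5],
}


def row(frame, i):
    """One row is a slice across all columns at position i."""
    return {name: values[i] for name, values in frame.items()}


def filter_rows(frame, column, predicate):
    """Keep the rows where predicate(value) holds for the given column."""
    keep = [i for i, v in enumerate(frame[column]) if predicate(v)]
    return {name: [values[i] for i in keep] for name, values in frame.items()}


# Typical dataframe-style work: filter the table, then aggregate a column.
oslo = filter_rows(frame, "city", lambda c: c == "Oslo")
mean_oslo_temp = sum(oslo["temp_c"]) / len(oslo["temp_c"])
```

Real dataframe libraries add typed columns, joins, group-bys, and fast native implementations on top of this basic table shape.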
rytill, over 3 years ago
How would this compare to loading a sqlite database into memory and performing queries with it?
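For concreteness, the SQLite side of that comparison might look like the sketch below, which uses only Python's standard `sqlite3` module. The `trips` table and its columns are made up for illustration, and the sketch makes no claim about relative speed.

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Oslo", 10.0), ("Lima", 4.0), ("Oslo", 14.0)],
)

# The SQL counterpart of a dataframe group-by followed by a mean.
rows = conn.execute(
    "SELECT city, AVG(fare) FROM trips GROUP BY city ORDER BY city"
).fetchall()
conn.close()
```

One difference worth keeping in mind when comparing: SQLite stores data row by row, while Arrow-based libraries such as Polars use a columnar memory layout, so analytical scans over a few columns can behave quite differently.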
pvitz, over 3 years ago
Does anybody here know dataframe systems that are able to handle file sizes bigger than the available RAM? Is polars able to handle this? I am only aware of disk.frame (diskframe.com), but don't know how well it performs.
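The general idea behind larger-than-RAM processing can be sketched with the standard library alone: stream the file one row at a time and keep only running aggregates in memory. This illustrates the technique, not how Polars or disk.frame implement it; the function name `streaming_group_sum` and the column names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def streaming_group_sum(lines, key, value):
    """Sum `value` per `key`, reading one CSV row at a time.

    Memory use grows with the number of distinct keys, not with file size.
    """
    totals = defaultdict(float)
    for record in csv.DictReader(lines):
        totals[record[key]] += float(record[value])
    return dict(totals)


# io.StringIO stands in for a large file opened with open(path, newline="").
sample = io.StringIO("city,fare\nOslo,10\nLima,4\nOslo,14\n")
totals = streaming_group_sum(sample, "city", "fare")
```

This only works directly for aggregations that can be updated incrementally (sums, counts, min, max); operations such as sorts or joins need more elaborate out-of-core strategies.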
thenipper, over 3 years ago
We've been thinking about trying this out at work for some of our data pipelines/simplified models. The speed/ergonomics look great.
ZeroGravitas, over 3 years ago
Is there a plugin to use this as a visidata backend? I quite like their UX.
xiaodai, over 3 years ago
It's great to see innovation in this area.
callmerk, over 3 years ago
.
nas, over 3 years ago
It looks interesting but phrases like "embarrassingly parallel execution" make my marketing hype detectors trigger. Maybe they could tone down their self promotion just a touch. Also "Even though Polars is completely written in Rust (no runtime overhead!) ...". I find that hard to believe.