Pandas 1.0

510 points by kylebarron over 5 years ago

17 comments

closed over 5 years ago

I've had to dive into the pandas code over the last year for a project [0], and my attitude has shifted dramatically:

- old attitude: why does pandas have to make things so hard?
- new attitude: pandas has a crazy difficult job

I think this is most apparent in the functions that decide what "[d]type" a Block (the most basic thing that stores data in pandas) should be:

https://github.com/pandas-dev/pandas/blob/4edcc5541ff3f6470f5e3c083cb83136119e6f0c/pandas/core/internals/blocks.py#L2973

And then, for the ubiquitous object dtype, pandas often has to figure out which of the many possible, more specific types to cast it to. If you think that is easy, ask yourself what this outputs:

```python
import numpy as np

np.array([np.nan, 'a'])
```

Lo and behold, it produces an array where the np.nan has been converted to the string "nan". And yet

```python
import numpy as np
import pandas as pd

pd.Series([np.nan, "a"])
```

knows this, has your back, and does not stringify it.

It also has a pathological fixation on when it tries to convert dtypes, since avoiding all the bad conversion outcomes is a relatively time-intensive process (compared to, e.g., creating a NumPy array).

I realize things could be much easier in pandas' user-facing interface, but I really appreciate the sheer amount of effort that has gone into its dtype wrangling.

[0]: http://github.com/machow/siuba
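
To make the "which more specific type is this object data?" question concrete, here is a minimal sketch using `pandas.api.types.infer_dtype`, a public helper that inspects a sequence of values and returns a label for the kind of data it sees. It is not the Block-construction code linked above, just an illustration of the same kind of value inspection; the labels assume pandas 1.0 or later, where missing values are skipped by default.

```python
import numpy as np
from pandas.api.types import infer_dtype

# Each sample is a plain Python list, the kind of input that would
# otherwise end up in a generic object-dtype column.
samples = {
    "strings with a missing value": ["a", np.nan],
    "all integers": [1, 2],
    "integers mixed with floats": [1, 2.5],
    "booleans": [True, False],
}

for label, values in samples.items():
    # infer_dtype returns a string label such as "string" or "integer";
    # with the default skipna=True, the NaN in the first sample is ignored.
    print(f"{label}: {infer_dtype(values)}")
```

Under those assumptions the four labels printed should be "string", "integer", "mixed-integer-float" and "boolean".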

_coveredInBees over 5 years ago

Great accomplishment, and kudos to the dedicated maintainers. That being said, I've always had a love-hate relationship with pandas. It is a very powerful library and does a ton, but the API is all over the place, and unless you use it regularly for a long period of time, it is almost impossible to get fluent with it. Every time I am away from it for a couple of months, I find even the most basic things complicated/confusing and find myself on Stack Overflow way too often.

By comparison, the API of something like PyTorch is an absolute pleasure to use: even though I'm not using it all the time, I have almost no trouble every time I begin training models or trying out new things in PyTorch.

All that being said, this is definitely a step in the right direction, and hopefully the API gets a bit more coherent over time.

kmax12 over 5 years ago

I know I'm not the only one, but it's hard to imagine doing my job over the last several years without Pandas. Even though Pandas has been used in production by many people as basically a 1.0.0 release for a long time, this is an amazing milestone, and I think everyone in my office smiled when they saw the release news.

I think it's worth acknowledging the great stewardship of the community by all the Pandas developers (and the rest of the people in the PyData ecosystem). It has been an inspiration for me as I create and contribute to open source libraries for data science [0][1].

[0] https://github.com/FeatureLabs/featuretools/
[1] https://github.com/FeatureLabs/compose/

jzwinck over 5 years ago

I am looking forward to a decade of fewer API-breaking changes. However, 1.0 introduces a new column type for strings, recommends its use over the old "object" column type, yet says it is "Experimental and may change at any time."

How are we supposed to interpret this in light of the promise that there will be no more API breakages until 2.0? It reads as if this promise does not apply to string data, which impacts rather a lot of use cases.
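
For readers unsure which type is meant: in pandas 1.0 the new string column type is `StringDtype`, opted into with `dtype="string"`. A minimal sketch contrasting it with the default follows; the default inference described in the first comment is pandas 1.x behaviour and may differ in later versions.

```python
import pandas as pd

# Without an explicit dtype, pandas 1.x stores these strings in the
# generic "object" dtype and keeps the missing value as None.
legacy = pd.Series(["a", None, "c"])

# Opting in to the experimental string dtype introduced in pandas 1.0;
# missing values are represented by the pd.NA sentinel instead.
typed = pd.Series(["a", None, "c"], dtype="string")

print(legacy.dtype, typed.dtype)
print(typed[1] is pd.NA)  # True
```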

ppod over 5 years ago

Could we collect some recommendations for really good books, online guides, tutorials, and recipes for current Pandas?

There are quite a few complaints here about the interface being confusing and difficult to use, and I feel like some of this is due to there being significant differences between versions. I would love to read a medium-length, free online tutorial on Pandas 1.0, but most of what turns up on Google seems to be short, idiosyncratic tutorials on specific tasks in various versions.

alpineidyll3 over 5 years ago

Pandas is my least favorite necessary evil. It's always changing, and its far too expansive API costs me about an hour a week.

Whenever one can use a UTC epoch column for time-indexed data in a raw NumPy array instead, one should.
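
A minimal sketch of that approach, assuming the timestamps are stored as sorted UTC epoch seconds in an int64 NumPy array alongside a parallel array of values; the `window` helper and the sample data are hypothetical. Time-range queries then become binary searches with `np.searchsorted`.

```python
import numpy as np

# Hypothetical hourly samples: UTC epoch seconds (sorted ascending) and values.
ts = np.array([1580515200, 1580518800, 1580522400, 1580526000], dtype=np.int64)
vals = np.array([1.0, 2.0, 3.0, 4.0])


def window(ts, vals, start, end):
    """Return the values whose timestamp t satisfies start <= t < end.

    Assumes ts is sorted ascending, so both bounds can be found by
    binary search instead of scanning the whole array.
    """
    lo = np.searchsorted(ts, start, side="left")
    hi = np.searchsorted(ts, end, side="left")
    return vals[lo:hi]


print(window(ts, vals, 1580518800, 1580526000))  # [2. 3.]
```

When calendar times are needed for display, `ts.astype("datetime64[s]")` converts the column, since NumPy's `datetime64` counts from the same 1970-01-01 epoch.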

iandinwoodie over 5 years ago

"We’ve added to_markdown() for creating a markdown table"

That's awesome!

snicker7 over 5 years ago

No mention of vaex? https://vaex.readthedocs.io/en/latest/

It has a cleaner, leaner API, plus the ability to use memory-mapped files.

drej over 5 years ago

I've been waiting for this release for years, and I hoped for one thing and one thing only: for pandas to have a proper way of dealing with NULLs. And it does have it... OPTIONALLY.

It's great that the whole thing with extension arrays, custom types, etc. has led to this, but when the devs have, after 10+ years, their biggest chance for a backward-incompatible change, this is the one to make. By making it optional, they are fixing it only for the very few who know of its existence.

I love pandas, and a sizeable part of my career depended on it. While I don't use it anymore (partly because of the NULLs), I wish it the best, and I hope there will be a future release with this breaking change.
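
To illustrate what "optionally" means here, a minimal sketch comparing the default behaviour with the opt-in nullable integer dtype (`"Int64"`, capital I) that came out of the extension-array work; results assume pandas 1.0 or later.

```python
import pandas as pd

# Default: a single missing value silently upcasts integers to float64,
# and the gap is stored as NaN.
default = pd.Series([1, None, 3])

# Opt-in: the nullable integer dtype keeps integer values and marks
# the gap with the pd.NA sentinel.
nullable = pd.Series([1, None, 3], dtype="Int64")

print(default.dtype)   # float64
print(nullable.dtype)  # Int64
print(nullable.sum())  # 4 (missing values are skipped)
```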

mrfusion over 5 years ago

Assuming you want to use Python, what's the alternative to pandas? Some brew-your-own-code kind of thing? The csv module, I guess?
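
As a rough illustration of the brew-your-own route, here is a minimal sketch that uses only the standard library's `csv` module to total one column per key, i.e. a tiny group-by. The `sum_by_key` helper and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def sum_by_key(f, key_col, value_col):
    """Sum value_col per distinct key_col in a CSV file-like object."""
    totals = defaultdict(float)
    for row in csv.DictReader(f):
        # csv yields every field as a string, so type parsing is manual.
        totals[row[key_col]] += float(row[value_col])
    return dict(totals)


data = io.StringIO("city,sales\nA,1.5\nB,2\nA,3\n")
print(sum_by_key(data, "city", "sales"))  # {'A': 4.5, 'B': 2.0}
```

In pandas the same aggregation is roughly `pd.read_csv(path).groupby("city")["sales"].sum()`; the hand-rolled version has no dependencies, but type parsing, missing values and joins all have to be written by hand like this.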

dzonga over 5 years ago

One of the best Python data libraries out there. For analysis alone, no tool comes close.

natalyarostova over 5 years ago

This is a huge accomplishment. The maintainers work very hard at keeping data science running for a substantial chunk of the field. Congratulations!

lapnitnelav over 5 years ago

Congratulations to the Pandas team. You lot have saved my bacon so many times over the last few years, I owe you many breakfasts.

Long live the King.

mmahemoff over 5 years ago

"pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language."

In case anyone else is wondering what this is. (Source: project homepage.)

louis8799 over 5 years ago

I have 30K lines of code in production using pandas 0.23.4. Should I consider updating?

this_is_not_you over 5 years ago

Any word on when the RC will be "properly" released?

squaresmile over 5 years ago

convert_dtypes is pretty nice. I wonder how soon the new dtypes will be the defaults.
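
A minimal sketch of what `DataFrame.convert_dtypes()` (new in pandas 1.0) does with a typical frame: each column is moved to the best matching nullable extension dtype. The column names and data are made up for illustration, and the "before" dtypes in the comments are pandas 1.x defaults.

```python
import pandas as pd

df = pd.DataFrame({
    "n": [1, 2, None],       # default inference: float64, gap stored as NaN
    "s": ["x", "y", None],   # default inference (pandas 1.x): object
})

converted = df.convert_dtypes()

# Integer-valued floats become the nullable "Int64" dtype and the
# strings become the "string" dtype; missing values become pd.NA.
print(df.dtypes)
print(converted.dtypes)
```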