Excellent library for train_test_split.
Jokes aside, this library, along with NumPy, pandas, Jupyter, Matplotlib, and the DL libraries, is the reason Python is the powerhouse it is for data science.
Early on, pandas made some unfortunate design decisions that are still biting hard. For example, datetimes (pandas.Timestamp) are represented by a 64-bit int with a fixed nanosecond resolution. This gives a range of about ±292 years around 1970-01-01 (the epoch), which is too small to represent the works of William Shakespeare, never mind human history. Using pandas in these areas becomes a royal pain in the neck, because one constantly needs to work around its datetime limitations.<p>OTOH, in numpy one can choose the time unit (anything from an attosecond to a year), tailoring the resolution to the task (from high-energy physics all the way to astronomy). pandas' choice is only good for high-frequency stock traders.
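For anyone who wants to check the ±292-year figure, here is a rough sketch using only NumPy. The variable names are mine, the year length of 365.25 days is an approximation, and the date 1600-01-01 is just an arbitrary day within Shakespeare's lifetime picked for illustration.

```python
import numpy as np

# A signed 64-bit counter of nanoseconds can reach at most 2**63 - 1 ticks
# on either side of the epoch (1970-01-01).
I64_MAX = np.iinfo(np.int64).max
NS_PER_YEAR = 365.25 * 24 * 3600 * 1e9  # approximate year length in ns
half_range_years = I64_MAX / NS_PER_YEAR  # a bit over 292 years

# The extreme instants representable at nanosecond resolution.
# (The minimum int64 value itself is reserved for NaT, hence the +1.)
latest_ns = np.datetime64(I64_MAX, "ns")
earliest_ns = np.datetime64(np.iinfo(np.int64).min + 1, "ns")

# With a coarser unit the same 64 bits cover a vastly longer span.
# Day resolution handles a date in Shakespeare's lifetime without trouble.
old_date = np.datetime64("1600-01-01", "D")
one_year_later = old_date + np.timedelta64(366, "D")  # 1600 is a leap year

# Compare at day resolution to avoid casting the old date to nanoseconds.
predates_ns_range = old_date < earliest_ns.astype("datetime64[D]")

print(half_range_years, earliest_ns, latest_ns, one_year_later, predates_ns_range)
```

The point is only that the range is a consequence of the unit: the same 64 bits at second or day resolution cover far more than recorded history.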
Just to clarify, scikit-learn 1.0 has not been released yet. The latest tag in the GitHub repo is 1.0.rc2.<p><a href="https://github.com/scikit-learn/scikit-learn/releases/tag/1.0.rc2" rel="nofollow">https://github.com/scikit-learn/scikit-learn/releases/tag/1....</a>
Great that they finally added quantile regression. This was sorely missed.<p>I’m still hoping for a mixed-effects model implementation someday, like lme4 in R. The statsmodels implementation can only predict from the fixed effects, which limits it greatly.<p>I’ve always wondered why mixed-effects models are not more popular in the ML world.
scikit-learn (next to numpy) is the one library I use in every single project at work. Every time I consider switching away from Python I am faced with the fact that I'd lose access to this workhorse of a library.
Of course it's not all sunshine and rainbows - I had my fair share of rummaging through its internals - but its API design is a de facto standard for a reason.
My only recurring gripe is that the serialization story (basically just pickling everything) is not optimal.
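To make the pickling point concrete, this is roughly what the round trip looks like. It is a toy sketch with made-up data, not a recommendation.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny toy dataset, purely illustrative.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)

# Persist the fitted estimator as bytes and load it back.
blob = pickle.dumps(clf)
restored = pickle.loads(blob)

# The restored model should predict exactly as the original did.
same_predictions = np.array_equal(clf.predict(X), restored.predict(X))
print(same_predictions)
```

The catch is that a pickle carries no guarantee of loading under a different scikit-learn version, and unpickling data from an untrusted source can execute arbitrary code.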
scikit-learn is great, and reading the documentation for third-party ML packages and seeing the words "Scikit-learn API" is even better.
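For anyone wondering what "Scikit-learn API" means in practice, here is a minimal sketch of a compatible estimator. `MeanRegressor` and its `shrinkage` parameter are hypothetical names invented for this example; the base classes and `clone` are from scikit-learn itself.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Toy estimator (hypothetical): always predicts a shrunken training mean."""

    def __init__(self, shrinkage=0.0):
        # Convention: __init__ only stores hyperparameters, unmodified.
        self.shrinkage = shrinkage

    def fit(self, X, y):
        # Convention: learned attributes end with an underscore.
        self.mean_ = (1.0 - self.shrinkage) * float(np.mean(y))
        return self  # returning self allows est.fit(X, y).predict(X)

    def predict(self, X):
        return np.full(len(X), self.mean_)


X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.arange(10, dtype=float)

est = MeanRegressor(shrinkage=0.5)
preds = est.fit(X, y).predict(X)

# Inherited from BaseEstimator: parameter introspection and cloning,
# which is what grid search and pipelines rely on.
params = est.get_params()
fresh_copy = clone(est)  # same hyperparameters, not fitted
print(preds[:3], params, hasattr(fresh_copy, "mean_"))
```

Following these few conventions is what lets a third-party model be dropped into the library's pipelines, cross-validation, and search tools.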
What about sktime?
<a href="https://github.com/alan-turing-institute/sktime" rel="nofollow">https://github.com/alan-turing-institute/sktime</a>