Accelerating scikit-learn is a smart move. At the algorithmic level, for every ML use case there are probably ten non-ML data science projects. Also, it is good to have a true community framework that does not depend on the success of the metaverse for funding ;-)

The lock-in is an important consideration, but if the scikit-learn API is fully respected it seems less relevant. It also suggests a pattern for how other hardware vendors could accelerate scikit-learn as a genuine contribution?
Hi all,

Currently some work is being done to improve the computational primitives of scikit-learn, to enhance its overall performance natively.

You can have a look at this exploratory PR: https://github.com/scikit-learn/scikit-learn/pull/20254

This other PR is a cleaner revamp of the previous one: https://github.com/scikit-learn/scikit-learn/pull/21462

Cheers,
Julien.
Intel seems 6 years too late to the party CUDA started. That said, it could pick up traction: academics have increasingly been using PyTorch.

EDIT: Perhaps it's my inexperience, but is anyone else confused by the oneAPI rollout? There isn't exactly backwards compatibility with the Classic Intel compiler, and an embarrassing amount of time elapsed before I realized that "Data Parallel C++" doesn't refer to parallel programming in C++, but rather to an Intel-developed API built atop C++.
Just tried the patch in Google Colab, and the results for the example code were actually about 20% slower than without the patch.

https://imgur.com/a/7EmlYJy

What am I missing?

edit: it seems my instance was using an AMD EPYC.
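In case it helps anyone reproduce this, a minimal sketch of the kind of timing comparison involved (the estimator, data shape, and cluster count here are arbitrary choices, not the actual example code); the patch has to run before the sklearn import, and the unpatched baseline belongs in a fresh process:

    import time

    import numpy as np
    from sklearnex import patch_sklearn

    patch_sklearn()  # must run before importing estimators from sklearn

    from sklearn.cluster import KMeans

    X = np.random.rand(100_000, 50)

    t0 = time.perf_counter()
    KMeans(n_clusters=10, n_init=10).fit(X)
    print(f"patched fit: {time.perf_counter() - t0:.2f}s")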
The syntax and usability of

    from sklearnex import patch_sklearn
    # The names match scikit-learn estimators
    patch_sklearn("SVC")
seems quite clunky. I'd have preferred a syntax like

    from sklearnex import SVC
Then, maintenance would be substantially easier. If sklearnex had import-level compatibility with sklearn, it would be as simple as a few mechanical replacements,

    import sklearn                      --> import sklearnex as sklearn
    from sklearn.cluster import KMeans  --> from sklearnex.cluster import KMeans
which seems much easier / clearer.
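For what it's worth, here is how the full patch-based flow looks as I understand it (the dataset is synthetic, purely for illustration); the subtle part is that patch_sklearn has to run before the sklearn import, otherwise the name you imported still points at the stock implementation:

    from sklearnex import patch_sklearn

    patch_sklearn("SVC")  # must happen before the sklearn import below

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
    clf = SVC(kernel="rbf").fit(X, y)  # now backed by the patched implementation
    print(clf.score(X, y))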
A 5000x boost in KNN inference is not bad.

Generally speaking, the distribution-packaged versions of Python and all its scientific libraries and their support libraries are best ignored. That stuff should always be rebuilt to suit your actual production hardware, instead of a 2007-era Opteron.
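One quick diagnostic short of a full rebuild: check which BLAS/LAPACK your NumPy was linked against, since a generic reference BLAS usually means the linear algebra underneath scikit-learn is unoptimized:

    import numpy as np

    # Prints the BLAS/LAPACK build configuration this NumPy was compiled with
    np.show_config()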
As cool as this is, why would you lock yourself into Intel? Especially with cloud providers making Arm processors available at lower prices.

At the same time:

"Intel® Extension for Scikit-learn* is a free software AI accelerator that brings over 10-100X acceleration across a variety of applications."

Maybe their free software could be extended to all processors?
> oneAPI Data Analytics Library (oneDAL) is a powerful machine learning library that helps speed up big data analysis. oneDAL solvers are also used in Intel Distribution for Python for scikit-learn optimization.

> oneDAL is part of oneAPI.

So oneAPI is cross-industry, but this only works with Intel CPUs?

Hmm. Not sure I'm buying this, Intel. Sounds like you're claiming to be open but locking people into Intel-only libraries.
<a href="https://github.com/intel/scikit-learn-intelex" rel="nofollow">https://github.com/intel/scikit-learn-intelex</a><p>CuML is similar to Intel Extension for Scikit-Learn in function?
<a href="https://github.com/rapidsai/cuml" rel="nofollow">https://github.com/rapidsai/cuml</a><p>> <i>cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions that share compatible APIs with other RAPIDS projects. cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn. For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.</i>
Is there a specific "test" to run as a performance standard for scikit-learn? I noticed the other day that my Mac mini M1 absolutely blows away my MacBook Air 2020 with an i7. I was always curious if there was a good way to gauge performance.
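For context, my crude comparison was just timing a fixed synthetic workload on each machine, something along these lines (the estimator, shapes, and parameters are arbitrary choices):

    import time

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.random((50_000, 32), dtype=np.float32)
    y = rng.integers(0, 2, size=50_000)

    t0 = time.perf_counter()
    RandomForestClassifier(n_estimators=100, n_jobs=-1).fit(X, y)
    print(f"fit: {time.perf_counter() - t0:.1f}s")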