This repository was archived by the owner on Dec 4, 2019. It is now read-only.

Commit 201c4e6

WeichenXu123 authored and mengxr committed
[ML-9015] Deprecate spark-sklearn repo (#115)
1 parent cbde36f commit 201c4e6

1 file changed: +36 -0 lines changed

Diff for: README.rst

@@ -1,3 +1,38 @@
+Deprecation
+===========
+
+This project is deprecated.
+We now recommend using scikit-learn and `Joblib Apache Spark Backend <https://github.com/joblib/joblib-spark>`_
+to distribute scikit-learn hyperparameter tuning tasks on a Spark cluster.
+
+You need ``pyspark>=2.4.4`` and ``scikit-learn>=0.21`` to use Joblib Apache Spark Backend, which can be installed using ``pip``:
+
+.. code:: bash
+
+    pip install joblibspark
+
+The following example shows how to distribute ``GridSearchCV`` on a Spark cluster using ``joblibspark``.
+The same applies to ``RandomizedSearchCV``.
+
+.. code:: python
+
+    from sklearn import svm, datasets
+    from sklearn.model_selection import GridSearchCV
+    from joblibspark import register_spark
+    from sklearn.utils import parallel_backend
+
+    register_spark()  # register spark backend
+
+    iris = datasets.load_iris()
+    parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
+    svr = svm.SVC(gamma='auto')
+
+    clf = GridSearchCV(svr, parameters, cv=5)
+
+    with parallel_backend('spark', n_jobs=3):
+        clf.fit(iris.data, iris.target)
+
+
 Scikit-learn integration package for Apache Spark
 =================================================

@@ -71,6 +106,7 @@ on how to install the package.
 
 This classifier can be used as a drop-in replacement for any scikit-learn classifier, with the same API.
 
+
 Documentation
 -------------

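The added README text notes that the ``GridSearchCV`` recipe carries over to ``RandomizedSearchCV`` but does not show it. The following is a minimal sketch of that variant, not code from the commit. The helper names ``build_random_search`` and ``fit_on_spark``, the search space, and ``n_iter=4`` are all illustrative assumptions. It imports ``parallel_backend`` from ``joblib`` rather than ``sklearn.utils`` as in the README, on the assumption that newer scikit-learn releases no longer re-export it. Third-party imports are done inside the functions so the module can be loaded on a machine without Spark installed.

```python
"""Sketch: run RandomizedSearchCV on Spark through the joblibspark backend."""


def build_random_search(n_iter=4, random_state=0):
    """Return an unfitted RandomizedSearchCV over a small SVC search space."""
    from sklearn import svm
    from sklearn.model_selection import RandomizedSearchCV

    # Illustrative search space (an assumption, not taken from the README).
    param_distributions = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10, 100]}
    return RandomizedSearchCV(
        svm.SVC(gamma="auto"),
        param_distributions,
        n_iter=n_iter,  # number of sampled parameter settings
        cv=5,
        random_state=random_state,
    )


def fit_on_spark(search, X, y, n_jobs=3):
    """Fit a scikit-learn search object with its tasks dispatched to Spark."""
    # Imported here so that loading this module does not require Spark.
    from joblib import parallel_backend
    from joblibspark import register_spark

    register_spark()  # make the "spark" backend known to joblib
    with parallel_backend("spark", n_jobs=n_jobs):
        search.fit(X, y)
    return search


# Usage on a machine with pyspark and joblibspark installed:
#   from sklearn import datasets
#   iris = datasets.load_iris()
#   fitted = fit_on_spark(build_random_search(), iris.data, iris.target)
#   print(fitted.best_params_)
```

Because ``fit_on_spark`` only relies on the ``fit`` method, the same helper should also accept the ``GridSearchCV`` object from the README example.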