+Deprecation
+===========
+
+This project is deprecated.
+We now recommend using scikit-learn and the `Joblib Apache Spark Backend <https://github.com/joblib/joblib-spark>`_
+to distribute scikit-learn hyperparameter tuning tasks on a Spark cluster.
+
+You need ``pyspark>=2.4.4`` and ``scikit-learn>=0.21`` to use the Joblib Apache Spark Backend, which can be installed using ``pip``:
+
+.. code:: bash
+
+    pip install joblibspark
+
+The following example shows how to distribute ``GridSearchCV`` on a Spark cluster using ``joblibspark``.
+The same applies to ``RandomizedSearchCV``.
+
+.. code:: python
+
+    from sklearn import svm, datasets
+    from sklearn.model_selection import GridSearchCV
+    from joblibspark import register_spark
+    from sklearn.utils import parallel_backend
+
+    register_spark()  # register the Spark backend with joblib
+
+    iris = datasets.load_iris()
+    parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
+    svr = svm.SVC(gamma='auto')
+
+    clf = GridSearchCV(svr, parameters, cv=5)
+
+    with parallel_backend('spark', n_jobs=3):
+        clf.fit(iris.data, iris.target)
+
+
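As a sketch of the note above that the same pattern applies to ``RandomizedSearchCV``: the example below uses joblib's default local ``loky`` backend so it runs without a Spark cluster; on a cluster you would call ``register_spark()`` and use ``parallel_backend('spark', n_jobs=...)`` exactly as in the ``GridSearchCV`` example. The enlarged ``C`` grid and ``n_iter`` value are illustrative choices, not from the original text.

```python
from sklearn import svm, datasets
from sklearn.model_selection import RandomizedSearchCV
from joblib import parallel_backend

iris = datasets.load_iris()

# Same kind of search space as the GridSearchCV example; RandomizedSearchCV
# samples n_iter candidates from it instead of trying every combination.
param_distributions = {'kernel': ['linear', 'rbf'], 'C': [1, 10, 100]}
clf = RandomizedSearchCV(svm.SVC(gamma='auto'), param_distributions,
                         n_iter=4, cv=5, random_state=0)

# 'loky' is joblib's default local backend; with joblibspark installed you
# would instead register_spark() and use parallel_backend('spark', n_jobs=3).
with parallel_backend('loky', n_jobs=2):
    clf.fit(iris.data, iris.target)

print(clf.best_params_)
```

The ``with parallel_backend(...)`` block is the only Spark-specific part of the workflow: the search object itself is plain scikit-learn, which is why the swap between backends is a one-line change.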
 Scikit-learn integration package for Apache Spark
 =================================================

@@ -71,6 +106,7 @@ on how to install the package.

 This classifier can be used as a drop-in replacement for any scikit-learn classifier, with the same API.

+
 Documentation
 -------------
