---
layout: global
title: PMML model export - MLlib
displayTitle: <a href="mllib-guide.html">MLlib</a> - PMML model export
---
- Table of contents
{:toc}
MLlib supports model export to Predictive Model Markup Language (PMML).
The table below outlines the MLlib models that can be exported to PMML and their equivalent PMML model.
MLlib model | PMML model |
---|---|
KMeansModel | ClusteringModel |
LinearRegressionModel | RegressionModel (functionName="regression") |
RidgeRegressionModel | RegressionModel (functionName="regression") |
LassoModel | RegressionModel (functionName="regression") |
SVMModel | RegressionModel (functionName="classification" normalizationMethod="none") |
Binary LogisticRegressionModel | RegressionModel (functionName="classification" normalizationMethod="logit") |
Here is a complete example of building a KMeansModel and printing it out in PMML format:

{% highlight scala %}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster the data into two classes using KMeans
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Export to PMML
println("PMML Model:\n" + clusters.toPMML)
{% endhighlight %}
In addition to exporting the PMML model to a String (`model.toPMML`, as in the example above), you can export it to other destinations:

{% highlight scala %}
// Export the model to a String in PMML format
clusters.toPMML

// Export the model to a local file in PMML format
clusters.toPMML("/tmp/kmeans.xml")

// Export the model to a directory on a distributed file system in PMML format
clusters.toPMML(sc, "/tmp/kmeans")

// Export the model to the OutputStream in PMML format
clusters.toPMML(System.out)
{% endhighlight %}
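
The `OutputStream` variant can also be pointed at an in-memory buffer, which makes it possible to sanity-check the export before writing it anywhere. The following is a minimal sketch, not part of MLlib: `pmmlRootElement` is a hypothetical helper name, and it relies only on the JDK's built-in XML parser.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, OutputStream}
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical helper (not part of MLlib): runs an export that writes to an
// OutputStream, parses the captured bytes as XML, and returns the name of
// the root element.
def pmmlRootElement(write: OutputStream => Unit): String = {
  val buffer = new ByteArrayOutputStream()
  write(buffer)
  val doc = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder()
    .parse(new ByteArrayInputStream(buffer.toByteArray))
  doc.getDocumentElement.getNodeName
}
```

With the `clusters` model from the example above, it could be called as `pmmlRootElement(out => clusters.toPMML(out))`. For a well-formed PMML document the root element is expected to be `PMML`.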
For unsupported models, either you will not find a `.toPMML` method or an `IllegalArgumentException` will be thrown.
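
When it is not known in advance whether a model can be exported, the runtime failure can be handled explicitly. The following is a minimal sketch under that assumption: `safeExport` is a hypothetical helper, not an MLlib API, and it covers only the `IllegalArgumentException` case. A missing `.toPMML` method is a compile-time error and cannot be caught this way.

```scala
// Hypothetical helper (not part of MLlib): evaluates an export expression and
// turns an IllegalArgumentException into a Left value instead of letting it
// propagate. Any other exception is rethrown unchanged.
def safeExport(doExport: => String): Either[String, String] =
  try Right(doExport)
  catch {
    case e: IllegalArgumentException =>
      Left(s"PMML export not supported: ${e.getMessage}")
  }
```

With a model such as `clusters` from the example above, usage could look like `safeExport(clusters.toPMML)`, followed by a pattern match on `Left` and `Right`.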