Revise readme
Cheng-Lin-Li committed May 19, 2017
1 parent 4c23484 commit f6739b0
Showing 3 changed files with 68 additions and 1 deletion.
18 changes: 18 additions & 0 deletions .project
@@ -0,0 +1,18 @@
<?xml version="1.0" encoding="UTF-8"?>
<projectDescription>
<name>Spark</name>
<comment></comment>
<projects>
<project>DataMining</project>
</projects>
<buildSpec>
<buildCommand>
<name>org.python.pydev.PyDevBuilder</name>
<arguments>
</arguments>
</buildCommand>
</buildSpec>
<natures>
<nature>org.python.pydev.pythonNature</nature>
</natures>
</projectDescription>
8 changes: 8 additions & 0 deletions .pydevproject
@@ -0,0 +1,8 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?eclipse-pydev version="1.0"?><pydev_project>
<pydev_pathproperty name="org.python.pydev.PROJECT_SOURCE_PATH">
<path>/${PROJECT_DIR_NAME}</path>
</pydev_pathproperty>
<pydev_property name="org.python.pydev.PYTHON_PROJECT_VERSION">python 2.7</pydev_property>
<pydev_property name="org.python.pydev.PYTHON_PROJECT_INTERPRETER">python 2.7.13</pydev_property>
</pydev_project>
43 changes: 42 additions & 1 deletion ALS/README.md
@@ -1 +1,42 @@
# Spark
## This is an implementation of the Alternating Least Squares (ALS) algorithm in Spark with Python 2.7

## Algorithm: Alternating Least Squares (ALS) Algorithm

## Task:
The task is to modify the parallel implementation of the ALS (alternating least squares) algorithm in Spark so that it takes a utility matrix as input and writes the root-mean-square error (RMSE) to standard output or a file after each iteration. The code for the algorithm is als.py, located under <spark-2.1.0 installation directory>/examples/src/main/python.
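
As an illustration of the quantity being reported, the following is a minimal sketch of an RMSE computation between a utility matrix R and the product of two factor matrices U and V. It uses plain Python lists rather than Spark, and it assumes the error is averaged over all n-by-m entries (that is, the matrix is treated as dense). The function name `rmse` and the sample matrices are hypothetical and are not taken from this repository.

```python
import math


def rmse(R, U, V):
    """Return the RMSE between R (n x m) and the product U (n x f) * V (f x m).

    Assumption: every entry of R is known, so the squared error is
    averaged over all n * m cells.
    """
    n, m, f = len(R), len(R[0]), len(V)
    total = 0.0
    for i in range(n):
        for j in range(m):
            # Predicted rating is the dot product of row i of U and column j of V.
            predicted = sum(U[i][t] * V[t][j] for t in range(f))
            total += (R[i][j] - predicted) ** 2
    return math.sqrt(total / (n * m))


# Hypothetical 2 x 2 example with a single factor (f = 1).
R = [[1.0, 2.0], [3.0, 4.0]]
U = [[1.0], [2.0]]
V = [[1.0, 2.0]]
```

Here U times V is [[1, 2], [2, 4]], so only one entry differs from R (by 1), and the RMSE is sqrt(1 / 4) = 0.5.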

#### Usage: bin/spark-submit ALS.py input-matrix n m f k p [output-file]
1. n is the number of rows (users) of the matrix.
2. m is the number of columns (products).
3. f is the number of dimensions/factors in the factor model. That is, U is an n-by-f matrix, while V is an f-by-m matrix.
4. k is the number of iterations.
5. p is the number of partitions for the input matrix.
6. output-file is the path to the output file. This parameter is optional.
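
The argument list above could be read along the following lines. This is only a sketch under stated assumptions: the helper name `parse_args`, the dictionary keys, and the sample values are hypothetical, and the actual ALS.py may handle its arguments differently.

```python
import sys


def parse_args(argv):
    """Parse 'input-matrix n m f k p [output-file]' (argv excludes the script name)."""
    if len(argv) not in (6, 7):
        raise ValueError("usage: ALS.py input-matrix n m f k p [output-file]")
    # n, m, f, k and p are all integer counts.
    n, m, f, k, p = [int(a) for a in argv[1:6]]
    return {
        "input_matrix": argv[0],
        "n": n,  # rows (users)
        "m": m,  # columns (products)
        "f": f,  # factors
        "k": k,  # iterations
        "p": p,  # partitions
        # The output file is optional; None means "write to standard output".
        "output_file": argv[6] if len(argv) == 7 else None,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    print(parse_args(sys.argv[1:]))
```

For example, `parse_args(["mat.dat", "5", "4", "3", "10", "2"])` yields `output_file` set to `None`, which a caller could treat as a request to print to standard output.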


#### Input: Take a utility matrix (mat.dat) as the input

#### Output: Root-mean-square error (RMSE) written to standard output or a file after each iteration
After each iteration, the RMSE is output with four decimal places.
The expression "%.4f" % RMSE is used to format the RMSE value, which is saved to the file as follows.

1.0019


0.9794


0.8464
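
The formatting step described above can be sketched as follows. The helper names `format_rmse` and `report` are hypothetical, and the unrounded input values in the comment are invented for illustration. Only the "%.4f" format string comes from this README.

```python
import sys


def format_rmse(value):
    """Format an RMSE value with four digits after the decimal point."""
    return "%.4f" % value


def report(value, out=sys.stdout):
    """Write one formatted RMSE per line to 'out' (standard output or an open file)."""
    out.write(format_rmse(value) + "\n")


# Hypothetical per-iteration values; in a real run, each would be reported
# right after its iteration finishes:
#     for value in (1.00194, 0.97936, 0.84641):
#         report(value)
```

Passing an open file object as `out` writes to that file instead of standard output, which mirrors the optional output-file argument.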


