Commit 528f934

kovacsgabor authored and bereczdaniel committed
Update README.md (#3)
* Update README.md after the change to allow custom parameter ids

  The README.md contained outdated example code that did not yet include the newly added `Id` type parameter in the PS-related methods and types.

* Small fixes in README.md

* Change the link to the Flink DataStream API to always refer to the most recent version instead of the outdated 1.2

* Fix the library dependency example, which did not work because of the automatic addition of the Scala version to artifact names
1 parent f013cd2 commit 528f934

1 file changed: +15 -13 lines changed

README.md (+15 -13)
@@ -1,7 +1,7 @@
 # Flink Parameter Server
 
 A Parameter Server implementation based on the
-[Streaming API](https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html) of [Apache Flink](http://flink.apache.org/).
+[Streaming API](https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html) of [Apache Flink](http://flink.apache.org/).
 
 Parameter Server is an abstraction for model-parallel machine learning (see the work of [Li et al.](https://doi.org/10.1145/2640087.2644155)).
 Our implementation could be used with the Streaming API:
@@ -17,7 +17,7 @@ sbt publish-local
 and then added to a project as a dependency
 
 ```sbt
-libraryDependencies += "hu.sztaki.ilab" % "flink-ps" % "0.1.0"
+libraryDependencies += "hu.sztaki.ilab" %% "flink-ps" % "0.1.0"
 ```
 
 # API
@@ -30,35 +30,37 @@ Basically, we can access the Parameter Server by defining a [```WorkerLogic```](
 
 We need to implement the ```WorkerLogic``` trait
 ```scala
-trait WorkerLogic[T, P, WorkerOut] extends Serializable {
-  def onRecv(data: T, ps: ParameterServerClient[P, WorkerOut]): Unit
-  def onPullRecv(paramId: Int, paramValue: P, ps: ParameterServerClient[P, WorkerOut]): Unit
+trait WorkerLogic[T, Id, P, WOut] extends Serializable {
+  def onRecv(data: T, ps: ParameterServerClient[Id, P, WOut]): Unit
+  def onPullRecv(paramId: Id, paramValue: P, ps: ParameterServerClient[Id, P, WOut]): Unit
 }
 ```
 where we can handle incoming data (`onRecv`), *pull* parameters from the Parameter Server, handle the answers to the pulls (`onPullRecv`), and *push* parameters to the Parameter Server or *output* results. We can use the ```ParameterServerClient```:
 ```scala
-trait ParameterServerClient[P, WorkerOut] extends Serializable {
-  def pull(id: Int): Unit
-  def push(id: Int, deltaUpdate: P): Unit
-  def output(out: WorkerOut): Unit
+trait ParameterServerClient[Id, P, WOut] extends Serializable {
+  def pull(id: Id): Unit
+  def push(id: Id, deltaUpdate: P): Unit
+  def output(out: WOut): Unit
 }
 ```
 
 When we defined our worker logic we can wire it into a Flink job with the `transform` method of [```FlinkParameterServer```](src/main/scala/hu/sztaki/ilab/ps/FlinkParameterServer.scala).
 
 ```scala
-def transform[T, P, WorkerOut](
+def transform[T, Id, P, WOut](
   trainingData: DataStream[T],
-  workerLogic: WorkerLogic[T, P, WorkerOut],
-  paramInit: => Int => P,
+  workerLogic: WorkerLogic[T, Id, P, WOut],
+  paramInit: => Id => P,
   paramUpdate: => (P, P) => P,
   workerParallelism: Int,
   psParallelism: Int,
-  iterationWaitTime: Long): DataStream[Either[WorkerOut, (Int, P)]]
+  iterationWaitTime: Long): DataStream[Either[WOut, (Id, P)]]
 ```
 
 Besides the `trainingData` stream and the `workerLogic`, we need to define how the Parameter Server should initialize a parameter based on the parameter id (`paramInit`), and how to update a parameter based on a received push (`paramUpdate`). We must also define how many parallel instances of workers and parameter servers we should use (`workerParallelism` and `psParallelism`), and the `iterationWaitTime` (see [Limitations](README.md#limitations)).
 
+There are also other options to define a DataStream transformation with a Parameter Server which let us specialize the process in more detail. See the different methods of [```FlinkParameterServer```](src/main/scala/hu/sztaki/ilab/ps/FlinkParameterServer.scala).
+
 # Limitations
 
 We implement the two-way communication of workers and the parameter server with Flink Streaming [iterations](https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#iterations), which is not yet production-ready. The main issues are
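
The practical effect of the new `Id` type parameter can be seen in a small sketch that uses `String` parameter ids instead of `Int`. The two traits below are local copies of the signatures on the `+` side of this diff, so the example compiles without the flink-ps artifact or Flink. `WordCountLogic`, `InMemoryClient`, and `Demo` are hypothetical names introduced only for illustration. The in-memory client answers pulls synchronously, which is an assumption made to keep the sketch runnable and is not how the Flink-based implementation behaves.

```scala
import scala.collection.mutable

// Local copies of the traits as shown in the updated README (not imported from flink-ps).
trait ParameterServerClient[Id, P, WOut] extends Serializable {
  def pull(id: Id): Unit
  def push(id: Id, deltaUpdate: P): Unit
  def output(out: WOut): Unit
}

trait WorkerLogic[T, Id, P, WOut] extends Serializable {
  def onRecv(data: T, ps: ParameterServerClient[Id, P, WOut]): Unit
  def onPullRecv(paramId: Id, paramValue: P, ps: ParameterServerClient[Id, P, WOut]): Unit
}

// Hypothetical worker: the word itself (a String) is the parameter id,
// which the old Int-only signatures could not express directly.
class WordCountLogic extends WorkerLogic[String, String, Long, (String, Long)] {
  override def onRecv(word: String,
                      ps: ParameterServerClient[String, Long, (String, Long)]): Unit = {
    ps.push(word, 1L) // send a delta of 1 for this word
    ps.pull(word)     // request the current count
  }

  override def onPullRecv(word: String, count: Long,
                          ps: ParameterServerClient[String, Long, (String, Long)]): Unit =
    ps.output((word, count)) // emit the count returned by the pull
}

// Hypothetical in-memory stand-in for the server side (assumption: synchronous pulls).
// paramInit and paramUpdate play the same roles as the `transform` arguments.
class InMemoryClient[T, Id, P, WOut](
    logic: WorkerLogic[T, Id, P, WOut],
    paramInit: Id => P,
    paramUpdate: (P, P) => P) extends ParameterServerClient[Id, P, WOut] {

  val params: mutable.Map[Id, P] = mutable.Map.empty
  val outputs: mutable.ArrayBuffer[WOut] = mutable.ArrayBuffer.empty

  def pull(id: Id): Unit =
    logic.onPullRecv(id, params.getOrElseUpdate(id, paramInit(id)), this)

  def push(id: Id, deltaUpdate: P): Unit =
    params(id) = paramUpdate(params.getOrElse(id, paramInit(id)), deltaUpdate)

  def output(out: WOut): Unit = outputs += out
}

object Demo {
  // Feeds three words through the worker logic and returns everything it output.
  def run(): List[(String, Long)] = {
    val logic = new WordCountLogic
    val client = new InMemoryClient[String, String, Long, (String, Long)](
      logic, _ => 0L, _ + _)
    List("a", "b", "a").foreach(word => logic.onRecv(word, client))
    client.outputs.toList
  }
}
```

Calling `Demo.run()` drives the worker with the words `a`, `b`, `a`. With the real library, the same `WordCountLogic` would instead be passed to `FlinkParameterServer.transform` together with `paramInit` and `paramUpdate` functions of matching types.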
