Type-safe, high-performance, distributed neural networks in Scala (not Python, finally...).

Low-level (linear algebra) operations are powered by the low-level TensorFlow API (C/C++ bindings via JNI). Scala is used to build computation graphs and compile them into native tensor graphs. Compiled graphs are fully executed in native code (on CPU, GPU or TPU), and only the result is returned back via a `DirectBuffer` which points into native memory. The `DirectBuffer` is wrapped with a read-only `Tensor` object which allows slicing and reading the data in a convenient way (just like `Breeze` or `NumPy` do).
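To make the memory model concrete, here is a minimal JDK-only sketch (plain `java.nio`, not this library's API) of what "a read-only view over a `DirectBuffer`" means: native code fills off-heap memory, and the JVM side only reads and slices it.

```scala
import java.nio.{ByteBuffer, ByteOrder, FloatBuffer}

// Off-heap (native) memory, as it would be filled by the TensorFlow runtime.
val direct: ByteBuffer = ByteBuffer
  .allocateDirect(4 * 6) // 6 floats, 4 bytes each
  .order(ByteOrder.nativeOrder())

val floats: FloatBuffer = direct.asFloatBuffer()
(0 until 6).foreach(i => floats.put(i, i.toFloat))

// A read-only view is what a Tensor conceptually wraps: no copy, no writes.
val view: FloatBuffer = floats.asReadOnlyBuffer()

// "Slicing" row 1 of a 2x3 matrix is just index arithmetic over the same memory.
val row1: Array[Float] = Array.tabulate(3)(col => view.get(1 * 3 + col))
```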
The optimizer is built on top of Spark and can optimize the model in a distributed/parallel way. The chosen algorithm is data parallelism with synchronous model averaging: the dataset is split between the workers, each epoch is run independently on each data split, and at the end of each epoch the parameters are averaged and broadcast back to each worker.
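As a rough illustration of the averaging scheme (plain Scala on a toy one-parameter model, not the library's or Spark's API; `fitEpoch`, `average` and `train` are hypothetical names): every epoch each worker trains on its own split starting from the shared parameters, then the results are averaged and become the next shared parameters.

```scala
// Toy model: a single parameter that should converge to the global mean of the data.
type Params = Array[Float]

// One local epoch on a worker: nudge the parameter towards the mean of its split.
def fitEpoch(params: Params, split: Seq[Float]): Params = {
  val grad = params(0) - split.sum / split.size
  Array(params(0) - 0.5f * grad)
}

// Synchronous model averaging: average the per-worker parameters element-wise.
def average(all: Seq[Params]): Params =
  all.transpose.map(values => values.sum / all.size).toArray

def train(initial: Params, splits: Seq[Seq[Float]], epochs: Int): Params =
  (1 to epochs).foldLeft(initial) { (shared, _) =>
    val local = splits.map(split => fitEpoch(shared, split)) // done in parallel by the workers
    average(local)                                           // averaged and broadcast back
  }

// Example: 3 "workers", the parameter converges towards the overall mean.
val result = train(Array(0f), Seq(Seq(1f, 2f), Seq(3f, 4f), Seq(5f, 6f)), epochs = 20)
```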
The input data is expected to be a `Dataset[Array[TensorType]]` which carries the shape of the tensors in its metadata. Usually `TensorType` is chosen to be `Float` since it performs best on GPU, although `Double` can also be used.
Example of a simple MNIST classifier built with a fully connected neural network:
```scala
val (trainingDs, testDs) = MNIST.load(sc, trainingSize = 30000)
val model = Dense(50, Sigmoid) >> Dense(10, Softmax)
val trained = trainingDs.train(model)
  .loss(CategoricalCrossentropy)
  .using(Adam(0.01f))
  .batch(1000)
  .each(1.epochs, RecordLoss(tensorboard = true))
  .each(10.epochs, RecordAccuracy(testDs, tensorboard = true))
  .stopAfter(200.epochs)
  .run()
accuracy(trained, testDs) should be >= 0.95f
```
Here, the `loss` and `accuracy` will be logged and added to TensorBoard as live trends. To run TensorBoard, execute:

```
tensorboard --logdir board
```
The same task solved with a CNN (Convolutional Neural Network):
```scala
val (trainingDs, testDs) = MNIST()
val model =
  Conv2D(32, activation = ReLU()) >> Pool2D() >>
  Conv2D(64, activation = ReLU()) >> Pool2D() >>
  Flatten >> Dense(10, Softmax)
val trained = trainingDs
  .train(model)
  .loss(CategoricalCrossentropy)
  .using(Adam(0.001f))
  .batch(100)
  .initWith(shape => Tensor.rand(shape, range = Some(-0.1f, 0.1f)))
  .each(1.epochs, RecordLoss())
  .each(1.epochs, RecordAccuracy(testDs))
  .stopAfter(3.epochs)
  .run()
accuracy(trained, testDs) should be >= 0.98f
```
- Tensor
- DSL for computation DAG
- TF Session
- Core ops
- Math ops
- Logical ops
- String ops
- TF Functions, Placeholders, Session caching
- TensorBoard basic support
- Spark
- Hyperparameter tuning
- Model Import/Export
- SGD
- AdaGrad
- AdaDelta
- RMSProp
- Adam
- Nadam
- Adamax
- AMSGrad
- Variance/STD
- Covariance/Correlation Matrix
- Lots of other useful algorithms to analyze the data set
- Linear Regression
- Simple math models for benchmarks
- Binary Logistic Regression
- ANN (Multilayer Perceptron NN)
- kernel regularization
- Layers Dropout, Batch Normalization
- Convolutional NN
- Recurrent NN
- others
- Sigmoid
- Tanh
- ReLU
- Softmax
- Exp
- SELU
- ELU
- Softplus
- RMSE (Root Mean Squared Error)
- Binary Crossentropy
- Categorical Crossentropy
- MNIST
- Feature scalers
- Feature embedding
- Hashed features
- Crossed features
- r2 score
- accuracy estimator
- confusion matrix, precision, recall, f1 score
- runtime estimating and new stop condition based on that
- Create a computation-intensive operation, like `matmul` applied multiple times to large tensors, and compare with Scala `breeze`, Python `tensorflow` and Python `numpy`
- Compare with existing implementations using local CPU
- Compare with existing implementations using one GPU
- Compare with existing implementations using distributed mode on GCP DataProc
- While training, analyze the weight histograms to make sure the deep NN does not saturate
- Grid/Random hyperparameter search
- Different weight initializers (Xavier)
- Decay learning rate over time (step, exponential, 1/t decay)
- Try using it in an interactive notebook
- Add a graph library so we could plot some charts and publish them in `tensorboard` or a notebook (maybe fork and upgrade `vegas` to Scala 2.12, or try `evil-plot`)
- Refactor the type class hierarchy so `TensorType` is on top and `Numeric` and the rest extend it
- Refactor tensor functions so the materialized type of the args is only inferred during compilation; we would also need to try simplifying tensor functions and add methods to compose functions (`compose`, `andThen`, etc.)
- Add a DSL to build tensor requirements, like `tensor require rank(4)` or `tensor require shape squareMatrix` (see the sketch below)
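For the requirements DSL, one possible shape (a sketch under assumed names; `Shape`, `Requirement` and `RequireOps` are illustrative only, not the final design) could be a small `Requirement` type plus an implicit `require` extension:

```scala
object ShapeRequirements {

  final case class Shape(dims: List[Int]) { def rank: Int = dims.size }

  trait Requirement {
    def check(shape: Shape): Boolean
    def describe: String
  }

  // Requirement on the number of dimensions.
  def rank(r: Int): Requirement = new Requirement {
    def check(shape: Shape): Boolean = shape.rank == r
    def describe: String = s"rank $r"
  }

  // Requirement that the shape is a square matrix (rank 2, equal dimensions).
  val squareMatrix: Requirement = new Requirement {
    def check(shape: Shape): Boolean = shape.rank == 2 && shape.dims.distinct.size == 1
    def describe: String = "square matrix"
  }

  implicit class RequireOps(private val shape: Shape) extends AnyVal {
    def require(req: Requirement): Shape = {
      assert(req.check(shape), s"expected ${req.describe} but got $shape")
      shape
    }
  }

  // usage: Shape(List(3, 3)) require squareMatrix
  //        Shape(List(100, 28, 28, 1)) require rank(4)
}
```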
If you want to become a contributor, you are welcome! You can pick anything from the Road Map or propose your own idea.
Please contact: