Commit 0204b1c

Add GPU support, refactor code, add D2, CHOCO-SGD algorithms
1 parent: a151148


45 files changed (+1140 −634 lines)

Diff for: .gitignore (+6 −2)

@@ -1,7 +1,11 @@
 __pycache__
 figs
 data
-*.data
-*.npz
+build
+problems/MNIST
+*npz
+*pt
+*egg-info
+*data
 .DS_Store
 *egg-info

Diff for: README.md (+45 −13)

@@ -7,7 +7,7 @@ This repository contains a set of optimization algorithms and objective functions
 2. "Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction" [[PDF](https://arxiv.org/abs/1909.05844v2)]. (code is in the previous version of this repo [[link](https://github.com/liboyue/Network-Distributed-Algorithm/tree/08abe14f2a2d5929fc401ff99961ca3bae40ff60)])
 
 Due to the random data generation procedure,
-resulting graphs may be slightly different from those appeared in the paper,
+results may differ slightly from those reported in the papers,
 but conclusions remain the same.
 
 If you find this code useful, please cite our papers:
@@ -32,44 +32,76 @@ If you find this code useful, please cite our papers:
 }
 ```
 
-## Implemented objective functions
+
+## 1. Features
+- Easy to use: comes with several popular objective functions with optional regularization and compression, essential optimization algorithms, and utilities to run experiments and plot results
+- Extensibility: easy to implement your own objective functions / optimization algorithms / datasets
+- Correctness: numerically verified gradient implementations
+- Performance: runs on both CPU and GPU
+- Data preprocessing: shuffling, normalizing, splitting
+
+
+## 2. Installation and usage
+### 2.1 Installation
+
+`pip install git+https://github.com/liboyue/Network-Distributed-Algorithm.git`
+
+If you have Nvidia GPUs, please also install `cupy`.
+
+### 2.2 Implementing your own objective function
+### 2.3 Implementing your own optimizer
+
+
+## 3. Objective functions
 The gradient implementations of all objective functions are checked numerically.
 
-### Linear regression
+### 3.1 Linear regression
 Linear regression with random generated data.
 The objective function is
 <img src="https://render.githubusercontent.com/render/math?math=f(w) = \frac{1}{N} \sum_i (y_i - x_i^\top w)^2">
 
-### Logistic regression
-Logistic regression with $l$-2 or nonconvex regularization with random generated data or the Gisette dataset or datasets from `libsvmtools`.
+### 3.2 Logistic regression
+Logistic regression with l-2 or nonconvex regularization on randomly generated data, the Gisette dataset, or datasets from `libsvmtools`.
 The objective function is
-<img src="https://render.githubusercontent.com/render/math?math=f(w) = - \frac{1}{N} * \Big(\sum_i y_i \log \frac{1}{1 + exp(w^T x_i)} + (1 - y_i) \log \frac{exp(w^T x_i)}{1 + exp(w^T x_i)} \Big) + \frac{\lambda}{2} \| w \|_2^2 + \alpha \sum_j \frac{w_j^2}{1 + w_j^2}2">
+<img src="https://render.githubusercontent.com/render/math?math=f(w) = - \frac{1}{N} * \Big(\sum_i y_i \log \frac{1}{1 %2B exp(w^T x_i)} %2B (1 - y_i) \log \frac{exp(w^T x_i)}{1 %2B exp(w^T x_i)} \Big) %2B \frac{\lambda}{2} \| w \|_2^2 %2B \alpha \sum_j \frac{w_j^2}{1 %2B w_j^2}">
 
-
-### One-hidden-layer fully-connected neural netowrk
+### 3.3 One-hidden-layer fully-connected neural network
 One-hidden-layer fully-connected neural network with softmax loss on the MNIST dataset.
 
 
-## Implemented optimization algorithms
+## 4. Datasets
+- MNIST
+- Gisette
+- LibSVM data
+- Randomly generated data
+
 
-### Centralized optimization algorithms
+## 5. Optimization algorithms
+
+### 5.1 Centralized optimization algorithms
 - Gradient descent
 - Stochastic gradient descent
 - Nesterov's accelerated gradient descent
 - SVRG
 - SARAH
 
-### Distributed optimization algorithms (i.e. with parameter server)
+### 5.2 Distributed optimization algorithms (i.e. with a parameter server)
 - ADMM
 - DANE
 
-
-### Decentralized optimization algorithms
+### 5.3 Decentralized optimization algorithms
 - Decentralized gradient descent
 - Decentralized stochastic gradient descent
 - Decentralized gradient descent with gradient tracking
 - EXTRA
 - NIDS
+- D2
+- CHOCO-SGD
 - Network-DANE/SARAH/SVRG
 - GT-SARAH
 - DESTRESS
+
+
+## 6. Change log
+
+- Mar-03-2022: Add GPU support, refactor code
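
To make the logistic-regression objective in the README concrete, here is a standalone NumPy sketch of that loss and its gradient, together with the kind of finite-difference check the "checked numerically" claim refers to. This is an illustration only, not the package's `nda.problems.LogisticRegression` implementation; the function names, regularization constants, and data below are made up for the example.

```python
import numpy as np

def logistic_loss(w, X, y, lam=1e-3, alpha=1e-2):
    """f(w) = -1/N * sum_i [ y_i log s(-x_i^T w) + (1 - y_i) log s(x_i^T w) ]
              + lam/2 ||w||^2 + alpha * sum_j w_j^2 / (1 + w_j^2),  s = sigmoid."""
    z = X @ w
    log_sig = lambda t: -np.logaddexp(0.0, -t)   # numerically stable log-sigmoid
    data = -np.mean(y * log_sig(-z) + (1 - y) * log_sig(z))
    return data + 0.5 * lam * (w @ w) + alpha * np.sum(w**2 / (1 + w**2))

def logistic_grad(w, X, y, lam=1e-3, alpha=1e-2):
    N = X.shape[0]
    sig = 1.0 / (1.0 + np.exp(-(X @ w)))
    # d/dw of the data term simplifies to (1/N) X^T (sigmoid(Xw) - (1 - y))
    return X.T @ (sig - (1 - y)) / N + lam * w + 2 * alpha * w / (1 + w**2)**2

# Finite-difference gradient check on random data
rng = np.random.default_rng(0)
N, dim = 100, 5
X = rng.standard_normal((N, dim))
y = rng.integers(0, 2, size=N).astype(float)
w = rng.standard_normal(dim)

eps = 1e-6
num = np.array([(logistic_loss(w + eps * e, X, y) - logistic_loss(w - eps * e, X, y)) / (2 * eps)
                for e in np.eye(dim)])
print('max abs difference:', np.max(np.abs(num - logistic_grad(w, X, y))))  # expect ~1e-8 or smaller
```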

Diff for: experiments/convex/linear_regression.py (+3 −3)

@@ -18,9 +18,9 @@
 
 kappa = 10
 mu = 5e-10
-n_iters = 30
+n_iters = 10
 
-p = LinearRegression(n_agent, m, dim, noise_variance=1, kappa=kappa, graph_type='er', graph_params=0.3)
+p = LinearRegression(n_agent=n_agent, m=m, dim=dim, noise_variance=1, kappa=kappa, graph_type='er', graph_params=0.3)
 W, alpha = generate_mixing_matrix(p)
 
 log.info('m = %d, n = %d, alpha = %.4f' % (m, n_agent, alpha))
@@ -52,6 +52,6 @@
 
 exps = centralized + distributed
 
-res = run_exp(exps, kappa=kappa, max_iter=n_iters, name='linear_regression', n_process=5, save=True)
+res = run_exp(exps, kappa=kappa, max_iter=n_iters, name='linear_regression', n_cpu_processes=4, save=True)
 
 plt.show()
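
Aside from lowering `n_iters`, the substantive changes here are the switch to keyword arguments when constructing the problem and the rename of `run_exp`'s `n_process` argument to `n_cpu_processes`. For orientation, below is a minimal end-to-end sketch assembled only from calls that appear in the experiment diffs of this commit; treat the exact signatures as assumptions rather than documented API, and the step size and iteration counts as placeholders.

```python
# Sketch of the experiment workflow after this commit, assembled from the calls
# visible in the diffs above. Signatures and constants are illustrative only.
import numpy as np
import matplotlib.pyplot as plt

from nda.problems import LinearRegression
from nda.optimizers import *                      # GD, EXTRA, ... as in the scripts above
from nda.optimizers.utils import generate_mixing_matrix
from nda.experiment_utils import run_exp

if __name__ == '__main__':
    n_agent, m, dim, kappa, n_iters = 20, 1000, 40, 10, 10

    # Problem on a random Erdos-Renyi communication graph (keyword arguments,
    # as introduced by this commit)
    p = LinearRegression(n_agent=n_agent, m=m, dim=dim, noise_variance=1,
                         kappa=kappa, graph_type='er', graph_params=0.3)
    W, alpha = generate_mixing_matrix(p)          # mixing matrix for decentralized methods

    x_0 = np.random.rand(dim, n_agent)            # one column of iterates per agent
    eta = 0.01                                    # placeholder; the scripts derive it from p.L and p.sigma

    exps = [
        GD(p, n_iters=n_iters, eta=eta, x_0=x_0.mean(axis=1)),      # centralized baseline
        EXTRA(p, n_iters=n_iters * 20, eta=eta / 2, x_0=x_0, W=W),  # decentralized method
    ]

    # n_cpu_processes replaces the old n_process argument in this commit
    res = run_exp(exps, kappa=kappa, max_iter=n_iters, name='linear_regression',
                  n_cpu_processes=4, save=True)
    plt.show()
```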

Diff for: experiments/convex/logistic_regression.py (+37 −16)

@@ -3,52 +3,73 @@
 import numpy as np
 import matplotlib.pyplot as plt
 
-from nda import log
 from nda.problems import LogisticRegression
 from nda.optimizers import *
 from nda.optimizers.utils import generate_mixing_matrix
-from nda.experiment_utils import run_exp
 
+from nda.experiment_utils import run_exp
 
 if __name__ == '__main__':
-
 n_agent = 20
 m = 1000
 dim = 40
 
-kappa = 10
+
+kappa = 10000
+mu = 5e-3
+
+kappa = 100
 mu = 5e-8
-n_iters = 30
 
-p = LogisticRegression(n_agent, m, dim, noise_ratio=0.05, kappa=kappa, graph_type='er', graph_params=0.3)
-W, alpha = generate_mixing_matrix(p)
-log.info('m = %d, n = %d, alpha = %.4f' % (m, n_agent, alpha))
+n_iters = 10
+
+p = LogisticRegression(n_agent=n_agent, m=m, dim=dim, noise_ratio=0.05, graph_type='er', kappa=kappa, graph_params=0.3)
+print(p.n_edges)
+
 
 x_0 = np.random.rand(dim, n_agent)
 x_0_mean = x_0.mean(axis=1)
+W, alpha = generate_mixing_matrix(p)
+print('alpha = ' + str(alpha))
+
 
-eta = 2 / (p.L + p.sigma)
+eta = 2/(p.L + p.sigma)
 n_inner_iters = int(m * 0.05)
 batch_size = int(m / 10)
+batch_size = 10
 n_dgd_iters = n_iters * 20
-n_sarah_iters = n_iters * 20
+n_svrg_iters = n_iters * 20
 n_dsgd_iters = int(n_iters * m / batch_size)
 
-centralized = [
+
+single_machine = [
 GD(p, n_iters=n_iters, eta=eta, x_0=x_0_mean),
 SGD(p, n_iters=n_dsgd_iters, eta=eta*3, batch_size=batch_size, x_0=x_0_mean, diminishing_step_size=True),
 NAG(p, n_iters=n_iters, x_0=x_0_mean),
-SARAH(p, n_iters=n_sarah_iters, n_inner_iters=n_inner_iters, eta=eta / 20, x_0=x_0_mean)
+SVRG(p, n_iters=n_svrg_iters, n_inner_iters=n_inner_iters, eta=eta/20, x_0=x_0_mean),
+SARAH(p, n_iters=n_svrg_iters, n_inner_iters=n_inner_iters, eta=eta/20, x_0=x_0_mean),
 ]
 
+
 distributed = [
-DGD_tracking(p, n_iters=n_dgd_iters, eta=eta / 10, x_0=x_0, W=W),
-DANE(p, n_iters=n_iters, mu=mu, x_0=x_0_mean),
+DGD_tracking(p, n_iters=n_dgd_iters, eta=eta/10, x_0=x_0, W=W),
+DSGD(p, n_iters=n_dsgd_iters, eta=eta*2, batch_size=batch_size, x_0=x_0, W=W, diminishing_step_size=True),
+EXTRA(p, n_iters=n_dgd_iters, eta=eta/2, x_0=x_0, W=W),
+NIDS(p, n_iters=n_dgd_iters, eta=eta, x_0=x_0, W=W),
+
+ADMM(p, n_iters=n_iters, rho=1, x_0=x_0_mean),
+DANE(p, n_iters=n_iters, mu=mu, x_0=x_0_mean)
+]
+
+network = [
+NetworkSVRG(p, n_iters=n_svrg_iters, n_inner_iters=n_inner_iters, eta=eta/20, mu=mu, x_0=x_0, W=W, batch_size=batch_size),
+NetworkSARAH(p, n_iters=n_svrg_iters, n_inner_iters=n_inner_iters, eta=eta/20, mu=mu, x_0=x_0, W=W, batch_size=batch_size),
 NetworkDANE(p, n_iters=n_iters, mu=mu, x_0=x_0, W=W),
 ]
 
-exps = centralized + distributed
+exps = single_machine + distributed + network
+
+res = run_exp(exps, kappa=kappa, max_iter=n_iters, name='logistic_regression', n_cpu_processes=4, save=True)
 
-res = run_exp(exps, kappa=kappa, max_iter=n_iters, name='logistic_regression', n_process=1, save=True)
 
 plt.show()
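
The commit message adds D2 and CHOCO-SGD, and the updated README lists them among the decentralized algorithms, but this experiment script does not exercise them yet. If their constructors follow the pattern of the other decentralized optimizers above, wiring them in might look roughly like the following; the class names and argument lists are assumptions (CHOCO-SGD typically also needs compression-related parameters whose names are not shown in this commit), so check `nda.optimizers` before copying this.

```python
# Hypothetical extension of the script above (reuses its p, eta, x_0, W, etc.).
# Class names and argument lists are assumed to mirror the other decentralized
# optimizers in this file; they are NOT taken from the package.
decentralized_extra = [
    D2(p, n_iters=n_dgd_iters, eta=eta / 2, x_0=x_0, W=W),
    # CHOCO-SGD also relies on communication compression (compression operator,
    # consensus step size); those arguments are omitted because their names are
    # not visible in this commit.
    CHOCO_SGD(p, n_iters=n_dsgd_iters, eta=eta / 2, batch_size=batch_size,
              x_0=x_0, W=W),
]

exps = single_machine + distributed + network + decentralized_extra
```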

Diff for: experiments/non_convex/gisette_classification.py (−35)

This file was deleted.

0 commit comments
