Report
Elwin Stephan edited this page Dec 13, 2021
- We try to solve a distributed outer product
- First attempt:
- Naive allgather
- Naive allreduce
- -> Works decently well
- Second attempt:
- Look at various implementations of the decision tree
- Optimize them, specifically targeted at our initial problem
- -> Successful for both ring (attempting to beat the native ring) and g-rabenseifner
- Future work
- Push to upstream MPI implementation (if possible)
- Take network topology into consideration
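The naive first attempt above can be sketched as a single-process simulation (the function name `naive_allgather_outer` is illustrative; real code would use `MPI_Allgather`). Each simulated rank holds a chunk of the vector, gathers the full vector, and computes its block of rows of the outer product:

```python
import numpy as np

def naive_allgather_outer(chunks):
    """Naive first attempt, simulated in one process: every rank
    allgathers all vector chunks, then computes its block of rows of
    the outer product x x^T. `chunks[r]` is rank r's local slice."""
    # allgather step: afterwards every rank holds the full vector
    full = np.concatenate(chunks)
    # each rank forms the outer product of its local chunk with the
    # gathered vector, i.e. its block-row of the result
    blocks = [np.outer(c, full) for c in chunks]
    # stacking the block-rows yields the full outer product
    return np.vstack(blocks)

x = np.arange(6, dtype=float)
result = naive_allgather_outer(np.split(x, 3))  # 3 simulated ranks
assert np.allclose(result, np.outer(x, x))
```

The simulation only checks correctness of the data movement, not performance; the point of the later attempts is that the allgather volume can be reduced.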
- What do we want to solve
- Why is it a problem
- How do we solve it
- Start from birds-eye view (gradient descent in Neural Networks)
- Zoom into the detailed problem (send around chunks of data)
- Concept of {gather, reduce} (not the MPI function but instead the concept)
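The two concepts could be illustrated independently of any MPI function; a minimal sketch (function names `gather_concept` and `reduce_concept` are invented for illustration):

```python
import operator
from functools import reduce as fold

def gather_concept(pieces):
    """'Gather' as a concept: concatenate per-rank pieces in rank
    order, so the result contains every rank's data unchanged."""
    return [x for piece in pieces for x in piece]

def reduce_concept(pieces, op=operator.add):
    """'Reduce' as a concept: combine per-rank vectors element-wise
    with an associative operator (here: sum)."""
    return [fold(op, column) for column in zip(*pieces)]

per_rank = [[1, 2], [3, 4], [5, 6]]   # each inner list: one rank's data
assert gather_concept(per_rank) == [1, 2, 3, 4, 5, 6]
assert reduce_concept(per_rank) == [9, 12]
```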
- MPI Library
- allgather
- allreduce
- decision tree (explain how the library picks an implementation per call?)
- Rabenseifner?
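The decision-tree idea could be shown with a toy sketch. The thresholds and algorithm names below are made up for illustration; real libraries use tuned tables keyed on message size and communicator size:

```python
def pick_allreduce_algorithm(msg_bytes, n_ranks):
    """Toy sketch of the kind of decision tree an MPI library applies
    when choosing an allreduce implementation. All cut-offs here are
    invented; real values come from benchmarking on the target system."""
    if n_ranks <= 2:
        return "pairwise"              # trivial case: one exchange
    if msg_bytes < 4096:
        return "recursive-doubling"    # latency-bound: few small messages
    return "ring"                      # bandwidth-bound: large messages
```

The key observation for the project is that such a generic tree is tuned for arbitrary workloads, so a fixed, known workload (the outer-product pattern) leaves room for a specialized choice.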
- Basic implementation
- Allgather
- Allreduce
- Ring
- g-rabenseifner-allgather
- rabenseifner (describe basics of algo)
- describe improvements made in our case
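The basics of the Rabenseifner scheme (a reduce-scatter by recursive vector halving, then an allgather by recursive doubling) can be sketched as a single-process simulation in which "messages" are plain array reads. `rabenseifner_allreduce` is an illustrative name; the sketch assumes a power-of-two rank count and a vector length divisible by it:

```python
import numpy as np

def rabenseifner_allreduce(inputs):
    """Simulated Rabenseifner allreduce (sum). `inputs[r]` is rank r's
    full local vector; returns one result vector per rank, all equal to
    the element-wise sum across ranks."""
    p = len(inputs)
    assert p & (p - 1) == 0, "sketch assumes a power-of-two rank count"
    data = [np.asarray(v, dtype=float).copy() for v in inputs]
    n = len(data[0])
    lo, hi = [0] * p, [n] * p   # index range each rank is responsible for

    # Phase 1: reduce-scatter by recursive vector halving. At distance d,
    # partners split their common range: the lower rank keeps the lower
    # half, and each adds the partner's contribution for its kept half.
    d = p // 2
    while d >= 1:
        nxt = []
        for r in range(p):
            partner = r ^ d
            mid = (lo[r] + hi[r]) // 2
            a, b = (lo[r], mid) if r < partner else (mid, hi[r])
            buf = data[r].copy()
            buf[a:b] += data[partner][a:b]
            nxt.append((buf, a, b))
        for r, (buf, a, b) in enumerate(nxt):
            data[r], lo[r], hi[r] = buf, a, b
        d //= 2

    # Phase 2: allgather by recursive doubling. Partners exchange their
    # fully reduced segments, doubling the owned range each round.
    d = 1
    while d < p:
        nxt = []
        for r in range(p):
            partner = r ^ d
            buf = data[r].copy()
            buf[lo[partner]:hi[partner]] = data[partner][lo[partner]:hi[partner]]
            nxt.append((buf, min(lo[r], lo[partner]), max(hi[r], hi[partner])))
        for r, (buf, a, b) in enumerate(nxt):
            data[r], lo[r], hi[r] = buf, a, b
        d *= 2
    return data

out = rabenseifner_allreduce([[1., 2.], [3., 4.]])
assert all(np.allclose(v, [4., 6.]) for v in out)
```

The simulation captures the communication pattern (log2(p) rounds per phase, each rank sending only half its current range during the reduce-scatter), which is what the report's improvements target.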
- Other attempts
- allgather-async
- allreduce-butterfly
- ...
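The allreduce-butterfly attempt could be sketched the same way; this single-process simulation (illustrative name, power-of-two ranks assumed) shows the recursive-doubling exchange pattern in which every rank sends its full partial sum each round:

```python
def butterfly_allreduce(values):
    """Simulated butterfly allreduce: at distance d, rank r exchanges
    its current partial sum with rank r XOR d and adds it. After
    log2(p) rounds every rank holds the global sum."""
    p = len(values)
    assert p & (p - 1) == 0, "sketch assumes a power-of-two rank count"
    vals = list(values)
    d = 1
    while d < p:
        # synchronous round: every rank adds its partner's value
        vals = [vals[r] + vals[r ^ d] for r in range(p)]
        d *= 2
    return vals

assert butterfly_allreduce([1, 2, 3, 4]) == [10, 10, 10, 10]
```

Compared with the Rabenseifner scheme, the butterfly moves the full vector in every round, trading bandwidth for fewer distinct phases.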
- Listing results
- reasoning and interpretation of results (in detail)
- General conclusions of results in context of the whole problem statement and project
- What are problems with the current implementation?
- How can they be rectified?
- Make our algorithms problem independent and merge upstream (possible?)
- Take network topology into consideration
- Optimize for sparse data (i.e. likely heterogeneous)
- -> For all of the above: what are the open problems?