
Commit

storage: Work in progress on the Storchid storage project. Cluster upload working. Updated docs.
patniemeyer committed Jan 10, 2024
1 parent f406778 commit 7b999a7
Showing 39 changed files with 1,446 additions and 421 deletions.
24 changes: 22 additions & 2 deletions str-twincoding/README-in.md
@@ -11,7 +11,8 @@ and join the discussion on the [Orchid Subreddit](https://www.reddit.com/r/orchi
This repository contains work in progress on the file encoding CLI and server framework.


![monitor](docs/monitor.png "Monitor")
![monitor](docs/screen.png "Screens")


A key aspect of the Orchid Storage project is the use of an efficient encoding scheme that minimizes
bandwidth costs incurred during migration of distributed data through providers over time.
@@ -30,7 +31,7 @@ files, decoding files with erasures, and optimally recovering lost shards.
See [`twin_coding.py`](encoding/twin_coding.py) for an explanation of the algorithm, example code, and a link to the original paper.


## Installation
## Development Installation

```
# Create a virtual environment
@@ -50,6 +51,20 @@ source venv/bin/activate
pip install -r requirements.txt
```

### Environment

The `STRHOME` (storage home) environment var is a path that determines the location of the default
`repository` folder and `providers.jsonc` data stores. During development you can source the provided
`env.sh` script to automatically set `STRHOME` and `PYTHONPATH` to the project folder and activate the venv
in that folder.

```
export STRHOME=[Project Folder]
export PATH=$PATH:"$STRHOME"
export PYTHONPATH="$STRHOME"
```
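The Environment paragraph says only that `STRHOME` determines where the default `repository` folder and `providers.jsonc` live. A minimal Python sketch of that lookup follows, assuming both sit directly under `STRHOME`. The `Using default repository: .../str-twincoding/repository` line in the server help output suggests this for the repository; the `providers.jsonc` location is an assumption. `default_locations` is a hypothetical helper, not the project's `Repository.default()`.

```python
import os
from pathlib import Path


def default_locations(environ=None):
    """Hypothetical helper: resolve the default repository folder and providers file.

    Assumes both live directly under $STRHOME; the real project may lay them out differently.
    """
    environ = os.environ if environ is None else environ
    home = environ.get("STRHOME")
    if not home:
        # env.sh normally exports STRHOME; fail loudly rather than guess a location.
        raise RuntimeError("STRHOME is not set; source env.sh or export it manually")
    root = Path(home).expanduser()
    return root / "repository", root / "providers.jsonc"
```

Passing an explicit mapping (for example `default_locations({"STRHOME": "/tmp/proj"})`) keeps the helper easy to test without touching the real environment.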


## Example Usage
```
INSERT_USAGE
@@ -72,6 +87,11 @@ INSERT_STORAGE_DOCS
INSERT_SERVER_DOCS
```

## Providers Docs
```
INSERT_PROVIDERS_DOCS
```

## Monitor Docs
```
INSERT_MONITOR_DOCS
88 changes: 67 additions & 21 deletions str-twincoding/README.md
@@ -11,7 +11,8 @@ and join the discussion on the [Orchid Subreddit](https://www.reddit.com/r/orchi
This repository contains work in progress on the file encoding CLI and server framework.


![monitor](docs/monitor.png "Monitor")
![monitor](docs/screen.png "Screens")


A key aspect of the Orchid Storage project is the use of an efficient encoding scheme that minimizes
bandwidth costs incurred during migration of distributed data through providers over time.
@@ -30,7 +31,7 @@ files, decoding files with erasures, and optimally recovering lost shards.
See [`twin_coding.py`](encoding/twin_coding.py) for an explanation of the algorithm, example code, and a link to the original paper.


## Installation
## Development Installation

```
# Create a virtual environment
@@ -50,40 +51,53 @@ source venv/bin/activate
pip install -r requirements.txt
```

### Environment

The `STRHOME` (storage home) environment var is a path that determines the location of the default
`repository` folder and `providers.jsonc` data stores. During development you can source the provided
`env.sh` script to automatically set `STRHOME` and `PYTHONPATH` to the project folder and activate the venv
in that folder.

```
export STRHOME=[Project Folder]
export PATH=$PATH:"$STRHOME"
export PYTHONPATH="$STRHOME"
```


## Example Usage
```
# Generate some test files
test-content.sh
# Generate a test file
dd if=/dev/urandom of="foo_file.dat" bs=1K count=1 status=none
# Import a file into the default local repository with default encoding
storage.sh import data/foo_file.dat
storage.sh import foo_file.dat
# List the repository
storage.sh repo list
# Start a test provider server cluster
test-cluster.sh start 5001 5002 5003 5004 5005
examples/test-cluster.sh start 5001 5002 5003 5004 5005
# Confirm that the test servers are running
test-cluster.sh list
examples/test-cluster.sh list
# "Discover" these providers, adding them to our known provider list
# This will normally be done via the directory service and performed at file push time.
test-discover.sh 5001 5002 5003 5004 5005
providers.sh add 5001 5002 5003 5004 5005
# List the known providers
providers.sh list
# Start the monitor application (in another window)
# tmux split
monitor.sh --update 1
# Push the file by name
storage.sh push foo_file.dat
# TODO:
# Monitor file availability while:
# Observing resilient upload progress
# Killing servers and prompting efficient rebuilds
# Shut downt the servers
test-cluster.sh stop
# Shut down the servers
examples/test-cluster.sh stop
```

## Encoding CLI Examples
@@ -285,9 +299,20 @@ options:
--overwrite Overwrite existing files.
None
```
###`list`
```
usage: storage list [-h] [--repo REPO]
options:
-h, --help show this help message and exit
--repo REPO Path to the repository.
None
```
###`push`
```
usage: storage push [-h] [--repo REPO] [--servers [SERVERS ...]] [--validate]
usage: storage push [-h] [--repo REPO] [--providers [PROVIDERS ...]]
[--validate] [--target_availability TARGET_AVAILABILITY]
[--dryrun] [--overwrite]
file
positional arguments:
@@ -296,9 +321,13 @@ positional arguments:
options:
-h, --help show this help message and exit
--repo REPO Path to the repository.
--servers [SERVERS ...]
List of server names or urls to push to.
--providers [PROVIDERS ...]
Optional list of provider names or urls for the push.
--validate After push, download and reconstruct the file.
--target_availability TARGET_AVAILABILITY
Target availability for the file.
--dryrun, -n Show the plan without executing it.
--overwrite Overwrite files on the server.
None
```

@@ -307,7 +336,7 @@ None
Using default repository: /Users/pat/Desktop/OrchidProject/lab.orchid.com/orchid/str-twincoding/repository
usage: server_cli.py [-h] [--config CONFIG] [--interface INTERFACE]
[--port PORT] [--repository_dir REPOSITORY_DIR]
[--auth_key AUTH_KEY] [--show_console]
[--auth_key AUTH_KEY] [--debug]
Flask server with argument parsing
@@ -320,7 +349,24 @@ options:
--repository_dir REPOSITORY_DIR
Directory to store repository files
--auth_key AUTH_KEY Authentication key to validate requests
--show_console Flag to show console logs
--debug Debug server
```

## Providers Docs
```
usage: providers_cli.py [-h] [--file FILE] COMMAND ...
Process command line arguments.
positional arguments:
COMMAND Sub-commands available.
list List providers
add Add providers
clear Clear the providers file
options:
-h, --help show this help message and exit
--file FILE Providers config file path
```

## Monitor Docs
@@ -332,7 +378,7 @@ Process command line arguments.
options:
-h, --help show this help message and exit
--providers PROVIDERS
Providers file path
Providers config file path
--debug Show debug
--update UPDATE Update view with polling period seconds
```
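The `providers_cli.py` help text in the Providers Docs section is ordinary `argparse` output. Below is a sketch of a parser that prints the same usage line, for readers who want to extend the CLI. The positional `providers` argument on `add` is an assumption inferred from the `providers.sh add 5001 5002 ...` example. Nothing here is taken from the project's actual source.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the providers CLI parser from its --help output."""
    parser = argparse.ArgumentParser(
        prog="providers_cli.py", description="Process command line arguments."
    )
    parser.add_argument("--file", help="Providers config file path")
    # metavar="COMMAND" makes the usage line read "COMMAND ..." as in the help text.
    sub = parser.add_subparsers(dest="command", metavar="COMMAND", help="Sub-commands available.")
    sub.add_parser("list", help="List providers")
    add = sub.add_parser("add", help="Add providers")
    # Assumption: providers are given as names, urls, or local test ports.
    add.add_argument("providers", nargs="+", help="Provider names, urls, or ports")
    sub.add_parser("clear", help="Clear the providers file")
    return parser
```

For example, `build_parser().parse_args(["add", "5001", "5002"])` yields `command="add"` and `providers=["5001", "5002"]`.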
9 changes: 9 additions & 0 deletions str-twincoding/build_readme.sh
@@ -2,6 +2,10 @@

source "$(dirname "$0")/env.sh"

back="/tmp/README.md.$$"
echo "backing up README.md to $back"
cp README.md "$back"

# examples
tmp=/tmp/readme.$$
sed -n '/# START_EXAMPLES/,/# END_EXAMPLES/p' examples/examples.sh | sed '/# START_EXAMPLES/d; /# END_EXAMPLES/d' > $tmp
@@ -21,6 +25,11 @@ server.sh --help > $tmp
sed "/INSERT_SERVER_DOCS/r $tmp" README.md | sed '/INSERT_SERVER_DOCS/d' > out.md
mv out.md README.md

# providers cli
providers.sh --help > $tmp
sed "/INSERT_PROVIDERS_DOCS/r $tmp" README.md | sed '/INSERT_PROVIDERS_DOCS/d' > out.md
mv out.md README.md

# monitor cli
monitor.sh --help > $tmp
sed "/INSERT_MONITOR_DOCS/r $tmp" README.md | sed '/INSERT_MONITOR_DOCS/d' > out.md
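`build_readme.sh` fills the `INSERT_*` placeholders with two `sed` idioms. The first is a range print (`sed -n '/START/,/END/p'`) followed by deleting the marker lines. The second is `sed "/MARKER/r file"` followed by `sed '/MARKER/d'`, which swaps each marker line for the file's contents. Below is a Python sketch of the same two steps, in case the script is ever ported. It treats markers as plain substrings, which matches `sed` here only because names like `INSERT_USAGE` contain no regex metacharacters. The function names are hypothetical.

```python
def extract_between(text, start, end):
    """Approximate: sed -n '/START/,/END/p' | sed '/START/d; /END/d'."""
    kept, inside = [], False
    for line in text.splitlines(keepends=True):
        if not inside:
            if start in line:
                inside = True  # range opens; the marker line itself is dropped
            continue
        if end in line:
            inside = False  # range closes; the end marker is dropped too
            continue
        if start in line:
            continue  # the second sed also deletes stray start markers
        kept.append(line)
    return "".join(kept)


def replace_marker(text, marker, content):
    """Approximate: sed "/MARKER/r file" | sed '/MARKER/d'."""
    if content and not content.endswith("\n"):
        content += "\n"
    out = []
    for line in text.splitlines(keepends=True):
        # 'r' appends the file after the matching line, then 'd' removes that line.
        out.append(content if marker in line else line)
    return "".join(out)
```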
Binary file removed str-twincoding/docs/monitor.png
Binary file not shown.
Binary file added str-twincoding/docs/screen.png
25 changes: 12 additions & 13 deletions str-twincoding/docs/usage.sh
@@ -1,32 +1,31 @@
# Generate some test files
test-content.sh
# Generate a test file
dd if=/dev/urandom of="foo_file.dat" bs=1K count=1 status=none

# Import a file into the default local repository with default encoding
storage.sh import data/foo_file.dat
storage.sh import foo_file.dat

# List the repository
storage.sh repo list

# Start a test provider server cluster
test-cluster.sh start 5001 5002 5003 5004 5005
examples/test-cluster.sh start 5001 5002 5003 5004 5005

# Confirm that the test servers are running
test-cluster.sh list
examples/test-cluster.sh list

# "Discover" these providers, adding them to our known provider list
# This will normally be done via the directory service and performed at file push time.
test-discover.sh 5001 5002 5003 5004 5005
providers.sh add 5001 5002 5003 5004 5005

# List the known providers
providers.sh list

# Start the monitor application (in another window)
# tmux split
monitor.sh --update 1

# Push the file by name
storage.sh push foo_file.dat

# TODO:
# Monitor file availability while:
# Observing resilient upload progress
# Killing servers and prompting efficient rebuilds

# Shut downt the servers
test-cluster.sh stop
# Shut down the servers
examples/test-cluster.sh stop
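The walkthrough starts five local test providers on ports 5001 to 5005 and then checks them with `examples/test-cluster.sh list`. How that script performs the check is not shown in this commit. One simple, hypothetical way to confirm that the servers accept connections is a TCP connect probe. This sketch assumes they listen on `127.0.0.1`; the server CLI's `--interface` option may bind them elsewhere.

```python
import socket


def listening_ports(ports, host="127.0.0.1", timeout=0.5):
    """Return the ports on host that currently accept a TCP connection.

    Hypothetical helper: it only shows that something is listening,
    not that it is a healthy storage provider.
    """
    up = []
    for port in ports:
        try:
            # create_connection raises OSError (refused, timed out, ...) if nothing answers.
            with socket.create_connection((host, port), timeout=timeout):
                up.append(port)
        except OSError:
            pass
    return up
```

With the test cluster running, `listening_ports(range(5001, 5006))` would be expected to list all five ports.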
12 changes: 6 additions & 6 deletions str-twincoding/encoding/file_decoder.py
@@ -7,8 +7,7 @@
import numpy as np
from tqdm import tqdm

from storage.config import NodeType, EncodedFileConfig
from storage.util import assert_rs
from storage.storage_model import NodeType, EncodedFile, assert_rs
from storage.repository import Repository

from encoding.chunks import ChunksReader, open_output_file
@@ -54,7 +53,7 @@ def __init__(self,
# at least k files of the same type.
@staticmethod
def from_encoded_dir(path: str, output_path: str = None, overwrite: bool = False):
file_config = EncodedFileConfig.load(os.path.join(path, 'config.json'))
file_config = EncodedFile.load(os.path.join(path, 'config.json'))
assert file_config.type0.k == file_config.type1.k, "Config node types must have the same k."
recover_from_files = FileDecoder.get_threshold_files(path, k=file_config.type0.k)
if os.path.basename(list(recover_from_files)[0]).startswith("type0_"):
@@ -124,11 +123,12 @@ def close(self):


if __name__ == '__main__':
repo = Repository('./repository')
repo = Repository.default()
filename = 'file_1KB.dat'
original_file = repo.tmp_file_path(filename)
encoded_file = repo.file_path(filename)
print(repo.status_str(filename))
encoded_file = repo.file_dir_path(filename)
file_status = repo.file_status(filename)
print(file_status.status_str())

recovered_file = repo.tmp_file_path(f'recovered_{filename}')
decoder = FileDecoder.from_encoded_dir(
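Per the comment in the diff, `FileDecoder.from_encoded_dir` needs at least k shard files of the same node type. It tells the types apart by the `type0_` / `type1_` filename prefix. The body of `get_threshold_files` is outside this diff, so what follows is only a hypothetical sketch of that selection rule. The preference for type 0 and the list return type are assumptions.

```python
import os

PREFIXES = ("type0_", "type1_")  # node-type filename prefixes, as checked in from_encoded_dir


def threshold_files(path, k):
    """Hypothetical stand-in for FileDecoder.get_threshold_files.

    Returns k shard paths that share one node-type prefix, trying type 0 first.
    """
    by_type = {prefix: [] for prefix in PREFIXES}
    for name in sorted(os.listdir(path)):
        for prefix in PREFIXES:
            if name.startswith(prefix):
                by_type[prefix].append(os.path.join(path, name))
    for prefix in PREFIXES:
        if len(by_type[prefix]) >= k:
            # k shards of a single type are what the decoder needs (per the source comment).
            return by_type[prefix][:k]
    raise ValueError(f"need at least {k} shard files of one node type in {path}")
```

Other files in the directory, such as `config.json`, are ignored because they match neither prefix.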