24 changes: 0 additions & 24 deletions armory/baseline_models/pytorch/deep_speech.py

This file was deleted.

289 changes: 0 additions & 289 deletions armory/baseline_models/pytorch/sincnet.py

This file was deleted.

42 changes: 28 additions & 14 deletions armory/datasets/README.md
@@ -84,50 +84,64 @@ info, ds = load.load("digit")
info, ds = load.from_directory("/armory/datasets/new_builds/digit/1.0.8")
```

### Apache Beam Datasets

Currently, `librispeech` and `librispeech_dev_clean` require Apache Beam to build.
Apache Beam is not installed in the container by default because of its older dependencies.
If building inside the container, first install it:
```
pip install apache-beam
```

When building, armory does not provide Beam pipeline options by default, which makes builds very slow unless overrides are provided.
It is recommended to build these datasets directly with `tfds` on the command line.
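If you do build from Python instead, the Beam overrides can be passed through TFDS directly. A minimal sketch, assuming the standard `tfds.builder` / `tfds.download.DownloadConfig(beam_options=...)` API and Beam's DirectRunner option names (check these against your installed versions):

```python
import tensorflow_datasets as tfds
from apache_beam.options.pipeline_options import PipelineOptions

# Build with explicit Beam options so the build is parallelized rather than
# falling back to a single-worker pipeline. Option names assume the DirectRunner.
builder = tfds.builder("librispeech_dev_clean")  # pass data_dir=... to target armory's build directory if needed
builder.download_and_prepare(
    download_config=tfds.download.DownloadConfig(
        beam_options=PipelineOptions(
            ["--direct_running_mode=multi_processing", "--direct_num_workers=8"]
        )
    )
)
```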

## Packaging and Uploading for Cache

After a dataset has been successfully built and loaded (locally), it can be packaged and uploaded to the cache.

First, it is recommended that you test the packaging and untarring process without upload/download.

In Python:
```python
from armory.datasets import package

my_dataset = "my_dataset"
package.package(my_dataset)  # creates a tar.gz file
package.update(my_dataset)  # adds the tar hash info to "cached_datasets.json"
package.verify(my_dataset)  # uses the "cached_datasets.json" info to verify the hash of the tar file
package.extract(my_dataset, overwrite=False)  # should raise an error (asking you to overwrite) unless the built dataset was removed first
package.extract(my_dataset, overwrite=True)  # extracts the tar file into the data directory, overwriting the old build
```

If you can successfully load the dataset after extracting it here, the packaging step is working.
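For example, a minimal check (as in the earlier examples, `load.load` returns the dataset info and its splits):

```python
from armory.datasets import load

# With the extracted build in place, this should succeed without rebuilding or re-downloading.
info, ds = load.load(my_dataset)
```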

Now, to upload to S3 (you will need `ARMORY_PRIVATE_S3_ID` and `ARMORY_PRIVATE_S3_KEY`):
```python
from armory.datasets import upload
upload.upload(my_dataset)  # this will fail, as you need to explicitly force the upload to be public
upload.upload(my_dataset, public=True)
```

Alternatively, instead of packaging and uploading separately, you can use this convenience function:
```python
package.add_to_cache(my_dataset, public=True)
```

To download the packaged dataset directly into the tar cache directory, do:
```python
from armory.datasets import download
download.download(my_dataset, overwrite=True, verify=True)
```

You can also download and extract with:
```python
from armory.datasets import load
load.ensure_download_extract(my_dataset, verify=True)
```
Or just try to load it directly:
```python
load.load(my_dataset)
```

# Running / Testing with current armory scenario files