24 changes: 0 additions & 24 deletions armory/baseline_models/pytorch/deep_speech.py

This file was deleted.

289 changes: 0 additions & 289 deletions armory/baseline_models/pytorch/sincnet.py

This file was deleted.

42 changes: 28 additions & 14 deletions armory/datasets/README.md
@@ -84,50 +84,64 @@ info, ds = load.load("digit")
info, ds = load.from_directory("/armory/datasets/new_builds/digit/1.0.8")
```

### Apache Beam Datasets

Currently, `librispeech` and `librispeech_dev_clean` require Apache Beam to build.
Apache Beam is not installed in the container by default because of its older dependencies.
If building inside the container, first install it:
```
pip install apache-beam
```

When building, armory does not provide Beam pipeline options by default, which makes builds very slow unless overrides are provided.
It is recommended to build these datasets directly with `tfds` on the command line.
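If you do build from Python instead, the Beam overrides can be passed through TFDS directly. A minimal sketch, assuming the standard `tfds.builder` / `tfds.download.DownloadConfig(beam_options=...)` API and Beam's DirectRunner option names (check these against your installed versions):

```python
import tensorflow_datasets as tfds
from apache_beam.options.pipeline_options import PipelineOptions

# Build with explicit Beam options so the build is parallelized rather than
# falling back to a single-worker pipeline. Option names assume the DirectRunner.
builder = tfds.builder("librispeech_dev_clean")  # pass data_dir=... to target armory's build directory if needed
builder.download_and_prepare(
    download_config=tfds.download.DownloadConfig(
        beam_options=PipelineOptions(
            ["--direct_running_mode=multi_processing", "--direct_num_workers=8"]
        )
    )
)
```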

## Packaging and Uploading for Cache

After a dataset has been successfully built and loaded (locally), it can be packaged and uploaded to the cache.

First, it is recommended that you test the packaging and untarring process without upload/download.

In Python:
```python
from armory.datasets import package

my_dataset = "my_dataset"
package.package(my_dataset)  # creates a tar.gz file
package.update(my_dataset)  # adds the tar hash info to "cached_datasets.json"
package.verify(my_dataset)  # uses the "cached_datasets.json" info to verify the hash of the tar file
package.extract(my_dataset, overwrite=False)  # should raise an error (asking you to overwrite) unless the built dataset was removed first
package.extract(my_dataset, overwrite=True)  # extracts the tar file into the data directory, overwriting the old build
```

If you can successfully load the dataset after extracting it here, the packaging step is working.
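For example, a minimal check (as in the earlier examples, `load.load` returns the dataset info and its splits):

```python
from armory.datasets import load

# With the extracted build in place, this should succeed without rebuilding or re-downloading.
info, ds = load.load(my_dataset)
```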

Now, to upload to S3 (you will need `ARMORY_PRIVATE_S3_ID` and `ARMORY_PRIVATE_S3_KEY`):
```python
from armory.datasets import upload
upload.upload(my_dataset)  # this will fail, as you need to explicitly force the upload to be public
upload.upload(my_dataset, public=True)
```

Alternatively, instead of packaging and uploading separately, you can use this convenience function:
```python
package.add_to_cache(my_dataset, public=True)
```

To download the packaged dataset directly into the tar cache directory, do:
```python
from armory.datasets import download
download.download(my_dataset, overwrite=True, verify=True)
```

You can also download and extract with:
```python
from armory.datasets import load
load.ensure_download_extract(my_dataset, verify=True)
```
Or just try to load it directly:
```python
load.load(my_dataset)
```

# Running / Testing with current armory scenario files