Commit f7fcbcb

Merge pull request #214 from kbase/dts-docs
Update docs for dts-flavored bulk_specification endpoint
2 parents 0c07972 + 20f164e commit f7fcbcb

4 files changed: +60 -16 lines changed

README.md

Lines changed: 42 additions & 12 deletions
@@ -17,7 +17,7 @@ if you want to run locally you must install requirements.txt for python3

 ## running

-to run locally run /deployment/bin/entrypoint.sh
+to run locally run /deployment/bin/entrypoint_staging_service.sh

 to run inside docker run /run_in_docker.sh

@@ -62,6 +62,21 @@ To add new mappings, update the `GenerateMappings.py` script. New file types and
 to the `staging_service.autodetect.Mappings` module and included from there. See `GenerateMappings.py`
 docstrings for more details.

+## Data Transfer Service file watcher
+The KBase [Data Transfer Service](https://github.com/kbase/dts) (DTS,
+See [here](https://kbase.github.io/dts) for further documentation) copies data files from external sources into a KBase user's staging area, accompanied by a `manifest.json` file.
+However, due to various permissions reasons, it cannot copy those files directly. First,
+it drops them off in a separate directory to which it has access. The Staging Service DTS
+File Watcher then moves those files to the user's directory.
+
+The DTS File Watcher is a separate entrypoint that uses much of the same machinery as the
+rest of the Staging Service. The script can be found in `scripts/run_dts_watcher.py`, and
+the entrypoint can be found in `deployment/bin/entrypoint_dts_watcher.sh`. It
+does not provide a web service, but just watches a directory given in the config as
+`DTS_STAGING_DIR` for changes. If it sees a `manifest.json` file in a subdirectory, it
+parses that to get a KBase username and moves the whole subdirectory to that user's
+staging area.
+
 ## API

 all paths should be specified treating the user's home directory as root
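
The watcher behaviour described in the new "Data Transfer Service file watcher" README section can be sketched roughly as follows. This is a minimal, single-pass illustration and not the code in `scripts/run_dts_watcher.py`. The function name `move_dts_dropoffs`, the `user_root` parameter, and the assumption that the manifest stores the username under a top-level `"username"` key are all hypothetical.

```python
import json
import shutil
from pathlib import Path


def move_dts_dropoffs(dts_staging_dir: Path, user_root: Path) -> list[Path]:
    """Move every drop-off subdirectory that has a manifest.json to its user's area.

    Returns the new locations of the subdirectories that were moved.
    """
    moved = []
    # Snapshot the listing first, because the loop moves directories away.
    for subdir in sorted(p for p in dts_staging_dir.iterdir() if p.is_dir()):
        manifest = subdir / "manifest.json"
        if not manifest.is_file():
            continue  # no manifest yet, so leave the directory alone
        try:
            payload = json.loads(manifest.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # unreadable manifest, skip in this sketch
        # ASSUMPTION: the username lives under a top-level "username" key.
        username = payload.get("username") if isinstance(payload, dict) else None
        # Reject missing names and anything that could escape user_root.
        if (
            not isinstance(username, str)
            or username in {"", ".", ".."}
            or Path(username).name != username
        ):
            continue
        target = user_root / username / subdir.name
        if target.exists():
            continue  # never overwrite an existing directory in this sketch
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(subdir), str(target))
        moved.append(target)
    return moved
```

A real watcher would run this kind of scan in response to filesystem events or on a timer, and would presumably also check that the user exists before moving anything.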
@@ -802,21 +817,27 @@ Error Connecting to auth service ...
 ### Parse bulk specifications

 This endpoint parses one or more bulk specification files in the staging area
-into a data
-structure (close to) ready for insertion into the Narrative bulk import or
-analysis cell.
+into a data structure (close to) ready for insertion into the Narrative bulk
+import or analysis cell.

-It can parse `.tsv`, `.csv`, and Excel (`.xls` and `.xlsx`) files. Templates for
-the currently
-supported data types are available in
+By default, it can parse `.tsv`, `.csv`, and Excel (`.xls` and `.xlsx`) files.
+Templates for the currently supported data types are available in
 the [templates](./import_specifications/templates)
 directory of this repo. See
 the [README.md](./import_specifications/templates/README.md) file
 for instructions on template usage.

+When given the `dts` flag in the URL, this endpoint can also parse manifest
+files that come from the KBase [Data Transfer Service](https://github.com/kbase/dts) (DTS)
+See [here](https://kbase.github.io/dts) for further documentation on the service.
+This service copies data files from external sources into a KBase user's staging
+area, accompanied by a `manifest.json` file. These also contain information on how to
+load files into KBase importer apps. However, since they are a very specific format,
+this means adding a `dts` flag to the URL. When this flag is present, all files
+are expected to be `.json` files and conform to the [DTS schema](./import_specifications/schema/dts_manifest_schema.json).
+
 See the [import specification ADR document](./docs/import_specifications.ADR.md)
-for design
-details.
+for design details.

 **URL** : `ci.kbase.us/services/staging_service/bulk_specification`

@@ -830,15 +851,24 @@ details.

 **Code** : `200 OK`

-**Content example**
+**Content examples**

 ```
 GET bulk_specification/?files=file1.<ext>[,file2.<ext>,...]
 ```

 `<ext>` is one of `csv`, `tsv`, `xls`, or `xlsx`.

-Reponse:
+
+```
+GET bulk_specification/?files=file1.json[,file2.json,...]&dts
+```
+
+When using the `dts` flag, all files must be JSON and have the `.json` extension.
+
+Both versions of the endpoint respond in the same format.
+
+Response:

 ```
 {
@@ -1024,7 +1054,7 @@ POST write_bulk_specification/
 - `data` contains any data to be written to the file as example data, and is analogous to the data structure returned from the parse endpoint. To specify that no data should be written to the template provide an empty list.
 - `<value for ID, row N>` is the value for the input for a given `spec.json` ID and import or analysis instance, where an import/analysis instance is effectively a row in the data file. Each data file row is provided in order for each type. Each row is provided in a mapping of `spec.json` ID to the data for the row. Lines > 3 in the templates are user-provided data, and each line corresponds to a single import or analysis.

-Reponse:
+Response:

 ```
 {
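
As an illustration of the two `bulk_specification` request forms documented in the README changes above, the following sketch builds the request URL for either variant. The helper name `bulk_specification_url` is hypothetical, the `https://` scheme is an assumption (the README gives only the host and path), and the client-side extension check simply mirrors the documented rule that all files must end in `.json` when `dts` is used.

```python
from urllib.parse import quote

# ASSUMPTION: https scheme; the README lists only the host and path.
BASE_URL = "https://ci.kbase.us/services/staging_service"


def bulk_specification_url(files: list[str], dts: bool = False) -> str:
    """Build a GET URL for the bulk_specification endpoint."""
    if not files:
        raise ValueError("at least one file is required")
    if dts:
        # With the dts flag, every file must carry the .json extension.
        bad = [f for f in files if not f.endswith(".json")]
        if bad:
            raise ValueError(f"non-JSON files with dts flag: {bad}")
    # Commas separate files, so percent-encode everything except "/".
    query = "files=" + ",".join(quote(f, safe="/") for f in files)
    if dts:
        query += "&dts"  # bare flag, as in the README example
    return f"{BASE_URL}/bulk_specification/?{query}"
```

For example, `bulk_specification_url(["manifest.json"], dts=True)` yields a URL ending in `bulk_specification/?files=manifest.json&dts`. Sending the request (including any authentication the service requires) is left out of the sketch.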

RELEASE_NOTES.md

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,20 @@
11
# Release Notes
22

3-
## Unreleased
3+
## Version 1.4.0
44

5-
- update to Python 3.11.4
5+
- update to Python 3.11.13
66
- run black on all service and test files, fix most linting complaints (pylint,
77
sonarlint)
8+
- Added a new KBaseAuth client
9+
- Modified the `bulk_specification` endpoint to parse DTS `manifest.json` files with
10+
a `dts` flag.
11+
- Added a DTS file watcher that moves a user's files from the DTS dropoff point to
12+
the user's subdirectory.
13+
- Added `AUTH_TOKEN`, `DTS_MANIFEST_SCHEMA`, and `DTS_STAGING_DIR` config values.
14+
- These are all required.
15+
- `AUTH_TOKEN` is a service token used by the service to verify user existence.
16+
- Added a separate StagingServiceConfig object that manages configurations.
17+
- Modified the `AUTH_URL` config to point to the root of the auth service.
818

919
## Version 1.3.6
1020
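
Because the release notes above state that `AUTH_TOKEN`, `DTS_MANIFEST_SCHEMA`, and `DTS_STAGING_DIR` are all required, a deployment might want to fail fast when one is absent. The check below is a small sketch of that idea. It does not reflect how `StagingServiceConfig` is actually implemented, and the helper name `missing_config_keys` and the plain-dict config representation are assumptions.

```python
# Keys the 1.4.0 release notes list as required.
REQUIRED_KEYS = ("AUTH_TOKEN", "DTS_MANIFEST_SCHEMA", "DTS_STAGING_DIR")


def missing_config_keys(config: dict) -> list[str]:
    """Return the required keys that are absent or blank in config."""
    # ASSUMPTION: the config has already been loaded into a flat dict.
    return [k for k in REQUIRED_KEYS if not str(config.get(k) or "").strip()]
```

A caller could raise an error at startup when the returned list is non-empty.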

import_specifications/readme.md

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,2 +1,6 @@
11
This directory contains templates for bulk import specifications to be parsed by the
2-
import specfication endpoint, and example code for how to generate them automatically.
2+
import specification endpoint, and example code for how to generate them automatically.
3+
4+
This also contains a subdirectory for storing JSON Schemas that describe JSON-based
5+
import specifications. These should be referenced by deployment and testing configuration
6+
as necessary.

staging_service/app.py

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@

 logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
 routes = web.RouteTableDef()
-VERSION = "1.3.6"
+VERSION = "1.4.0"

 _DATATYPE_MAPPINGS = None
 _DTS_MANIFEST_VALIDATOR: jsonschema.Draft202012Validator | None = None
