Topcat's Upload Mechanism

Brian Ritchie, Nov 2019

Introduction

Topcat can be configured to allow users to upload datafiles. When enabled in configuration, an Upload button appears in the Browse view for Datasets and/or Datafiles. Clicking it opens a dialog in which the user can drag and drop files to be added to a dataset. If the user is browsing the list of datafiles within a dataset, the files are added to that dataset. If the user is browsing a list of datasets, the dialog asks for the name of a new dataset (which must not already exist).

This document describes the relevant sections of code.

Configuration

The Upload behaviour can be configured in topcat.json, using the following properties under each facility:

browse.<entity>.gridOptions.enableUpload: boolean, true to enable uploads for this entity type. (<entity> is restricted to Dataset and Datafile, i.e. when browsing lists of datasets or datafiles.)
idsUploadDatafileFormat: string, the name of the DatafileFormat to set for uploads
idsUploadDatasetType: string, the name of the DatasetType to set for uploads
idsUploadMaxTotalFileSize: number, the maximum total size of (all files in) an upload

All properties are optional in the schema, but in practice if enableUpload is defined and true anywhere then the DatafileFormat and DatasetType must also be defined. (I believe that the filesize limit can still be optional; if omitted, no size limit will be imposed by Topcat.)
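As a rough illustration of the rule above, the following sketch checks the configuration object for a single facility. The function name checkUploadConfig and the way problems are reported are hypothetical; only the property names come from the list above. It is not Topcat's own validation code.

```javascript
// Hypothetical helper (not part of Topcat): reports upload-related
// configuration problems for one facility entry from topcat.json.
function checkUploadConfig(facility) {
    var problems = [];
    var browse = facility.browse || {};

    // Uploads are enabled if any browse.<entity>.gridOptions.enableUpload is true.
    var uploadEnabled = Object.keys(browse).some(function (entity) {
        var gridOptions = (browse[entity] || {}).gridOptions || {};
        return gridOptions.enableUpload === true;
    });

    if (uploadEnabled) {
        if (!facility.idsUploadDatafileFormat) {
            problems.push('idsUploadDatafileFormat is required when uploads are enabled');
        }
        if (!facility.idsUploadDatasetType) {
            problems.push('idsUploadDatasetType is required when uploads are enabled');
        }
    }

    // idsUploadMaxTotalFileSize is optional; if present it should be a number.
    var max = facility.idsUploadMaxTotalFileSize;
    if (max !== undefined && typeof max !== 'number') {
        problems.push('idsUploadMaxTotalFileSize must be a number');
    }
    return problems;
}
```

An empty result means the facility's upload settings are consistent with the rule; each entry otherwise names one missing or malformed property.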

Views

There are three views involved:

browse-entities.html : an Upload button is added (above the grid view, to the right) when uploading is enabled.
upload.html : the Upload dialog, that allows the user to choose files to upload
upload-area.html : directive view used in the Upload dialog for files selection (drag/drop); displays a progress bar and a cancel/remove button for each file

The upload progress bars displayed in the upload-area are maintained by ids.upload() (that is, in tc-ids.service).

upload.html detail

If the datasetId is not set from the state/context, a text input box is shown for the user to supply a DatasetName for a new Dataset. I presume that there should not be an existing Dataset with the same name; if there is, then the dataset creation prior to the upload will probably fail.

The files input is provided by the upload-area directive.

The current total size is displayed, along with a warning if this exceeds the configured limit.

The Upload button itself is guarded in several ways:

  <button
      type="submit"
      class="btn btn-primary"
      ng-click="uploadController.upload()"
      translate="UPLOAD.BUTTON.UPLOAD.TEXT"
      ng-disabled="(!uploadController.datasetId && uploadController.name == '')  
          || uploadController.files.length == 0 
          || uploadController.isUploading 
          || (uploadController.maxTotalFileSize !== undefined && uploadController.totalFileSize() > uploadController.maxTotalFileSize)"
  ></button>

The button is disabled when any of the following are true:

neither the datasetId nor the user-supplied name is set
no files have been selected
an upload is already in progress (the isUploading flag is set)
the total size limit is defined and has been exceeded
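The same conditions can be restated as a plain function, which may be easier to read (and to test) than the template expression. This is only a sketch: isUploadDisabled is a hypothetical name, and ctrl stands in for uploadController.

```javascript
// Illustrative restatement of the ng-disabled expression as a plain function.
// `ctrl` stands in for uploadController; this function is not part of Topcat.
function isUploadDisabled(ctrl) {
    // No existing dataset and no name for a new one.
    var noTarget = !ctrl.datasetId && ctrl.name == '';
    // Nothing has been selected for upload.
    var noFiles = ctrl.files.length == 0;
    // A limit is configured and the selection exceeds it.
    var tooBig = ctrl.maxTotalFileSize !== undefined &&
        ctrl.totalFileSize() > ctrl.maxTotalFileSize;
    return Boolean(noTarget || noFiles || ctrl.isUploading || tooBig);
}
```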

Controllers

browse-entities.controller

This defines an upload() function that is bound to the Upload button in the browse-entities view:

        this.upload = function() {
            $uibModal.open({
                templateUrl: 'views/upload.html',
                controller: 'UploadController as uploadController',
                size : 'lg'
            });
        };

The function opens a modal dialog with the upload view and uploadController.

upload.controller

The main function here is upload(), which is bound to the upload button in the modal dialog. This first checks and complains if the user has selected duplicate files. Next, if no datasetId has been set (by the state context - see below), icat.write() is used to create a new Dataset using the datasetTypeId of the idsUploadDatasetType (name) defined in the configuration and the investigationId (from the state context), and to obtain a new datasetId. Then ids.upload() is called to upload the datafiles to the IDS.
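This note does not record how duplicates are detected. As a minimal sketch, assuming that two files count as duplicates when they share the same name (an assumption, not confirmed from upload.controller), the check might look like the following. The name findDuplicateNames is hypothetical.

```javascript
// Hypothetical helper: returns the names that occur more than once in `files`.
// Assumes each entry has a `name` property, as browser File objects do.
function findDuplicateNames(files) {
    // Count occurrences of each name; a Map avoids clashes with
    // built-in object property names such as "constructor".
    var counts = new Map();
    files.forEach(function (file) {
        counts.set(file.name, (counts.get(file.name) || 0) + 1);
    });

    var duplicates = [];
    counts.forEach(function (count, name) {
        if (count > 1) {
            duplicates.push(name);
        }
    });
    return duplicates;
}
```

A non-empty result would be the cue to warn the user and abandon the upload before any dataset is created.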

Archeology digression: when/where does datasetId get set?

The datasetId is set from $state.params.datasetId. It isn't at all clear when this is/isn't set.

This digression shows how I have tried to work this out - it may be useful to any future Topcat archeologist to see how I approach this!

I assumed that the datasetId would be set somehow when the user is browsing a dataset; but datasetId isn't mentioned at all in browse-entities.controller, and I can find no useful references to datasetId anywhere else in the code.

I searched for references to params.datasetId under yo/app/, though it may not be set so directly. Indeed: no other references found.

In tc-ids.service, datasetId is a parameter to this.upload() - but that's fed from upload.controller, so doesn't explain/help. Search: 12 refs to datasetId under yo/app/, none of which appear to be setting $state.params. Could it be that it never happens? Or is the param name 'datasetId' being constructed in the code (e.g. as entityType + 'Id')?

Search for $state.params : there are many hits in the controllers, but most are not relevant. browse-entities.controller looks for $state.params[entityType + 'Id'], but never sets it. One hit in the services: tc-icat-entity.service : this.stateParams() clones $state.params, sets the value of entityType + 'Id' to this.id in the clone, then returns the clone. So for a dataset entity, this would set datasetId.

This looks suspiciously like what we're looking for, though we're still missing the point where $state.params is set to this. So look for references to stateParams: 18 matches in yo/app/. Most are not relevant (they are reading $stateParams, not setting them); I could have narrowed the search by looking for .stateParams or .stateParams(, though both risk missing meaningful uses).

In the end, the most promising-looking match is back in tc-icat-entity.service, where this.browse() calls this.stateParams() then does some sort of state hierarchy pushing and finally does $state.go(state, params, goOptions) which I think does what I expected.

Note that this.stateParams() builds its output map using this.thisAndAncestors(); the result is that the map contains entityId values for not just the current entity but also all of its ancestor entities. So, for a Dataset, $state.params will include datasetId, investigationId and (via a special case) proposalId.

(Actually, it's somewhat more complicated than this. If the current state name starts with home.browse.facility. (which matches the state name constructed by this.browse()) then this.stateParams() doesn't use thisAndAncestors but clones the current $state.params then adds (or changes) the current entityTypeId. I assume that if Upload needs the investigationId and the state name matches the pattern, then it is already present in $state.params. If the state doesn't match the pattern, then I assume that investigationId will be added via thisAndAncestors().)
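The two branches described above can be sketched as follows. This is a simplified model, not the code in tc-icat-entity.service: the entity is represented by a hypothetical plain object { type, id, ancestors }, thisAndAncestors() is modelled as the entity followed by its ancestors, and the proposalId special case is left out.

```javascript
// Sketch only: `entity` is a hypothetical plain object
// { type, id, ancestors: [{ type, id }, ...] } standing in for a Topcat entity.
function stateParams(entity, currentStateName, currentParams) {
    var out = {};
    if (currentStateName.indexOf('home.browse.facility.') === 0) {
        // Already in a browse state: clone the current params,
        // then add (or overwrite) this entity's own id.
        Object.keys(currentParams).forEach(function (key) {
            out[key] = currentParams[key];
        });
        out[entity.type + 'Id'] = entity.id;
    } else {
        // Otherwise build the map from the entity and all of its ancestors.
        [entity].concat(entity.ancestors || []).forEach(function (e) {
            out[e.type + 'Id'] = e.id;
        });
    }
    return out;
}
```

In both branches a dataset entity ends up with datasetId in the result, which is the value that upload.controller later reads from $state.params.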

I conclude - eventually! - that $state.params.datasetId is set (only) when the user is browsing (the datafiles within) a dataset; and so in this case, the Upload button, if present, will allow the user to add files to the current dataset.

As the enableUpload property can only be set for browse.dataset and browse.datafile, this means that the datasetId can only be set when the user is browsing within a particular dataset, i.e. the datafiles within it.

(End of digression).

If the upload completes successfully, then Topcat is refreshed (I presume this forces the current browse view to update from source and so pick up the new dataset or datafiles), 'upload:complete' is broadcast, and the upload dialog is dismissed.

    ids.upload(this.datasetId, this.files).then(function(datafileIds){
        tc.refresh();
        $rootScope.$broadcast('upload:complete', datafileIds);
        $uibModalInstance.dismiss('cancel');
    }, handleError);

If uploading fails, then the isUploading flag is cleared and the user is informed.

The isUploading flag is used to disable the upload button while an upload is in progress. The success path in upload() does not explicitly clear the flag, so I presumed that something else clears it in response to the broadcast. However, a search for 'upload:complete' finds no other references under yo/app/, so it is not clear what, if anything, responds to the broadcast. It would seem that nothing clears the flag after a successful upload. This does not matter in practice: when uploading completes, the Upload modal dialog is dismissed and the flag is no longer relevant (it is mainly used to guard buttons in the dialog). The flag is initialised to false when the controller starts up, which (I think) happens each time a new Upload dialog is created.

Directive: upload-area.directive

This binds to the upload-area.html and defines a number of (mouse) events that (I think!) implement a drag-and-drop interface for the inclusion and removal of files.

Services

object-validator.service

Defines/declares the properties in topcat.json that control the Upload behaviour, as described in Configuration above.

tc-facility.service

The config() function sets properties in the facility configuration for the idsUploadDatafileFormatId and idsUploadDatasetTypeId from their values in the current session; these in turn come ultimately from the idsUploadDatafileFormat and idsUploadDatasetType names for the facility that are defined in topcat.json (see tc-icat.service below).

tc-icat.service

The login() function tests that if the idsUploadDatafileFormat or idsUploadDatasetType are defined in the configuration for the current facility, then they correspond to known entities in ICAT, and sets the corresponding IDs in the session state. (It doesn't appear to insist that if one is defined, the other must be defined too.) If either fails, then a message is shown in the browser console; but the effect on the user will probably be either a "login failed" alert, or a blank page.
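The name-to-ID resolution can be pictured with the sketch below. It is a simplification under stated assumptions: the real code queries ICAT, whereas here the known DatafileFormats and DatasetTypes are passed in as hypothetical Map lookups (formatsByName, typesByName), and resolveUploadIds is an invented name.

```javascript
// Sketch only: `formatsByName` and `typesByName` are hypothetical Maps
// (name -> id) standing in for the ICAT queries made at login.
function resolveUploadIds(facilityConfig, formatsByName, typesByName) {
    var ids = {};
    var formatName = facilityConfig.idsUploadDatafileFormat;
    var typeName = facilityConfig.idsUploadDatasetType;

    // Each name is checked only if it is defined; neither requires the other.
    if (formatName !== undefined) {
        if (!formatsByName.has(formatName)) {
            throw new Error('Unknown DatafileFormat: ' + formatName);
        }
        ids.idsUploadDatafileFormatId = formatsByName.get(formatName);
    }
    if (typeName !== undefined) {
        if (!typesByName.has(typeName)) {
            throw new Error('Unknown DatasetType: ' + typeName);
        }
        ids.idsUploadDatasetTypeId = typesByName.get(typeName);
    }
    return ids;
}
```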

tc-ids.service

This defines an upload() function that takes a DatasetId and a list of files and performs some gobbledegook to upload the files in chunks using a ws:// or wss:// url based on the Topcat URL. The URL construction is:

            if(topcatUrl.match(/^https:\/\//)){
              topcatUrl = "wss://" + topcatUrl.replace(/^https:\/\//, '');
            } else {
              topcatUrl = "ws://" + topcatUrl.replace(/^http:\/\//, '');
            }

            var currentUrl = topcatUrl + "/topcat/ws/user/upload?" + ...;

where ... are the url-encoded parameters; these include:

            facilityName: facility.config().name,
            sessionId: facility.icat().session().sessionId,
            datasetId: datasetId,
            datafileFormatId: facility.config().idsUploadDatafileFormatId,
            name: file.name,
            contentLength: file.size
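Putting the two fragments together, a self-contained version of the URL construction might look like this. It mirrors the scheme rewriting shown above; the function name buildUploadUrl and the use of encodeURIComponent for the query string are assumptions, since the encoding helper that Topcat actually uses is not shown here.

```javascript
// Sketch: mirrors the scheme rewriting shown above and appends the
// url-encoded parameters. buildUploadUrl is a hypothetical name.
function buildUploadUrl(topcatUrl, params) {
    var wsUrl;
    if (topcatUrl.match(/^https:\/\//)) {
        // Secure page, so use a secure websocket.
        wsUrl = 'wss://' + topcatUrl.replace(/^https:\/\//, '');
    } else {
        wsUrl = 'ws://' + topcatUrl.replace(/^http:\/\//, '');
    }

    // Encode each key and value, in the order the keys were defined.
    var query = Object.keys(params).map(function (key) {
        return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
    }).join('&');

    return wsUrl + '/topcat/ws/user/upload?' + query;
}
```

Note that the sessionId travels in the query string, so using https (and hence wss) for the Topcat URL matters for uploads as much as for ordinary requests.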

The code appears to generate a separate request for each file; but the iteration over the list of files is a somewhat convoluted combination of recursion and/or connection callback functions and list-shifting. (It doesn't help that the inner function that does the real work is also called upload()!)
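Stripped of the websocket details, the "shift one file, upload it, recurse from the callback" pattern looks roughly like the sketch below. The names uploadAll and uploadOne are hypothetical; uploadOne stands in for opening a connection and sending a single file, and is assumed to report a datafile ID when it finishes.

```javascript
// Sketch of sequential uploading by list-shifting and callbacks.
// uploadOne(file, onDone) is a hypothetical stand-in for sending one file;
// it must call onDone(datafileId) when that file has been uploaded.
function uploadAll(files, uploadOne, onAllDone) {
    var queue = files.slice();   // copy, so the caller's list is untouched
    var datafileIds = [];

    function next() {
        if (queue.length === 0) {
            // Nothing left: hand back the collected ids.
            onAllDone(datafileIds);
            return;
        }
        var file = queue.shift();
        uploadOne(file, function (datafileId) {
            datafileIds.push(datafileId);
            next();              // only start the next file once this one is done
        });
    }

    next();
}
```

The effect is that files are uploaded strictly one at a time, in list order, with at most one connection open at any moment.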

During the upload, the code updates the percentage progress of each file, which is displayed in the upload-area directive view. (See the definition of the readChunk() function, and the reader.onload() function defined within it.)
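The arithmetic behind the progress bar can be sketched independently of FileReader. The function below (chunkProgress, a hypothetical name) lists the byte ranges for a file of a given size and the percentage complete after each chunk. The chunk size is supplied by the caller; the value that tc-ids.service uses is not recorded in this note.

```javascript
// Sketch: splits a file of `size` bytes into chunks and reports the
// percentage complete after each one. The chunk size is an assumption
// supplied by the caller.
function chunkProgress(size, chunkSize) {
    if (chunkSize <= 0) {
        throw new Error('chunkSize must be positive');
    }
    var steps = [];
    for (var start = 0; start < size; start += chunkSize) {
        var end = Math.min(start + chunkSize, size);
        steps.push({
            start: start,
            end: end,   // exclusive, as for Blob.slice(start, end)
            percent: Math.floor(end * 100 / size)
        });
    }
    return steps;
}
```

Each range could be passed to file.slice(start, end) and read in turn, with the percent value written to the file's progress bar once that chunk has been sent.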

Server-side

IdsUploadProxy

This class provides the /topcat/ws/user/upload endpoint that is used in tc-ids.service's upload() function. The class declaration uses the following annotations:

@ApplicationScoped
@ServerEndpoint("/topcat/ws/user/upload")
public class IdsUploadProxy {

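and the main methods are annotated with @OnOpen, @OnClose, @OnError and @OnMessage. These are the standard Java WebSocket API annotations; they mark the methods to be called when a connection is opened, is closed, reports an error, or receives a message.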

The class maintains a map from Session to (single instances of) an internal class Upload (i.e. at any one time there is at most one Upload per Session). The @OnOpen method adds a new Upload instance to the map; the @OnClose method removes it.

The Upload class constructor takes a Session, from which it extracts the parameters (supplied in the /topcat/ws/user/upload URL, see above), and uses these to construct an /ids/put request (that is sent to the IDS associated with the facility (name) in topcat.properties). The @OnMessage method appears to be what triggers the actual writing of file data to the IDS (by calling Upload.write()).

I presume that the IDS will use its configured storage mechanism to associate the data in the request body with the filename supplied in the parameters.

Some thoughts on storage vs. retrieval

This is only relevant if uploading is enabled in Topcat.

Do any facilities use separate IDS instances for storage and retrieval, possibly for performance reasons? (I think that DLS use a separate IDS for ingestion, but do not allow uploads through Topcat.)

IdsUploadProxy is coded to use the IDS url that is associated with the facility name in topcat.properties. If the same configuration property is used elsewhere in the server side for data retrieval (I have not checked), then such a separation will not be possible on the server side.

Note however that the client (JavaScript) side obtains its IDS urls from topcat.json (and can be configured with different IDS urls for each download type); so it is possible for the client and server side to use different IDS urls. The client-side IDS urls will be used solely for retrieval; only the server-side URL might be used for storage.
