Topcat's Upload Mechanism
Brian Ritchie, Nov 2019
Topcat can be configured to allow users to upload datafiles. When enabled in configuration, an Upload button appears in the Browse view for Datasets and/or Datafiles. Clicking it opens a dialog in which the user can drag and drop files to be added to a dataset. If the user is browsing the datafiles within a dataset, the files are added to that dataset; if the user is browsing a list of datasets, the dialog asks for the name of a new dataset (which must not already exist).
This document describes the relevant sections of code.
The Upload behaviour can be configured in topcat.json, using the following properties under each facility:
- browse.<entity>.gridOptions.enableUpload : boolean, true to enable uploads for this entity type. (<entity> is restricted to Dataset and Datafile, i.e. when browsing lists of datasets or datafiles.)
- idsUploadDatafileFormat : string, the name of the DatafileFormat to set for uploads
- idsUploadDatasetType : string, the name of the DatasetType to set for uploads
- idsUploadMaxTotalFileSize : number, the maximum total size of (all files in) an upload
All properties are optional in the schema, but in practice if enableUpload is defined and true anywhere then the DatafileFormat and DatasetType must also be defined. (I believe that the filesize limit can still be optional; if omitted, no size limit will be imposed by Topcat.)
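The dependency between these properties can be made concrete with a small check. The following is only a sketch: checkUploadConfig() is a hypothetical helper (not part of Topcat), the lowercase browse.dataset / browse.datafile keys are assumed, and every value in the example facility is made up.

```javascript
// Hypothetical helper: returns a list of problems for one facility's config.
// The property names follow topcat.json; the function itself is not part of Topcat.
function checkUploadConfig(facility) {
  const problems = [];
  const browse = facility.browse || {};
  // Uploads can only be enabled for the dataset and datafile grids.
  const uploadEnabled = ['dataset', 'datafile'].some(function (entity) {
    const grid = (browse[entity] || {}).gridOptions || {};
    return grid.enableUpload === true;
  });
  if (!uploadEnabled) {
    return problems; // nothing else is required when uploads are off
  }
  if (!facility.idsUploadDatafileFormat) {
    problems.push('idsUploadDatafileFormat is required when uploads are enabled');
  }
  if (!facility.idsUploadDatasetType) {
    problems.push('idsUploadDatasetType is required when uploads are enabled');
  }
  // idsUploadMaxTotalFileSize stays optional: omitted means no limit.
  return problems;
}

// Example facility fragment; all values are invented for illustration.
const exampleFacility = {
  browse: {
    dataset: { gridOptions: { enableUpload: true } },
    datafile: { gridOptions: { enableUpload: true } }
  },
  idsUploadDatafileFormat: 'upload',
  idsUploadDatasetType: 'upload',
  idsUploadMaxTotalFileSize: 1000000000
};

console.log(checkUploadConfig(exampleFacility)); // []
```

An empty result means the fragment satisfies the "in practice" rule above; it says nothing about whether the named DatafileFormat and DatasetType actually exist in ICAT.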
There are three views involved:
- browse-entities.html : an Upload button is added (above the grid view, to the right) when uploading is enabled.
- upload.html : the Upload dialog, that allows the user to choose files to upload
- upload-area.html : directive view used in the Upload dialog for files selection (drag/drop); displays a progress bar and a cancel/remove button for each file
The upload progress bars displayed in the upload-area are maintained by ids.upload() (that is, in tc-ids.service).
If the datasetId is not set from the state/context, a text input box is shown for the user to supply a DatasetName for a new Dataset. I presume that there should not be an existing Dataset with the same name; if there is, then the dataset creation prior to the upload will probably fail.
The files input is provided by the upload-area directive.
The current total size is displayed, along with a warning if this exceeds the configured limit.
The Upload button itself is guarded in several ways:
<button
    type="submit"
    class="btn btn-primary"
    ng-click="uploadController.upload()"
    translate="UPLOAD.BUTTON.UPLOAD.TEXT"
    ng-disabled="(!uploadController.datasetId && uploadController.name == '')
        || uploadController.files.length == 0
        || uploadController.isUploading
        || (uploadController.maxTotalFileSize !== undefined && uploadController.totalFileSize() > uploadController.maxTotalFileSize)"
></button>
The button is disabled when any of the following are true:
- neither the datasetId nor the user-supplied name are set
- no files have been selected
- an upload is already in progress (the isUploading flag is set)
- the total size limit is defined and has been exceeded
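The same guard can be restated as a plain function, which makes the four conditions easier to check in isolation. This is a sketch only: isUploadDisabled() and exampleController are hypothetical, and totalFileSize() is assumed to be the sum of the selected files' sizes.

```javascript
// Plain-function restatement of the ng-disabled expression.
// `ctrl` stands in for uploadController; only the fields used in the view are needed.
function isUploadDisabled(ctrl) {
  const noTarget = !ctrl.datasetId && ctrl.name == '';
  const noFiles = ctrl.files.length == 0;
  const overLimit = ctrl.maxTotalFileSize !== undefined
    && ctrl.totalFileSize() > ctrl.maxTotalFileSize;
  return Boolean(noTarget || noFiles || ctrl.isUploading || overLimit);
}

// Hypothetical controller state: a new dataset name, one 10-byte file, a 5-byte limit.
const exampleController = {
  datasetId: undefined,
  name: 'run-42',
  files: [{ name: 'a.dat', size: 10 }],
  isUploading: false,
  maxTotalFileSize: 5,
  // Assumed here to be the sum of the selected files' sizes.
  totalFileSize: function () {
    return this.files.reduce(function (sum, f) { return sum + f.size; }, 0);
  }
};

console.log(isUploadDisabled(exampleController)); // true: 10 exceeds the limit of 5
```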
browse-entities.controller defines an upload() function that is bound to the Upload button in the browse-entities view:
this.upload = function() {
    $uibModal.open({
        templateUrl: 'views/upload.html',
        controller: 'UploadController as uploadController',
        size : 'lg'
    });
};
The function opens a modal dialog with the upload view and uploadController.
The main function in upload.controller is upload(), which is bound to the upload button in the modal dialog. It first checks for duplicate files in the user's selection and complains if it finds any. Next, if no datasetId has been set (by the state context - see below), it uses icat.write() to create a new Dataset and obtain a new datasetId; the new Dataset is given the datasetTypeId that corresponds to the idsUploadDatasetType (name) defined in the configuration, and the investigationId from the state context. Finally, it calls ids.upload() to upload the datafiles to the IDS.
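The order of these steps can be sketched as below. This is not the controller's code: findDuplicateNames() and uploadFlow() are hypothetical, duplicates are assumed to be detected by file name, and createDataset / uploadFiles are stand-ins for icat.write() and ids.upload(), whose real signatures are not reproduced here.

```javascript
// Assumption: two selected files count as duplicates when they share a name.
function findDuplicateNames(files) {
  const seen = new Set();
  const duplicates = new Set();
  files.forEach(function (file) {
    if (seen.has(file.name)) {
      duplicates.add(file.name);
    }
    seen.add(file.name);
  });
  return Array.from(duplicates);
}

// Hypothetical orchestration of the three steps described above.
// createDataset(investigationId, name) and uploadFiles(datasetId, files)
// are injected stand-ins that each return a promise.
function uploadFlow(state, files, createDataset, uploadFiles) {
  const duplicates = findDuplicateNames(files);
  if (duplicates.length > 0) {
    return Promise.reject(new Error('Duplicate files: ' + duplicates.join(', ')));
  }
  // Only create a dataset when the state context did not supply one.
  const datasetIdPromise = state.datasetId
    ? Promise.resolve(state.datasetId)
    : createDataset(state.investigationId, state.name);
  return datasetIdPromise.then(function (datasetId) {
    return uploadFiles(datasetId, files);
  });
}
```

Injecting the two stand-ins keeps the sketch independent of the real services, so the sequencing can be exercised without an ICAT or IDS.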
The datasetId is set from $state.params.datasetId. It isn't at all clear when this is/isn't set.
This digression shows how I have tried to work this out - it may be useful to any future Topcat archeologist to see how I approach this!
I assumed that the datasetId would be set somehow when the user is browsing a dataset; but datasetId isn't mentioned at all in browse-entities.controller, and I can find no useful references to datasetId anywhere else in the code.
Search for refs to params.datasetId under yo/app/ though it may not be set so directly. Indeed: no other references found.
In tc-ids.service, datasetId is a parameter to this.upload() - but that's fed from upload.controller, so doesn't explain/help.
Search: 12 refs to datasetId under yo/app/, none of which appear to be setting $state.params.
Could it be that it never happens? Or is the param name 'datasetId' being constructed in the code (e.g. as entityType + 'Id')?
Search for $state.params: there are many hits in the controllers, but most are not relevant. browse-entities.controller looks for $state.params[entityType + 'Id'], but never sets it.
One hit in the services: tc-icat-entity.service : this.stateParams() clones $state.params, sets the value of entityType + 'Id' to this.id in the clone, then returns the clone. So for a dataset entity, this would set datasetId.
This looks suspiciously like what we're looking for, though we're still missing the point where $state.params is set to this.
So look for references to stateParams: 18 matches in yo/app/. Most are not relevant (they are reading $stateParams, not setting them). I could have narrowed the search by looking for .stateParams or .stateParams(, though both risk missing meaningful uses.
In the end, the most promising-looking match is back in tc-icat-entity.service, where this.browse() calls this.stateParams(), then does some sort of state hierarchy pushing, and finally calls $state.go(state, params, goOptions), which I think does what I expected.
Note that this.stateParams() builds its output map using this.thisAndAncestors(); the result is that the map contains entityId values for not just the current entity but also all of its ancestor entities. So, for a Dataset, $state.params will include datasetId, investigationId and (via a special case) proposalId.
(Actually, it's somewhat more complicated than this. If the current state name starts with home.browse.facility. (which matches the state name constructed by this.browse()) then this.stateParams() doesn't use thisAndAncestors but clones the current $state.params then adds (or changes) the current entityTypeId. I assume that if Upload needs the investigationId and the state name matches the pattern, then it is already present in $state.params. If the state doesn't match the pattern, then I assume that investigationId will be added via thisAndAncestors().)
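The two branches described above can be modelled in a few lines. This is a sketch of my reading, not the code in tc-icat-entity.service: stateParamsFor() is hypothetical, entities are modelled as plain objects with a parent link in place of thisAndAncestors(), and the proposalId special case is left out.

```javascript
// Hypothetical model: an entity is { entityType, id, parent }.
function stateParamsFor(entity, currentStateName, currentParams) {
  const idKey = function (e) { return e.entityType + 'Id'; };
  if (currentStateName.indexOf('home.browse.facility.') === 0) {
    // Already in a browse state: clone the current params and
    // add (or change) the id of the current entity only.
    const cloned = Object.assign({}, currentParams);
    cloned[idKey(entity)] = entity.id;
    return cloned;
  }
  // Otherwise build the map from the entity and all of its ancestors.
  const built = {};
  for (let e = entity; e; e = e.parent) {
    built[idKey(e)] = e.id;
  }
  return built;
}

// Made-up example: dataset 5 inside investigation 3.
const exampleDataset = {
  entityType: 'dataset',
  id: 5,
  parent: { entityType: 'investigation', id: 3, parent: null }
};

console.log(stateParamsFor(exampleDataset, 'home.somewhere-else', {}));
// { datasetId: 5, investigationId: 3 }
```

In the first branch, investigationId is only present in the result if it was already in the current params, which matches the assumption made in the parenthetical above.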
I conclude - eventually! - that $state.params.datasetId is set (only) when the user is browsing (the datafiles within) a dataset; and so in this case, the Upload button, if present, will allow the user to add files to the current dataset.
As the enableUpload property can only be set for browse.dataset and browse.datafile, this means that the datasetId can only be set when the user is browsing within a particular dataset, i.e. the datafiles within it.
(End of digression).
If the upload completes successfully, then Topcat is refreshed (I presume this forces the current browse view to update from source and so pick up the new dataset or datafiles), 'upload:complete' is broadcast, and the upload dialog is dismissed.
ids.upload(this.datasetId, this.files).then(function(datafileIds){
    tc.refresh();
    $rootScope.$broadcast('upload:complete', datafileIds);
    $uibModalInstance.dismiss('cancel');
}, handleError);
If uploading fails, then the isUploading flag is cleared and the user is informed.
The isUploading flag is used to disable the upload button while an upload is in progress. The flag is not explicitly cleared in the success case of the upload() method itself, so I presumed that something else clears it in response to the broadcast. However, a search for 'upload:complete' fails to find any other references under /yo/app; so it is not clear what, if anything, responds to the broadcast.
So what happens to the isUploading flag when the upload does not fail? Nothing, it would seem. However, when uploading completes, the Upload modal dialog is dismissed, and the flag is no longer relevant (as it is mainly used to guard buttons in the dialog). The flag is initialised to false in the controller startup, which (I think) will happen each time a new Upload dialog is created.
The upload-area directive binds to upload-area.html and defines a number of (mouse) events that (I think!) implement a drag-and-drop interface for the inclusion and removal of files.
The schema defines/declares the properties in topcat.json that control the Upload behaviour, as described in Configuration above.
The config() function sets properties in the facility configuration for the idsUploadDatafileFormatId and idsUploadDatasetTypeId from their values in the current session; these in turn come ultimately from the idsUploadDatafileFormat and idsUploadDatasetType names for the facility that are defined in topcat.json (see tc-icat.service below).
In tc-icat.service, the login() function checks that, if the idsUploadDatafileFormat or idsUploadDatasetType are defined in the configuration for the current facility, they correspond to known entities in ICAT, and sets the corresponding IDs in the session state. (It doesn't appear to insist that if one is defined, the other must be defined too.) If either check fails, then a message is shown in the browser console; but the effect for the user will probably be either a "login failed" alert, or a blank page.
tc-ids.service defines an upload() function that takes a DatasetId and a list of files and performs some gobbledegook to upload the files in chunks using a ws:// or wss:// url based on the Topcat URL. The URL construction is:
if(topcatUrl.match(/^https:\/\//)){
    topcatUrl = "wss://" + topcatUrl.replace(/^https:\/\//, '');
} else {
    topcatUrl = "ws://" + topcatUrl.replace(/^http:\/\//, '');
}
var currentUrl = topcatUrl + "/topcat/ws/user/upload?" + ...;
where ... are the url-encoded parameters; these include:
facilityName: facility.config().name,
sessionId: facility.icat().session().sessionId,
datasetId: datasetId,
datafileFormatId: facility.config().idsUploadDatafileFormatId,
name: file.name,
contentLength: file.size
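Putting the scheme swap and the parameters together gives something like the following. This is a self-contained sketch: buildUploadUrl() is a hypothetical function, the query string is assumed to be built with encodeURIComponent (the original encoding code is not shown here), and the host and parameter values are invented.

```javascript
// Hypothetical helper combining the scheme swap with a url-encoded query string.
function buildUploadUrl(topcatUrl, params) {
  let wsUrl;
  if (topcatUrl.match(/^https:\/\//)) {
    wsUrl = 'wss://' + topcatUrl.replace(/^https:\/\//, '');
  } else {
    wsUrl = 'ws://' + topcatUrl.replace(/^http:\/\//, '');
  }
  // Assumption: each key and value is encoded with encodeURIComponent.
  const query = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
  return wsUrl + '/topcat/ws/user/upload?' + query;
}

// Made-up values; only a subset of the parameters is shown.
console.log(buildUploadUrl('https://topcat.example.org', {
  facilityName: 'LILS',
  datasetId: 42,
  name: 'my file.txt',
  contentLength: 12
}));
// wss://topcat.example.org/topcat/ws/user/upload?facilityName=LILS&datasetId=42&name=my%20file.txt&contentLength=12
```

Note that the sessionId travels in the query string of this URL, so the wss:// form matters for any deployment served over https.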
The code appears to generate a separate request for each file; but the iteration over the list of files is a somewhat convoluted combination of recursion and/or connection callback functions and list-shifting. (It doesn't help that the inner function that does the real work is also called upload()!)
During the upload, the code updates the percentage progress of each file, which is displayed in the upload-area directive view. (See the definition of the readChunk() function, and the reader.onload() function defined within it.)
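The arithmetic behind chunked reading and per-file progress is roughly as follows. This is a simplified sketch rather than a copy of readChunk(): chunkRanges() and percentUploaded() are hypothetical names, the chunk size is an arbitrary illustration, and the FileReader and WebSocket plumbing is left out entirely.

```javascript
// Split a file of `fileSize` bytes into [start, end) byte ranges.
function chunkRanges(fileSize, chunkSize) {
  const ranges = [];
  for (let start = 0; start < fileSize; start += chunkSize) {
    ranges.push({ start: start, end: Math.min(start + chunkSize, fileSize) });
  }
  return ranges;
}

// Whole-number percentage of the file sent so far.
function percentUploaded(bytesSent, fileSize) {
  if (fileSize === 0) {
    return 100; // an empty file is complete as soon as it starts
  }
  return Math.floor(bytesSent * 100 / fileSize);
}

// Walk a made-up 10-byte file in 4-byte chunks, reporting progress after each.
chunkRanges(10, 4).forEach(function (range) {
  console.log(range.start + '-' + range.end + ': ' + percentUploaded(range.end, 10) + '%');
});
// 0-4: 40%
// 4-8: 80%
// 8-10: 100%
```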
The IdsUploadProxy class provides the /topcat/ws/user/upload endpoint that is used in tc-ids.service's upload() function.
The class declaration uses the following annotations:
@ApplicationScoped
@ServerEndpoint("/topcat/ws/user/upload")
public class IdsUploadProxy {
and the main methods are annotated with @OnOpen, @OnClose, @OnError and @OnMessage, which I presume mean something in the world of server endpoints.
The class maintains a map from Session to (single instances of) an internal class Upload (i.e. at any one time there is at most one Upload per Session). The @OnOpen method adds a new Upload instance to the map; the @OnClose method removes it.
The Upload class constructor takes a Session, from which it extracts the parameters (supplied in the /topcat/ws/user/upload URL, see above), and uses these to construct an /ids/put request (that is sent to the IDS associated with the facility (name) in topcat.properties). The @OnMessage method appears to be what triggers the actual writing of file data to the IDS (by calling Upload.write()).
I presume that the IDS will use its configured storage mechanism to associate the data in the request body with the filename supplied in the parameters.
This is only relevant if uploading is enabled in Topcat.
Do any facilities use separate IDS instances for storage and retrieval, possibly for performance reasons? (I think that DLS use a separate IDS for ingestion, but do not allow uploads through Topcat.)
IdsUploadProxy is coded to use the IDS url that is associated with the facility name in topcat.properties. If the same configuration property is used elsewhere in the server side for data retrieval (I have not checked), then such a separation will not be possible on the server side.
Note however that the client (Javascript) side obtains its IDS urls from topcat.json (and can be configured with different IDS urls for each download type); so it is possible for the client and server side to use different IDS urls. The client-side IDS urls will be used solely for retrieval; only the server-side URL might be used for storage.