
## Introduction

To provide a consistent, reproducible computing environment in which the 4CE Phase 2.1 analysis code will run, we have built a Docker container image. The image comes pre-configured with all of the prerequisite software and libraries required to install and run the R packages that implement the federated Phase 2.1 analytic workflows.

Briefly, Docker allows the creation of an isolated user space in which applications run, and that user space can be pre-configured with the specific software versions the application requires. This gives us a relatively lightweight solution to the "dependency / DLL hell" problem, wherein a single software package delivered to multiple sites and run on the same data can produce different results because the supporting environments are configured inconsistently. By creating a single container image that every site uses to run the analytic code, we can ensure, for example, that all sites are running the same versions of the R packages that implement the workflows.

An in-depth discussion of container technologies is beyond the scope of this document. For background reading, please see:

- https://en.wikipedia.org/wiki/OS-level_virtualization
- https://www.docker.com/resources/what-container
- https://docs.docker.com/get-started/overview/

The rest of this document will walk through the steps required for a site to participate in 4CE Phase 2.1 analyses, and provide pointers to additional documentation where appropriate.

## Overview of the process

The process for a site to participate is as follows:

  1. Run the Data Extraction Code to Produce Site Data Files
  2. Install and Run the 4CE Analysis Docker Image
  3. Copy Site Data Files to the Directory Shared with the Container
  4. Run QC on the Site Data Files in an R Session in the Container
  5. Run 4CE Analytic Packages on the Site Data in an R Session in the Container

## 1. Run the Data Extraction Code to Produce Site Data Files

Each site must first run the 4CE Phase 2.1 extraction routine against its local clinical data warehouse. This produces several files that serve as input to the 4CE analysis tools. The analyses run on these data, but only summary results are transferred back to the coordinating site; no patient-level data will ever be shared.

## 2. Install and Run the 4CE Analysis Docker Image

All of the 4CE analyses are written in R, and to ensure that the correct versions of supporting software libraries are used across all participating sites, we have created a Docker image that serves as the analytic workbench. Please follow the instructions here to pull the Docker image and connect to it via SSH or via RStudio Server.
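As a sketch only (the image name, tag, and port mappings below are assumptions; confirm the exact values against the linked instructions), pulling and starting the container might look like:

```shell
# Pull the analysis image. The repository/tag shown here is an assumption;
# use the image name given in the linked instructions.
docker pull dbmi/4ce-analysis:latest

# Start the container, publishing the SSH and RStudio Server ports so you
# can connect with either client. Port numbers are illustrative.
docker run -d \
    --name 4ce \
    -p 2200:22 \
    -p 8787:8787 \
    dbmi/4ce-analysis:latest
```

You would then connect either over SSH (e.g. `ssh -p 2200 <user>@localhost`) or by pointing a browser at `http://localhost:8787` for RStudio Server.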

## 3. Copy Site Data Files to the Directory Shared with the Container

All of the files generated by the 4CE Phase 2.1 extraction routine need to be available to the container at runtime so that the analysis packages can read them. The instructions here describe how to bind a local directory on the Docker host to the `/4ceData` directory in the container. Please copy the files generated by the extraction routine to an `Input` subdirectory within this directory on the Docker host.
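For example, assuming the host directory bound to `/4ceData` is `/some/local/path` and your site identifier is `MySite` (both hypothetical; substitute your own values), the copy step might look like:

```shell
# Host-side directory that was bind-mounted to /4ceData in the container.
# Both the path and the site id below are hypothetical placeholders.
HOST_DIR=/some/local/path
SITEID=MySite

# Create the Input subdirectory the analysis packages expect, then copy
# the six files produced by the extraction routine into it.
mkdir -p "$HOST_DIR/Input"
for f in Labs Medications Diagnoses Demographics DailyCounts ClinicalCourse; do
    cp "./extract-output/$f-$SITEID.csv" "$HOST_DIR/Input/"
done
```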

### Naming Conventions

The files generated by the extraction routine need to be named according to the following convention:

    Labs-[siteid].csv
    Medications-[siteid].csv
    Diagnoses-[siteid].csv
    Demographics-[siteid].csv
    DailyCounts-[siteid].csv
    ClinicalCourse-[siteid].csv

Here, `[siteid]` is the name of the participating clinical site.

### Confirm File Availability in Container

You can confirm that the files are available in the container by following the instructions described here to run an R session in the container, and then, in the R session (command line or RStudio Server), running the following:

    list.files("/4ceData/Input")

You should see the names of the files that you copied listed in the output of that function call.

## 4. Run QC on the Site Data Files in an R Session in the Container

We have created a package to help validate the site data before any analyses are run on them. Each analytic package will automatically call this routine to ensure data integrity before proceeding with any analyses. You should first run it explicitly to confirm that your data are of sufficient quality to proceed. Please see here for instructions.
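As a sketch (the repository, `subdir`, and function names below follow the conventions used elsewhere in this document but are assumptions; confirm them against the linked instructions), running the QC step from an R session in the container might look like:

```r
# Install the data QC package from GitHub. The repository and subdir
# names here are assumptions based on the naming convention used for
# the other 4CE packages.
devtools::install_github(
    "https://github.com/covidclinical/Phase2.1DataRPackage",
    subdir = "FourCePhase2.1Data", upgrade = FALSE)

# Run the QC checks against the files in /4ceData/Input.
# The function name is likewise an assumption.
FourCePhase2.1Data::runQC()
```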

## 5. Run 4CE Analytic Packages on the Site Data in an R Session in the Container

More details coming soon.

Each of the 4CE analysis packages will first be vetted by the steering committee and, once approved, will be made available for download by the 4CE participating sites. To get a sense of how this will work, please see the developer documentation here and here.

All of the packages must implement the same API. A package will be installed in an R session with a command similar to:

    devtools::install_github("https://github.com/covidclinical/Phase2.1[PACKAGE_NAME]RPackage", subdir="FourCePhase2.1[PACKAGE_NAME]", upgrade=FALSE)

The analysis that is implemented in the package can then be run in an R session via:

    FourCePhase2.1[PACKAGE_NAME]::runAnalysis()

The package developer may optionally implement an additional function in their R package to allow the sites to validate the results of the analysis with:

    FourCePhase2.1[PACKAGE_NAME]::validateAnalysis()

Finally, the results of the analysis will be automatically pushed to a GitHub repository managed by the coordinating site by running the following command in an R session:

    FourCePhase2.1[PACKAGE_NAME]::submitAnalysis()
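Putting the steps together with a hypothetical package name (`Example` below is a placeholder for illustration, not a real 4CE package), a complete session at a site might look like:

```r
# "Example" stands in for an approved analysis package name.
devtools::install_github(
    "https://github.com/covidclinical/Phase2.1ExampleRPackage",
    subdir = "FourCePhase2.1Example", upgrade = FALSE)

FourCePhase2.1Example::runAnalysis()       # run the analysis on /4ceData/Input
FourCePhase2.1Example::validateAnalysis()  # optional, if the developer provides it
FourCePhase2.1Example::submitAnalysis()    # push summary results to GitHub
```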

In addition to giving participating sites the ability to contribute selectively to individual analysis projects, we will distribute, at regular intervals, pre-built Docker images that contain all of the up-to-date approved analysis packages, along with R wrapper code that invokes each of the approved packages without requiring user intervention.