diff --git a/.github/ISSUE_TEMPLATE/release-checklist.md b/.github/ISSUE_TEMPLATE/release-checklist.md index 10225817..d25b1897 100644 --- a/.github/ISSUE_TEMPLATE/release-checklist.md +++ b/.github/ISSUE_TEMPLATE/release-checklist.md @@ -15,6 +15,7 @@ assignees: '' ### Preparing for the release - [ ] Does this release require an update to the CHANGELOG because there will be changes to downloadable files? If so, please create an issue tracking the CHANGELOG update and mark it as blocking this issue. - [ ] Are all other issues planned for this release resolved? If any issues are unresolved, mark this issue as blocked by those on ZenHub. +- [ ] All `TODO`s and `STUB` links have been filled in with their correct values, or commented out. - [ ] Optional: If not all changes in `development` are ready to be released, create a feature branch off `main` and cherry pick commits from `development` to that feature branch. - [ ] File a PR from the `development` branch (or the new feature branch) to the `main` branch. This should include all of the changes that will be associated with the next release. - [ ] If a CHANGELOG entry was required, add the date to the entry's header as part of this PR. 
diff --git a/.github/workflows/spell-check.yml b/.github/workflows/spell-check.yml index 06fe6e5b..d21366e1 100644 --- a/.github/workflows/spell-check.yml +++ b/.github/workflows/spell-check.yml @@ -6,7 +6,6 @@ on: branches: - main - development - - "the-manuscript" # A workflow run is made up of one or more jobs that can run sequentially or in parallel jobs: diff --git a/components/dictionary.txt b/components/dictionary.txt index 477eef19..d2f6275e 100644 --- a/components/dictionary.txt +++ b/components/dictionary.txt @@ -105,6 +105,7 @@ submitter submitters Tian transcriptome +transcriptomic transcriptomics TSV UMAP diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md index 2818640a..2bb34185 100644 --- a/docs/CHANGELOG.md +++ b/docs/CHANGELOG.md @@ -5,13 +5,26 @@ As of November 2023, the `CHANGELOG` is a feature of our documentation we'll use You can find more information about how and when your download was prepared in the following places: * The date your download was packaged (`Generated on: {date}`) is included at the top of the README in your download. -* The version of the [`AlexsLemonade/scpca-nf`](https://github.com/alexsLemonade/scpca-nf) pipeline used to process data in your download is included in the `workflow_version` column of the `single_cell_metadata.tsv` or `bulk_metadata.tsv` file in your download. +* The version of the [`AlexsLemonade/scpca-nf`](https://github.com/alexsLemonade/scpca-nf) pipeline used to process data in your download is included in the `workflow_version` column of the `single_cell_metadata.tsv`, `bulk_metadata.tsv`, or `spatial_metadata.tsv` file in your download. For more information about `AlexsLemonade/scpca-nf` versions, please see [the releases page on GitHub](https://github.com/AlexsLemonade/scpca-nf/releases). +## 2026.01.08 + +* Data on the Portal can now be downloaded in three different ways: + * By selecting a {ref}`single project` to download. 
+ * By creating a {ref}`custom dataset` with a selection of projects and/or samples. + Custom datasets are referred to as `My Dataset` on the Portal. + * By choosing one of the {ref}`Portal-wide download options`. +* Although the content of the data included in each download has not changed, the download file structures have changed. +See the {ref}`Downloadable Files page for more information`. +* A new section of the documentation describing possible {ref}`download options` has been added. +* The previously named `single_cell_metadata.tsv` files that are included with each download have been renamed to `single-cell_metadata.tsv`. + + ## 2025.12.04 All data on the Portal has been updated to include a number of new features. @@ -40,11 +53,11 @@ This has now been fixed so that all merged objects have the `cell_id` formatted ## 2025.04.24 -* Consensus cell type annotations are now available in all processed `SingleCellExperiment` and `AnnData` objects and merged objects. - * The labels obtained from `SingleR` and `CellAssign` are used to assign an ontology-aware consensus cell type label. - * See our {ref}`documentation on cell type annotation` to learn more about how consensus cell types are assigned. +* Consensus cell type annotations are now available in all processed `SingleCellExperiment` and `AnnData` objects and merged objects. + * The labels obtained from `SingleR` and `CellAssign` are used to assign an ontology-aware consensus cell type label. + * See our {ref}`documentation on cell type annotation` to learn more about how consensus cell types are assigned. * See {ref}`the single-cell gene expression file contents page` and {ref}`the merged object file contents page` for more information on obtaining these cell type annotations from the downloaded objects. -* All assays within a merged object are now saved as a sparse matrix (`CsparseMatrix`), whereas previously these assays were saved as a `DelayedArray`. 
+* All assays within a merged object are now saved as a sparse matrix (`CsparseMatrix`), whereas previously these assays were saved as a `DelayedArray`. ## 2024.11.14 diff --git a/docs/download_files.md b/docs/download_files.md index 20bb9ce8..2eaea806 100644 --- a/docs/download_files.md +++ b/docs/download_files.md @@ -1,12 +1,13 @@ # Downloadable files The ScPCA Portal download packages include gene expression data, a QC report, and associated metadata for each processed sample. -Gene expression data is available as either [`SingleCellExperiment` objects (`.rds` files)](#singlecellexperiment-downloads) or [`AnnData` objects (`.h5ad` files)](#anndata-downloads). -These files are delivered as a zip file. +Gene expression data is available as either [`SingleCellExperiment` objects (`.rds` files)](#download-folder-structure-for-singlecellexperiment-project-downloads) or [`AnnData` objects (`.h5ad` files)](#download-folder-structure-for-anndata-project-downloads). +All downloaded files are delivered as a zip file. When you uncompress the zip file, the root directory name of your download will include the date you accessed the data on the ScPCA Portal. We recommend you record this date in case there are future updates to the Portal that change the underlying data or if you need to cite the data in the future (see {ref}`How to Cite ` for more information). Please see our {ref}`CHANGELOG ` for a summary of changes that impact downloads from the Portal. +Data can be downloaded in one of three ways: by downloading a [single project](#project-downloads), by creating a [custom dataset](#custom-datasets), or by choosing one of the [Portal-wide download options](#portal-wide-downloads). For all data downloads, sample folders (indicated by the `SCPCS` prefix) contain the files for all libraries (`SCPCL` prefix) derived from that biological sample. Most samples only have one library that has been sequenced. 
For [multiplexed sample libraries](#multiplexed-sample-libraries), the sample folder name will be an underscore-separated list of all samples found in the library files that the folder contains. @@ -21,45 +22,156 @@ The files shown below will be included with each library (example shown for a li - A quality control report: `SCPCL000000_qc.html`, - A supplemental cell type report: `SCPCL000000_celltype-report.html` -Every download also includes a single `single_cell_metadata.tsv` file containing metadata for all libraries included in the download. +For more information on the contents of these files, see the sections on [gene expression data](#gene-expression-data), the [QC report](#qc-report), and the [cell type report](#cell-type-report). -Metadata-only downloads are also available, either by downloading the metadata for all samples in a single project or by downloading the metadata for all samples on the Portal. -Please see the [section on metadata](#metadata) for a full description of the contents of the metadata files. +Every download also includes sample and processing metadata for all libraries included in the download. +For a full description of the metadata files, refer to the [metadata section below](#metadata). -If downloading a project containing bulk RNA-seq data, two tab-separated value files, e.g., `SCPCP000000_bulk_quant.tsv` and `SCPCP000000_bulk_metadata.tsv`, will be included in the merged object download. -The `SCPCP000000_bulk_quant.tsv` file contains a gene by sample matrix (each row a gene, each column a sample) containing raw gene expression counts quantified by Salmon. -The `SCPCP000000_bulk_metadata.tsv` file contains associated metadata for all samples with bulk RNA-seq data. +Metadata-only downloads are also available, either by downloading the metadata for all samples in a single project using the `Download Sample Metadata` button or by downloading the [metadata for all samples on the Portal](#metadata-only-downloads). 
-See also {ref}`processing bulk RNA samples `. +## Project downloads + +Use the `Download Now` button next to the project title to instantly download gene expression data for all samples in a single project as a single zip file. +To download more than one project or combine samples across projects, see the section on [downloading custom datasets](#custom-datasets). + +For project downloads, data for all samples will be provided as either [`SingleCellExperiment` objects (`.rds` files)](https://bioconductor.org/books/3.21/OSCA.intro/the-singlecellexperiment-class.html) or [`AnnData` objects (`.h5ad` files)](https://anndata.readthedocs.io/en/latest/index.html). +Each zip file will be named with the project accession ID, the chosen data format (either `single-cell-experiment` or `anndata`), and the date you accessed the data on the ScPCA Portal. + +- If the project contains bulk RNA-seq data, a separate folder labeled with the `_bulk` suffix containing two tab-separated value files, `SCPCP000000_bulk_quant.tsv` and `SCPCP000000_bulk_metadata.tsv`, will also be included in the project download. +See the [section on bulk RNA-seq for more information](#bulk-rna-seq). +- If the project contains samples with a spatial transcriptomics library, the spatial data will be provided as a separate download. +See the expected file structure and [description of the Spatial transcriptomics output below](#spatial-transcriptomics-libraries). +- If the project contains samples that have been multiplexed, the organization of the downloaded files will be slightly different than what is shown below. +See the section describing [multiplexed sample libraries](#multiplexed-sample-libraries) for an overview of the expected download structure. + +For more information on choosing a data format and modality, see the {ref}`documentation on download options`. 
+ +When downloading a project, you can choose to download data from all samples as individual files, or you can download {ref}`a single file containing all samples merged into a single object`. +Below are examples of the expected folder structure when downloading a project with gene expression data from all samples stored in individual files. + +### Download folder structure for `SingleCellExperiment` project downloads: +![project download folder](images/project-sc.png){width="600"} + +### Download folder structure for `AnnData` project downloads: +![project download folder](images/project-anndata.png){width="600"} + +### Download folder structure for `AnnData` project downloads with CITE-seq (ADT) data: +![project download folder](images/project-anndata-cite-seq.png){width="600"} + +If downloading a project with samples that contain a CITE-seq library as an `AnnData` object (`.h5ad` file), the quantified CITE-seq expression data is included as a separate file with the suffix `_adt.h5ad`. + +### Merged object downloads + +Merged object downloads contain all single-cell or single-nuclei gene expression data for a given ScPCA project within a single object, provided as either a [`SingleCellExperiment` object (`.rds` file)](https://bioconductor.org/books/3.21/OSCA.intro/the-singlecellexperiment-class.html) or an [`AnnData` object (`.h5ad` file)](https://anndata.readthedocs.io/en/latest/index.html). + +The object file, `SCPCP000000_merged.rds` or `SCPCP000000_merged_rna.h5ad`, contains both a raw and normalized counts matrix, each with combined counts for all samples in an ScPCA project. +In addition to the counts matrices, the `SingleCellExperiment` or `AnnData` object stored in the file includes the results of library-weighted dimensionality reduction using both principal component analysis (PCA) and UMAP. +See the {ref}`section on merged object processing` for more information about how merged objects were created. 
+ +If downloading a project that contains at least one CITE-seq library, the quantified CITE-seq expression data will also be merged. +In `SingleCellExperiment` objects (`.rds` files), the CITE-seq expression data is provided as an alternative experiment in the same object as the gene expression data. +However, for `AnnData` objects (`.h5ad` files), the quantified CITE-seq expression data is instead provided as a separate file called `SCPCP000000_merged_adt.h5ad`. + +For any projects containing bulk RNA-seq data, a separate folder `SCPCP000000_bulk` containing two tab-separated value files, `SCPCP000000_bulk_quant.tsv` and `SCPCP000000_bulk_metadata.tsv`, will also be included in the project download. +See the [section on bulk RNA-seq for more information](#bulk-rna-seq). + +Every download also includes a single `single-cell_metadata.tsv` file containing metadata for all libraries included in the merged object. +For a full description of this file's contents, refer to the [metadata section below](#metadata). + +Every download includes a summary report, `SCPCP000000_merged-summary-report.html`, which provides a brief summary of the samples and libraries included in the merged object. +This includes a summary of the types of libraries (e.g., single-cell, single-nuclei, with CITE-seq) and sample diagnoses included in the object, as well as UMAP visualizations highlighting each library. + +Every download also includes the individual [QC report](#qc-report) and, if applicable, [cell type annotation reports](#cell-type-report) for each library included in the merged object. 
+ +#### Download folder structure for `SingleCellExperiment` merged downloads: +![merged project download folder](images/project-sc-merged.png){width="600"} + +#### Download folder structure for `AnnData` merged downloads: +![merged project download folder](images/project-anndata-merged.png){width="600"} + +#### Download folder structure for `AnnData` merged downloads with CITE-seq (ADT) data: +![merged project download folder](images/project-anndata-cite-seq-merged.png){width="600"} + + +## Custom datasets + +You can create a custom dataset with any combination of individual samples and projects with your choice of modalities and data format. +Custom datasets are referred to as `My Dataset` on the Portal. +The `Add to Dataset` button allows you to add projects and selected samples to `My Dataset`. +You can select the data formats and modalities for each project or sample before you add it to `My Dataset`. +The `My Dataset` button on the top right of the Portal can then be used to view and download the custom dataset as a single zip file. + +Each zip file will be named with a unique dataset ID, the chosen data format (either `single-cell-experiment` or `anndata`), and the date you accessed the data on the ScPCA Portal. +Note that a custom dataset can only contain single-cell data in one data format, either [`SingleCellExperiment` objects (`.rds` files)](#custom-datasets-with-singlecellexperiment-format) or [`AnnData` objects (`.h5ad` files)](#custom-datasets-with-anndata-format) (see {ref}`FAQ for more information`). +If a sample has [spatial transcriptomics data](#spatial-transcriptomics-libraries), you can check the {ref}`Spatial modality box` to include the spatial transcriptomics data in `My Dataset`. 
+ +Data for all samples included in `My Dataset` will be organized in folders labeled with the unique project identifier and modality, where each folder contains data for all samples from a single project with the same modality (`single-cell`, `spatial`, or `bulk`). +Each project folder will also contain an appropriate metadata file: `single-cell_metadata.tsv` (`single-cell`), `spatial_metadata.tsv` (`spatial`), or `SCPCP000000_bulk_metadata.tsv` (`bulk`). +For more information on available data formats and modalities, see {ref}`the section describing download options `. + +If downloading a project as a [merged object](#merged-object-downloads), the project folder name will include the `_merged` suffix. + +If any samples included in `My Dataset` contain associated CITE-seq data, the quantified CITE-seq expression data will be included when downloading single-cell expression data. +For `SingleCellExperiment` objects (`.rds` files), the quantified CITE-seq expression data is included in the same file as the gene expression data. +For [`AnnData` objects (`.h5ad` files)](#detailed-folder-structure-for-individual-samples-with-cite-seq-adt-data), the quantified CITE-seq expression data is included as a separate file with the suffix `_adt.h5ad`. + +The image below shows the expected file structure for an example custom dataset. +For more details about the project folder contents for each data format and modality, see the [Project downloads section](#project-downloads). + +### Download folder structure for custom downloads: +![custom dataset download](images/custom-dataset-generic.png){width="600"} + +## Portal-wide Downloads + +The Portal-wide Download page can be used to download all [metadata](#metadata-only-downloads) or gene expression data for all samples on the Portal at once. -The folder structure within the zip file is determined by whether individual samples or all samples associated with a project are selected for download. 
-Note that if a sample selected for download contains a spatial transcriptomics library, the files included will be different than pictured below. -See the [description of the Spatial transcriptomics output section below](#spatial-transcriptomics-libraries). +All single-cell and single-nuclei gene expression data from the Portal can be downloaded as a single zip file containing data stored as either [`SingleCellExperiment` objects (`.rds` files)](#singlecellexperiment-portal-wide-download-structure) or [`AnnData` objects (`.h5ad` files)](#anndata-portal-wide-download-structure). +This zip file includes data for any [multiplexed samples](#multiplexed-sample-libraries). +All spatial data for any samples sequenced using [spatial transcriptomics](#spatial-transcriptomics-libraries) are available separately as a zip file. -## `SingleCellExperiment` downloads +Each Portal-wide data download also includes all relevant metadata and bulk RNA-seq data. -### Download folder structure for project downloads: -![project download folder](images/project-download-folder.png){width="600"} +Each zip file will be named with the chosen data format (`single-cell-experiment`, `anndata`, or `spaceranger`) and the date you accessed the data on the ScPCA Portal. +Each zip file will contain a folder for each project with gene expression data for all samples in that project as either individual objects or a single [merged object](#portal-wide-downloads-as-merged-objects), depending on your selection. -### Download folder structure for individual sample downloads: -![sample download folder](images/sample-download-folder.png){width="600"} +For any projects containing bulk RNA-seq data, a separate folder `SCPCP000000_bulk` containing two tab-separated value files, `SCPCP000000_bulk_quant.tsv` and `SCPCP000000_bulk_metadata.tsv`, will also be included. +See the [section on bulk RNA-seq for more information](#bulk-rna-seq). 
-## `AnnData` downloads +As with [individual projects](#project-downloads) and [custom datasets](#custom-datasets), the quantified CITE-seq expression data will be included when downloading single-cell expression data. +For [`SingleCellExperiment (R)` downloads](#singlecellexperiment-portal-wide-download-structure), the quantified CITE-seq expression data is included in the same file as the gene expression data. +For [`AnnData (Python)` downloads](#anndata-portal-wide-download-structure), the quantified CITE-seq expression data is included as a separate file with the suffix `_adt.h5ad`. -### Download folder structure for project downloads: -![project download folder](images/anndata-project-download-folder.png){width="600"} +### `SingleCellExperiment` Portal-wide download structure +![portal wide download structure - `sce`](images/portal-wide-sc-folder.png){width="600"} -### Download folder structure for individual sample downloads: -![sample download folder](images/anndata-sample-download-folder.png){width="600"} +### `AnnData` Portal-wide download structure +![portal wide download structure - `anndata`](images/portal-wide-anndata-cite-seq.png){width="600"} -### Download folder structure for individual sample downloads with CITE-seq (ADT) data: -![sample download folder](images/anndata-sample-citeseq-download-folder.png){width="600"} +### Spatial Portal-wide download structure +![portal wide download structure - spatial](images/portal-wide-spatial.png){width="600"} -If downloading a sample that contains a CITE-seq library as an `AnnData` object (`.h5ad` file), the quantified CITE-seq expression data is included as a separate file with the suffix `_adt.h5ad`. +### Portal-wide downloads as merged objects + +You can choose to download all single-cell and single-nuclei samples from the Portal as [merged objects for each project](#merged-object-downloads) by checking "Merge samples into one object per project". 
+{ref}`Merged objects` contain gene expression data for all samples in a given project in a single file. +This download includes a folder for each project that contains a single merged object (`SCPCP000000_merged.rds` or `SCPCP000000_merged_rna.h5ad`), a merged summary report (`SCPCP000000_merged-summary-report.html`), a single [metadata](#metadata) file (`single-cell_metadata.tsv`), and all individual [QC reports](#qc-report) and, if applicable, [cell type annotation reports](#cell-type-report) for each library included in the merged object for that project. + +Note that downloading all data using this option _will not_ download one merged object with all samples from all projects; instead, it downloads a single merged object for each project. + +#### Portal-wide download structure for merged `SingleCellExperiment` objects +![portal wide download structure - merged `sce`](images/portal-wide-sc-merged.png){width="600"} + +#### Portal-wide download structure for merged `AnnData` objects +![portal wide download structure - merged `anndata`](images/portal-wide-anndata-cite-seq-merged.png){width="600"} + +### Metadata-only downloads + +The Portal-wide metadata download is a single TSV file containing the metadata for all samples with associated single-cell RNA-seq, single-nuclei RNA-seq, or spatial transcriptomics data available on the Portal. +A table describing all columns included in the file can be found in the [metadata section below](#metadata). ## Gene expression data -Single-cell or single-nuclei gene expression data is provided as either [`SingleCellExperiment` objects (`.rds` files)](http://bioconductor.org/books/3.13/OSCA.intro/the-singlecellexperiment-class.html) or [`AnnData` objects (`.h5ad` files)](https://anndata.readthedocs.io/en/latest/index.html). 
+Single-cell or single-nuclei gene expression data is provided as either [`SingleCellExperiment` objects (`.rds` files)](https://bioconductor.org/books/3.21/OSCA.intro/the-singlecellexperiment-class.html) or [`AnnData` objects (`.h5ad` files)](https://anndata.readthedocs.io/en/latest/index.html). Three files will be provided for each library included in the download - an unfiltered counts file, a filtered counts file, and a processed counts file. The unfiltered counts file, `SCPCL000000_unfiltered.rds` or `SCPCL000000_unfiltered_rna.h5ad`, contains the counts matrix, where the rows correspond to genes or features and the columns correspond to cell barcodes. @@ -93,7 +205,7 @@ Therefore, there will be no cell type report in the download for these libraries ## Metadata -Included with each download is a `single_cell_metadata.tsv` file containing relevant metadata for each sample included in the download. +Included with each download is a `single-cell_metadata.tsv` file containing relevant metadata for each sample included in the download. Each row corresponds to a unique sample/library combination and contains the following columns: | column_id | contents | @@ -147,84 +259,44 @@ Each row corresponds to a unique sample/library combination and contains the fol | `demux_method` | Methods used to calculate demultiplexed sample numbers. Only present for multiplexed libraries | | `demux_samples` | Samples included in multiplexed library. Only present for multiplexed libraries | | `date_processed` | Date sample was processed through `AlexsLemonade/scpca-nf` | +| `workflow` | The URL to the `AlexsLemonade/scpca-nf` workflow | +| `workflow_version` | Version of `AlexsLemonade/scpca-nf` the sample was processed with | +| `workflow_commit` | Commit hash of `AlexsLemonade/scpca-nf` the sample was processed with | -Additional metadata may also be included, specific to the disease type and experimental design of the project. -Examples of this include treatment or outcome. 
+Project-specific metadata will contain all columns listed in the table above and any additional project-specific columns, such as treatment or outcome. Metadata pertaining to processing will be available in this table and inside of the `SingleCellExperiment` and `AnnData` objects. See the {ref}`SingleCellExperiment experiment metadata ` section for more information on metadata columns that can be found in the `SingleCellExperiment` object. See the {ref}`AnnData experiment metadata ` section for more information on metadata columns that can be found in the `AnnData` object. For projects with bulk RNA-seq data, a bulk metadata file (e.g., `SCPCP000000_bulk_metadata.tsv`) will be included for project downloads. -This file will contain fields equivalent to those found in the `single_cell_metadata.tsv` related to processing the sample, but will not contain patient or disease specific metadata (e.g. `age`, `sex`, `diagnosis`, `subdiagnosis`, `tissue_location`, or `disease_timing`). - -### Metadata-only downloads - -Metadata for all samples on the Portal is available to download separately from gene expression data downloads. -Each project page has an option to download metadata for all of its samples as a single zip file containing the `metadata.tsv` file and a `README.md` file. -Project-specific metadata will contain all columns listed in [the above table](#metadata) and any additional project-specific columns, such as treatment or outcome. - -Additionally, a single TSV file containing the metadata for all samples from all projects on the Portal is available for download. -The Portal-wide metadata will contain all columns listed in [the above table](#metadata). +This file will contain fields equivalent to those found in the `single-cell_metadata.tsv` related to processing the sample, but will not contain patient or disease specific metadata (e.g. `age`, `sex`, `diagnosis`, `subdiagnosis`, `tissue_location`, or `disease_timing`). 
## Multiplexed sample libraries For libraries where multiple biological samples were combined via cellhashing or similar technology (see the {ref}`FAQ section about multiplexed samples `), the organization of the downloaded files and metadata is slightly different. Note that multiplexed sample libraries are only available as `SingleCellExperiment` objects, and are not currently available as `AnnData` objects. -For project downloads, the counts and QC files will be organized by the _set_ of samples that comprise each library, rather than in individual sample folders. +When downloading an entire project, the counts and QC files will be organized by the _set_ of samples that comprise each library, rather than in individual sample folders. These sample set folders are named with an underscore-separated list of the sample ids for the libraries within, _e.g._, `SCPCS999990_SCPCS999991_SCPCS999992`. -Bulk RNA-seq data, if present, will follow the [same format as bulk RNA-seq for single-sample libraries](#download-folder-structure-for-project-downloads). +Bulk RNA-seq data, if present, will follow the [same format as bulk RNA-seq for single-sample libraries](#download-folder-structure-for-singlecellexperiment-project-downloads). -![multiplexed project download folder](images/multiplexed-download-folder.png){width="750"} +![multiplexed project download folder](images/project-sc-multiplexed.png){width="750"} Because we do not perform demultiplexing to separate cells from multiplexed libraries into sample-specific count matrices, sample downloads from a project with multiplexed data will include all libraries that contain the sample of interest, but these libraries _will still contain cells from other samples_. For more on the specific contents of multiplexed library `SingleCellExperiment` objects, see the {ref}`Additional SingleCellExperiment components for multiplexed libraries ` section. 
-The [metadata file](#metadata) for multiplexed libraries (`single_cell_metadata.tsv`) will have the same format as for individual samples, but each row will represent a particular sample/library pair, meaning that there may be multiple rows for each `scpca_library_id`, one for each `scpca_sample_id` within that library. +The [metadata file](#metadata) for multiplexed libraries (`single-cell_metadata.tsv`) will have the same format as for individual samples, but each row will represent a particular sample/library pair, meaning that there may be multiple rows for each `scpca_library_id`, one for each `scpca_sample_id` within that library. In addition, the `demux_cell_count_estimate` column will contain an estimate of the number of cells from the sample in the library (after demultiplexing) in the sample/library pair. -## Merged object downloads - -When downloading a full ScPCA project, you can choose to download data from all samples as individual files, or you can download {ref}`a single file containing all samples merged into a single object`. - -Merged object downloads contain all single-cell or single-nuclei gene expression data for a given ScPCA project within a single object, provided as either a [`SingleCellExperiment` object (`.rds` file)](http://bioconductor.org/books/3.13/OSCA.intro/the-singlecellexperiment-class.html) or an [`AnnData` object (`.h5ad` file)](https://anndata.readthedocs.io/en/latest/index.html). - -The object file, `SCPCP000000_merged.rds` or `SCPCP000000_merged_rna.h5ad`, contains both a raw and normalized counts matrix, each with combined counts for all samples in an ScPCA project. -In addition to the counts matrices, the `SingleCellExperiment` or `AnnData` object stored in the file includes the results of library-weighted dimensionality reduction using both principal component analysis (PCA) and UMAP. -See the {ref}`section on merged object processing` for more information about how merged objects were created. 
- -If downloading a project that contains at least one CITE-seq library, the quantified CITE-seq expression data will also be merged. -In `SingleCellExperiment` objects (`rds` files), the CITE-seq expression data is provided as an alternative experiment in the same object as the gene expression data. -However, for `AnnData` objects, (`.h5ad` files), the quantified CITE-seq expression is instead provided as a separate file called `SCPCP000000_merged_adt.h5ad`. - -Every download also includes a single `single_cell_metadata.tsv` file containing metadata for all libraries included in the merged object. -For a full description of this file's contents, refer to the [metadata section above](#metadata). - -If downloading a project containing bulk RNA-seq data, two tab-separated value files, e.g., `SCPCP000000_bulk_quant.tsv` and `SCPCP000000_bulk_metadata.tsv`, will be included in the merged object download. -The `SCPCP000000_bulk_quant.tsv` file contains a gene by sample matrix (each row a gene, each column a sample) containing raw gene expression counts quantified by Salmon. -The `SCPCP000000_bulk_metadata.tsv` file contains associated metadata for all samples with bulk RNA-seq data. -This file will contain fields equivalent to those found in the `single_cell_metadata.tsv` related to processing the sample, but will not contain patient or disease specific metadata (e.g. `age`, `sex`, `diagnosis`, `subdiagnosis`, `tissue_location`, or `disease_timing`). - -Every download includes a summary report, `SCPCL000000_merged-summary-report.html`, which provides a brief summary of the samples and libraries included in the merged object. -This includes a summary of the types of libraries (e.g., single-cell, single-nuclei, with CITE-seq) and sample diagnoses included in the object, as well as UMAP visualizations highlighting each library. 
- -Every download also includes the individual [QC report](#qc-report) and, if applicable, [cell type annotation reports](#cell-type-report) for each library included in the merged object. - -### Download folder structure for `SingleCellExperiment` merged downloads: -![project download folder](images/merged-project-download-folder.png){width="600"} - -### Download folder structure for `AnnData` merged downloads: -![project download folder](images/merged-anndata-project-download-folder.png){width="600"} - -### Download folder structure for `AnnData` merged downloads with CITE-seq (ADT) data: -![project download folder](images/merged-anndata-project-citeseq-download-folder.png){width="600"} - - ## Spatial transcriptomics libraries -If a sample includes a library processed using spatial transcriptomics, the spatial transcriptomics output files will be available as a separate download from the single-cell/single-nuclei gene expression data. +If a sample includes a library processed using spatial transcriptomics, you can obtain the spatial transcriptomics output files by selecting `Spatial` as the modality (see more on {ref}`modality download options`). + +If downloading an [entire project using the `Download Now` button](#project-downloads), you will need to download the spatial data separately from the single-cell and single-nuclei gene expression data. +If creating and downloading a [custom dataset by using the `Add to Dataset` button](#custom-datasets), you will be able to select both `Single-cell` and `Spatial` to be included in the download. +Alternatively, you can download all of the spatial transcriptomic data from the Portal on the [Portal-wide Downloads page](#portal-wide-downloads). For all spatial transcriptomics libraries, a `SCPCL000000_spatial` folder will be nested inside the corresponding sample folder in the download. 
Inside that folder will be the following folders and files: @@ -239,7 +311,20 @@ A full description of all files included in the download for spatial transcripto Every download also includes a single `spatial_metadata.tsv` file containing metadata for all libraries included in the download. -![sample download with spatial](images/spatial-download-folder.png){width="600"} +![sample download with spatial](images/project-spatial.png){width="600"} + + +## Bulk RNA-seq + +Some projects include samples that were sequenced using bulk RNA-seq alongside any single-cell and/or single-nucleus RNA-seq. +For more details on including bulk RNA-seq in your download, see the {ref}`documentation on download options`. + +A separate folder labeled with the project accession ID and `_bulk` suffix will be included for each project containing bulk RNA-seq in the download. +This folder contains two tab-separated value files, `SCPCP000000_bulk_quant.tsv` and `SCPCP000000_bulk_metadata.tsv`. +The `SCPCP000000_bulk_quant.tsv` file contains a gene by sample matrix (each row a gene, each column a sample) containing raw gene expression counts quantified by `salmon`. +The `SCPCP000000_bulk_metadata.tsv` file contains associated metadata for all samples with bulk RNA-seq data. +This file will contain fields equivalent to those found in the `single-cell_metadata.tsv` related to processing the sample, but will not contain patient or disease specific metadata (e.g. `age`, `sex`, `diagnosis`, `subdiagnosis`, `tissue_location`, or `disease_timing`). +See also {ref}`processing bulk RNA samples `. 
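The gene-by-sample layout described above for `SCPCP000000_bulk_quant.tsv` can be read with a few lines of Python. This is a minimal sketch and not part of the Portal's tooling: the function name `read_bulk_quant`, the `gene_id` header label, and the example values are illustrative assumptions. Only the layout (tab-separated, one row per gene, one column per sample) comes from the text.

```python
import csv
import io


def read_bulk_quant(handle):
    """Read a tab-separated gene-by-sample counts table from an open file handle.

    Returns (sample_ids, counts), where counts maps each gene identifier to a
    list of floats in the same order as sample_ids.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Assumption: the first column holds the gene identifier and every
    # remaining column is one sample.
    sample_ids = header[1:]
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = [float(value) for value in row[1:]]
    return sample_ids, counts


# Made-up table standing in for `SCPCP000000_bulk_quant.tsv`
example = io.StringIO(
    "gene_id\tSCPCS000001\tSCPCS000002\n"
    "ENSG00000000003\t10\t0\n"
    "ENSG00000000005\t3.5\t7\n"
)
sample_ids, counts = read_bulk_quant(example)
```

For a real download, pass an open file instead of the in-memory example, for instance `open("SCPCP000000_bulk_quant.tsv", newline="")`, and check the header of your file before relying on the first-column assumption.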
## Programmatic downloads from the ScPCA Portal diff --git a/docs/download_options.md b/docs/download_options.md new file mode 100644 index 00000000..30f2d98b --- /dev/null +++ b/docs/download_options.md @@ -0,0 +1,60 @@ +# Download options + +You can obtain Portal data by downloading a single project, by creating a custom dataset with a selection of projects and/or samples, or by choosing one of the Portal-wide download options. +For full information about the files you can download from the Portal, see {ref}`the Downloadable Files page`. + +This page describes the different options available to you when downloading Portal data. + +## Data format + +We provide single-cell and single-nuclei expression data as `SingleCellExperiment` objects (`.rds` files) for use in R and as `AnnData` objects (`.h5ad` files) for use in Python. +The default format for all samples on the Portal with single-cell and single-nuclei expression is set to `SingleCellExperiment (R)`. +You can learn more about using these object types from our FAQ sections on {ref}`using the provided RDS files` and {ref}`using the provided H5AD files`. + +Only {ref}`one data format is currently supported for a single download`, including when {ref}`downloading custom datasets`. +To obtain data in both `SingleCellExperiment` and `AnnData` formats, you will need to download these file formats separately. + +The only exception to this rule is when downloading spatial transcriptomics data, as the only data format available will be `Spaceranger`. +Spatial data can be coupled with single-cell data in either the `SingleCellExperiment` or `AnnData` format, but not with both. + +In addition, note that expression data for multiplexed libraries is only available in `SingleCellExperiment` format, {ref}`as described here`. + +## Modalities + +Besides single-cell/nuclei expression, many samples in the Portal have additional sequencing modalities including CITE-seq, spatial transcriptomics, and bulk RNA-seq. 
+ +In particular, there are two modality options that you may see when {ref}`creating a custom dataset to download` or when {ref}`downloading a full project with the "Download Now" button`: `Single-cell` and `Spatial`. + +By default, the `Single-cell` modality will be selected for all single-cell and single-nuclei RNA-seq samples and/or projects. +Selecting this download option will provide you with the gene expression data from single-cell or single-nuclei samples and/or projects. +If available, CITE-seq expression data will also be included. + +If a sample or project has spatial transcriptomic data, you will also have the option to select the `Spatial` modality for download. +Selecting `Spatial` will provide you with the spatial transcriptomic data only. + +If you are creating a custom dataset that contains samples and/or projects with bulk RNA-seq data, you will have the option to include this data in your download as well. +Note that the bulk RNA-seq expression file will always include all samples from the given project with bulk expression, even if you are only downloading a subset of that project's samples. +If you are using the "Download now" button to download a full project that contains bulk RNA-seq expression, it will automatically be included with the download. + +For more information about the expected file download structure for `Single-cell` and `Spatial` modalities, refer to our {ref}`Downloadable files`. + +## Merged objects + +When downloading a project, either by using `Download Now` or `Add to Dataset`, you will have the option to either receive the data as objects for individual libraries, or as {ref}`a single merged object with data from all samples in the given project`. +Please be aware that merged objects have _not_ been integrated or batch-corrected. +Refer to {ref}`this documentation` for the contents of a merged object download specifically. +Note that this applies only to `Single-cell` modality downloads, not `Spatial`. 
+ +When {ref}`creating a custom dataset to download`, you will be able to select the option to merge all samples only if you have included all samples from the given project in `My Dataset`. +Merging a subset of samples in a project {ref}`is not currently supported`. +In addition, merged objects are not available for all samples or projects, {ref}`as described here `. + +Note that even when {ref}`downloading data for all single-cell and single-nuclei samples on the Portal`, merged objects will still be provided per-project. +There will not be a merged object with all samples from all projects, but a single merged object for each project. + +## Multiplexed sample libraries + +When downloading a project that contains multiplexed samples (see {ref}`What is a multiplexed sample? `), you will have the option to exclude multiplexed samples from the download. +If selected, the download will contain expression data for only non-multiplexed samples. +Note that, {ref}`as described in our FAQ`, `AnnData` objects (`.h5ad` files) are not available for multiplexed samples. +In addition, you will not be able to select the option to merge samples into a single file {ref}`if the project contains multiplexed samples`. 
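The conditions under which the merge option is offered can be summarized as a small predicate. This is only an illustrative sketch of the rules as documented, not the Portal's implementation: the function name, its arguments, and the `max_samples` parameter are hypothetical. The 100-sample limit is taken from the FAQ entry on which projects can be downloaded as merged objects, and that entry lists further exclusions that are not modeled here.

```python
def merged_option_available(project_samples, selected_samples,
                            has_multiplexed, max_samples=100):
    """Return True if a merged object could be requested for a project.

    Documented rules modeled here (hypothetical helper, not Portal code):
    - projects containing multiplexed samples cannot be merged
    - projects with more than `max_samples` samples have no merged object
    - every sample in the project must be part of the selection
    """
    project = set(project_samples)
    if has_multiplexed:
        return False
    if len(project) > max_samples:
        return False
    # Merging a subset of a project's samples is not supported
    return project <= set(selected_samples)


# Made-up sample IDs for illustration
all_samples = ["SCPCS000001", "SCPCS000002", "SCPCS000003"]
full_selection = merged_option_available(all_samples, all_samples, has_multiplexed=False)
partial_selection = merged_option_available(all_samples, all_samples[:2], has_multiplexed=False)
```

Here `full_selection` is `True` and `partial_selection` is `False`, mirroring the rule that the merge option is only offered once all of a project's samples are in `My Dataset`.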
diff --git a/docs/faq.md b/docs/faq.md index d73d7774..963ad859 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -170,7 +170,7 @@ For more information on where to find these results, refer to sections describin There are several circumstances when CNV results are not available: * CNV inference is not performed on libraries which do not have enough cells to include in a normal reference, as described in the {ref}`CNV inference processing documentation` -* CNV inference is not performed on libraries derived from cell line samples +* CNV inference is not performed on libraries derived from cell line or non-cancerous samples * If `inferCNV` experienced a failure while running, there will not be any associated results in the processed objects ## Where can I find the inferCNV heatmap? @@ -241,7 +241,7 @@ The samples have simply been merged into a single file - _they have not been int You may prefer to download this merged object instead of individual sample files to facilitate downstream analyses that consider multiple samples at once, such as differential expression analysis, integrating multiple samples, or jointly clustering multiple samples. -Please refer to {ref}`the getting started with a merged object section` for more details on working with merged objects. +Please refer to {ref}`the section about getting started with a merged object` for more details on working with these objects. ## Which projects can I download as merged objects? @@ -261,6 +261,14 @@ There are three types of projects for which merged objects are not available: - The more samples that are included in a merged object, the larger the object, and the more difficult it will be to work with that object in R or Python. Because of this, we do not provide merged objects for projects with more than 100 samples as the size of the merged object is too large. +## Why can't I merge a subset of samples from a project? + +{ref}`Merged project downloads` are not available for a subset of samples in a project. 
+Merged objects will always contain all samples in the given project ([see which projects do not have merged objects](#which-projects-can-i-download-as-merged-objects)). + +If you would like to work with a merged object that only contains a subset of project samples, we recommend downloading the merged object and subsetting it directly to the samples of your choosing. +See {ref}`Subsetting the Merged Object` for instructions on how to subset `SingleCellExperiment` and `AnnData` merged objects. + ## Why doesn't my existing code work on a new download from the Portal? Although we try to maintain backward compatibility, new features added to the ScPCA Portal may result in downloads that are no longer compatible with code written with older downloads from the ScPCA Portal in mind. @@ -283,8 +291,44 @@ Download links expire in 7 days, but you can generate a new link on the ScPCA Po Download links are only available for projects (i.e., not for downloading individual samples). ## Can I download data from the Portal programmatically? We provide an R package, [`ScPCAr`](https://alexslemonade.github.io/ScPCAr/), to facilitate programmatic access to the ScPCA Portal. This package allows you to search for and download data from the ScPCA Portal directly within R. Please see the [package documentation](https://alexslemonade.github.io/ScPCAr/) for more details about installation and usage. Source code for the package can be found on [GitHub](https://github.com/AlexsLemonade/ScPCAr). + +## Why can't I change the data format in My Dataset? + +When creating a {ref}`custom dataset for download` (`My Dataset`), all single-cell sample or project data included must be of the same {ref}`data format`, either `SingleCellExperiment` for use in R or `AnnData` for use in Python. +We currently do not support including both data formats at once in `My Dataset`. 
+Once a sample or project of a given data format has been added to `My Dataset`, all subsequent single-cell or single-nuclei data added will automatically be in that same format. + +Therefore, if you wish to download single-cell or single-nuclei expression data in both `SingleCellExperiment` and `AnnData` data formats, you will need to create and download separate `My Dataset`s, one at a time, for each format. + + +## Why did project options change when I appended samples to My Dataset? + +If you would like to include all samples from a dataset you have previously created, you can append these samples to your current `My Dataset`. +In some cases, however, certain project-level options may change when you append additional samples from a project that is already present in `My Dataset`. + +Specifically, we apply these rules when you append to `My Dataset`: + +* If you chose to {ref}`include bulk RNA-seq expression in the download` in **either** the previous dataset or the current `My Dataset`, bulk expression will remain included in the download. +* If you selected the {ref}`merged project option` in **both** the previous dataset and the current `My Dataset`, the merge option will remain selected. +Otherwise, if only one dataset had this option selected, the merge option will no longer be applied. + +You are always welcome to edit these options in `My Dataset` to your liking after appending the additional samples. + +## Why are some values different after I regenerate My Dataset? + +The Portal only offers data processed using a single version of the [`AlexsLemonade/scpca-nf` workflow](https://github.com/AlexsLemonade/scpca-nf/) for each sample at any given time. +If any new features or updates are made to the workflow, all data currently on the Portal will be re-processed. +Because of this, when you regenerate a previously-created version of `My Dataset`, the values in your downloaded files may be slightly different compared to a previous download. 
+A full description of any major changes made to data on the Portal is provided on the {ref}`CHANGELOG page`. + +You can learn more about the specific version of the data you have as follows: + +* The file name of each downloaded zip file, and the enclosed `README.md`, will include the date it was downloaded from the Portal +* The metadata included in your downloaded files will contain information about the `AlexsLemonade/scpca-nf` workflow version that was used to process the data + * For example, the column `workflow_version` in the metadata file included in your download (`single-cell_metadata.tsv`, `bulk_metadata.tsv`, and/or `spatial_metadata.tsv`) provides the `AlexsLemonade/scpca-nf` workflow version used to process the sample, and the column `processed_date` provides the date the sample was processed through the workflow. + See the {ref}`metadata documentation` for additional information +* For more information about a given `AlexsLemonade/scpca-nf` release, refer to the [releases page on GitHub](https://github.com/AlexsLemonade/scpca-nf/releases) diff --git a/docs/getting_started.md b/docs/getting_started.md index 496cb053..2f4095a0 100644 --- a/docs/getting_started.md +++ b/docs/getting_started.md @@ -397,7 +397,7 @@ libraries <- c("SCPCL00000X", "SCPCL00000Y", "SCPCL00000Z") subsetted_merged_sce <- merged_sce[,merged_sce$library_id %in% libraries] ``` -To subset an `AnnData` merged object to a given set of libraries, use the following R code: +To subset an `AnnData` merged object to a given set of libraries, use the following Python code: ```r # Define list of library IDs of interest diff --git a/docs/images/anndata-project-download-folder.png b/docs/images/anndata-project-download-folder.png deleted file mode 100644 index 7e92c492..00000000 Binary files a/docs/images/anndata-project-download-folder.png and /dev/null differ diff --git a/docs/images/anndata-sample-citeseq-download-folder.png b/docs/images/anndata-sample-citeseq-download-folder.png 
deleted file mode 100644 index ca8f54e9..00000000 Binary files a/docs/images/anndata-sample-citeseq-download-folder.png and /dev/null differ diff --git a/docs/images/anndata-sample-download-folder.png b/docs/images/anndata-sample-download-folder.png deleted file mode 100644 index 01b4e8e1..00000000 Binary files a/docs/images/anndata-sample-download-folder.png and /dev/null differ diff --git a/docs/images/custom-dataset-generic.png b/docs/images/custom-dataset-generic.png new file mode 100644 index 00000000..c9086f6b Binary files /dev/null and b/docs/images/custom-dataset-generic.png differ diff --git a/docs/images/merged-anndata-project-citeseq-download-folder.png b/docs/images/merged-anndata-project-citeseq-download-folder.png deleted file mode 100644 index f209b128..00000000 Binary files a/docs/images/merged-anndata-project-citeseq-download-folder.png and /dev/null differ diff --git a/docs/images/merged-anndata-project-download-folder.png b/docs/images/merged-anndata-project-download-folder.png deleted file mode 100644 index 3486d759..00000000 Binary files a/docs/images/merged-anndata-project-download-folder.png and /dev/null differ diff --git a/docs/images/merged-project-download-folder.png b/docs/images/merged-project-download-folder.png deleted file mode 100644 index cac77442..00000000 Binary files a/docs/images/merged-project-download-folder.png and /dev/null differ diff --git a/docs/images/multiplexed-download-folder.png b/docs/images/multiplexed-download-folder.png deleted file mode 100644 index 6180f48b..00000000 Binary files a/docs/images/multiplexed-download-folder.png and /dev/null differ diff --git a/docs/images/portal-wide-anndata-cite-seq-merged.png b/docs/images/portal-wide-anndata-cite-seq-merged.png new file mode 100644 index 00000000..add9b0fe Binary files /dev/null and b/docs/images/portal-wide-anndata-cite-seq-merged.png differ diff --git a/docs/images/portal-wide-anndata-cite-seq.png b/docs/images/portal-wide-anndata-cite-seq.png new file mode 
100644 index 00000000..145d43c2 Binary files /dev/null and b/docs/images/portal-wide-anndata-cite-seq.png differ diff --git a/docs/images/portal-wide-sc-folder.png b/docs/images/portal-wide-sc-folder.png new file mode 100644 index 00000000..1f300fa6 Binary files /dev/null and b/docs/images/portal-wide-sc-folder.png differ diff --git a/docs/images/portal-wide-sc-merged.png b/docs/images/portal-wide-sc-merged.png new file mode 100644 index 00000000..151664f1 Binary files /dev/null and b/docs/images/portal-wide-sc-merged.png differ diff --git a/docs/images/portal-wide-spatial.png b/docs/images/portal-wide-spatial.png new file mode 100644 index 00000000..00830315 Binary files /dev/null and b/docs/images/portal-wide-spatial.png differ diff --git a/docs/images/project-anndata-cite-seq-merged.png b/docs/images/project-anndata-cite-seq-merged.png new file mode 100644 index 00000000..168af7a1 Binary files /dev/null and b/docs/images/project-anndata-cite-seq-merged.png differ diff --git a/docs/images/project-anndata-cite-seq.png b/docs/images/project-anndata-cite-seq.png new file mode 100644 index 00000000..51a7f880 Binary files /dev/null and b/docs/images/project-anndata-cite-seq.png differ diff --git a/docs/images/project-anndata-merged.png b/docs/images/project-anndata-merged.png new file mode 100644 index 00000000..2df89467 Binary files /dev/null and b/docs/images/project-anndata-merged.png differ diff --git a/docs/images/project-anndata.png b/docs/images/project-anndata.png new file mode 100644 index 00000000..ef4aa08a Binary files /dev/null and b/docs/images/project-anndata.png differ diff --git a/docs/images/project-download-folder.png b/docs/images/project-download-folder.png deleted file mode 100644 index 968d9562..00000000 Binary files a/docs/images/project-download-folder.png and /dev/null differ diff --git a/docs/images/project-sc-merged.png b/docs/images/project-sc-merged.png new file mode 100644 index 00000000..27e1e92e Binary files /dev/null and 
b/docs/images/project-sc-merged.png differ diff --git a/docs/images/project-sc-multiplexed.png b/docs/images/project-sc-multiplexed.png new file mode 100644 index 00000000..b24b2129 Binary files /dev/null and b/docs/images/project-sc-multiplexed.png differ diff --git a/docs/images/project-sc.png b/docs/images/project-sc.png new file mode 100644 index 00000000..35645cde Binary files /dev/null and b/docs/images/project-sc.png differ diff --git a/docs/images/project-spatial.png b/docs/images/project-spatial.png new file mode 100644 index 00000000..058cd700 Binary files /dev/null and b/docs/images/project-spatial.png differ diff --git a/docs/images/sample-download-folder.png b/docs/images/sample-download-folder.png deleted file mode 100644 index f7b9a163..00000000 Binary files a/docs/images/sample-download-folder.png and /dev/null differ diff --git a/docs/images/spatial-download-folder.png b/docs/images/spatial-download-folder.png deleted file mode 100644 index 7e52481b..00000000 Binary files a/docs/images/spatial-download-folder.png and /dev/null differ diff --git a/docs/index.md b/docs/index.md index fa892a37..88364d86 100644 --- a/docs/index.md +++ b/docs/index.md @@ -7,6 +7,7 @@ The ScPCA Portal is a growing database of uniformly processed single-cell data f :maxdepth: 4 processing_information +download_options download_files sce_file_contents merged_objects diff --git a/docs/merged_objects.md b/docs/merged_objects.md index 6e63e324..35499ed3 100644 --- a/docs/merged_objects.md +++ b/docs/merged_objects.md @@ -177,7 +177,7 @@ Each such list will contain the following fields: | `adt_normalization` | If CITE-seq was performed, the method used for normalization of raw ADT counts. 
Either `median-based` or `log-normalization`, as explained in the {ref}`processed ADT data section ` | | `highly_variable_genes` | A list of highly variable genes used for dimensionality reduction, determined using `scran::modelGeneVar` and `scran::getTopHVGs` | | `celltype_methods` | If cell type annotation was performed, a vector of the methods used for annotation. May include `"submitter"`, `"openscpca"`, `"singler"` and/or `"cellassign"` | -| `openscpca_celltype_module_name` | If cell type annotations from the OpenScPCA project are available, the original module name from the[`OpenScPCA-analysis` GitHub repository](https://github.com/AlexsLemonade/OpenScPCA-analysis) | +| `openscpca_celltype_module_name` | If cell type annotations from the OpenScPCA project are available, the original module name from the [`OpenScPCA-analysis` GitHub repository](https://github.com/AlexsLemonade/OpenScPCA-analysis) | | `openscpca_celltype_nf_version` | If cell type annotations from the OpenScPCA project are available, the version of the [`OpenScPCA-nf` workflow](https://github.com/AlexsLemonade/OpenScPCA-nf) used to generate annotations | | `openscpca_celltype_release_date` | If cell type annotations from the OpenScPCA project are available, the release date for the input ScPCA data used when assigning annotations | | `singler_results` | If cell typing with `SingleR` was performed, the full result object returned by `SingleR` annotation | diff --git a/docs/processing_information.md b/docs/processing_information.md index e838bd9b..eefdc923 100644 --- a/docs/processing_information.md +++ b/docs/processing_information.md @@ -79,7 +79,7 @@ Finally, these principal components are used to calculate the [UMAP (Uniform Man #### Cell type annotation -We perform cell type annotation with three complementary methods, where possible, and assign a single consensus cell type annotation based on agreement between these methods: +We perform cell type annotation with three complementary methods, 
where possible, and assign a single consensus cell type annotation based on agreement among these methods: - [`SingleR`](https://bioconductor.org/packages/release/bioc/html/SingleR.html), a reference-based cell type annotation method ([Looney _et al._ 2019](https://doi.org/10.1038/s41590-018-0276-y)) - [`CellAssign`](https://github.com/Irrationone/cellassign), a marker-gene-based cell type annotation method ([Zhang _et al._ 2019](https://doi.org/10.1038/s41592-019-0529-1)) diff --git a/docs/sce_file_contents.md b/docs/sce_file_contents.md index 4306797e..08b5a0a4 100644 --- a/docs/sce_file_contents.md +++ b/docs/sce_file_contents.md @@ -172,7 +172,7 @@ metadata(sce) # experiment metadata | `cluster_weighting` | The weighting approach used during graph-based clustering. Only present for `processed` objects | | `cluster_nn` | The nearest neighbor parameter value used for the graph-based clustering. Only present for `processed` objects | | `celltype_methods` | If cell type annotation was performed, a vector of the methods used for annotation. May include `"submitter"`, `"openscpca"`, `"singler"` and/or `"cellassign"`. If submitter or OpenScPCA cell type annotations are available, this metadata item will be present in all objects. 
Otherwise, this item will only be in `processed` objects | -| `openscpca_celltype_module_name` | If cell type annotations from the OpenScPCA project are available, the original module name from the[`OpenScPCA-analysis` GitHub repository](https://github.com/AlexsLemonade/OpenScPCA-analysis) | +| `openscpca_celltype_module_name` | If cell type annotations from the OpenScPCA project are available, the original module name from the [`OpenScPCA-analysis` GitHub repository](https://github.com/AlexsLemonade/OpenScPCA-analysis) | | `openscpca_celltype_nf_version` | If cell type annotations from the OpenScPCA project are available, the version of the [`OpenScPCA-nf` workflow](https://github.com/AlexsLemonade/OpenScPCA-nf) used to generate annotations | | `openscpca_celltype_release_date` | If cell type annotations from the OpenScPCA project are available, the release date for the input ScPCA data used when assigning annotations | | `singler_results` | If cell typing with `SingleR` was performed, the full [`DataFrame`](https://rdrr.io/bioc/S4Vectors/man/DataFrame-class.html) result object returned by `SingleR` annotation. Only present for `processed` objects |