diff --git a/docs/faq.md b/docs/faq.md
index 963ad85..337ab30 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -291,13 +291,14 @@ Download links expire in 7 days, but you can generate a new link on the ScPCA Po
 Download links are only available for projects (i.e., not for downloading individual samples).
 
 ## Can I download data from the Portal programmatically?
-## Why can't I change the data format in My Dataset?
 
 We provide an R package, [`ScPCAr`](https://alexslemonade.github.io/ScPCAr/), to facilitate programmatic access to the ScPCA Portal.
 This package allows you to search for and download data from the ScPCA Portal directly within R.
 Please see the [package documentation](https://alexslemonade.github.io/ScPCAr/) for more details about installation and usage.
 Source code for the package can be found on [GitHub](https://github.com/AlexsLemonade/ScPCAr).
 
+## Why can't I change the data format in My Dataset?
+
 When creating a {ref}`custom dataset for download` (`My Dataset`), all single-cell sample or project data included must be of the same {ref}`data format`, either `SingleCellExperiment` for use in R or `AnnData` for use in Python.
 We currently do not support including both data formats at once in `My Dataset`.
 Once a sample or project of a given data format has been added to `My Dataset`, all subsequent single-cell or single-nuclei data added will automatically be in that same format.
diff --git a/docs/merged_objects.md b/docs/merged_objects.md
index 0375286..1cb3f6c 100644
--- a/docs/merged_objects.md
+++ b/docs/merged_objects.md
@@ -152,16 +152,20 @@ Each such list will contain the following fields:
 | `sample_id` | Sample ID in the form `SCPCS000000` |
 | `library_id` | Library ID in the form `SCPCL000000` |
 | `project_id` | Project ID in the form `SCPCP000000` |
-| `salmon_version` | Version of `salmon` used for initial mapping |
 | `reference_index` | Transcriptome reference file used for mapping |
-| `total_reads` | Total number of reads processed by `salmon` |
-| `mapped_reads` | Number of reads successfully mapped |
-| `mapping_tool` | Pipeline used for mapping and quantification (`alevin-fry` for all current data in ScPCA) |
-| `alevinfry_version` | Version of `alevin-fry` used for mapping and quantification |
-| `af_permit_type` | `alevin-fry generate-permit-list` method used for filtering cell barcodes |
-| `af_resolution` | `alevin-fry quant` resolution mode used |
-| `usa_mode` | Boolean indicating whether quantification was done using `alevin-fry` USA mode |
-| `af_num_cells` | Number of cells reported by `alevin-fry` |
+| `mapping_tool` | Pipeline used for mapping and quantification: `alevin-fry` for all tag-based experiments (e.g., 10Xv2, 10Xv3) and `cellranger multi` for probe-based experiments (GEM-X Flex) |
+| `salmon_version` | Version of `salmon` used for initial mapping. Only present if the `mapping_tool` is `alevin-fry` |
+| `alevinfry_version` | Version of `alevin-fry` used for mapping and quantification. Only present if the `mapping_tool` is `alevin-fry` |
+| `af_permit_type` | `alevin-fry generate-permit-list` method used for filtering cell barcodes. Only present if the `mapping_tool` is `alevin-fry` |
+| `af_resolution` | `alevin-fry quant` resolution mode used. Only present if the `mapping_tool` is `alevin-fry` |
+| `usa_mode` | Boolean indicating whether quantification was done using `alevin-fry` USA mode. Only present if the `mapping_tool` is `alevin-fry` |
+| `total_reads` | Total number of reads processed by `salmon` or `cellranger multi` |
+| `mapped_reads` | Number of reads successfully mapped. Only present if the `mapping_tool` is `alevin-fry` |
+| `af_num_cells` | Number of cells reported by `alevin-fry`. Only present if the `mapping_tool` is `alevin-fry` |
+| `cellranger_version` | Version of Cell Ranger used for mapping and quantification with `cellranger multi`. Only present if the `mapping_tool` is `cellranger multi` |
+| `reference_probeset` | Version of probe set from 10x Genomics used for mapping and quantification. Only present if the `mapping_tool` is `cellranger multi` |
+| `pct_mapped_reads` | Percentage of reads mapped by `cellranger multi`. Only present if the `mapping_tool` is `cellranger multi` |
+| `cellranger_num_cells` | Number of cells reported by `cellranger multi`. Only present if the `mapping_tool` is `cellranger multi` |
 | `tech_version` | A string indicating the technology and version used for the single-cell library, such as 10Xv2, 10Xv3, or 10Xv3.1 |
 | `assay_ontology_term_id` | A string indicating the [Experimental Factor Ontology](https://www.ebi.ac.uk/ols/ontologies/efo) term ID associated with the `tech_version` |
 | `seq_unit` | `cell` for single-cell samples or `nucleus` for single-nucleus samples |
diff --git a/docs/processing_information.md b/docs/processing_information.md
index eefdc92..84d02bb 100644
--- a/docs/processing_information.md
+++ b/docs/processing_information.md
@@ -5,7 +5,10 @@
 ### Mapping and quantification using alevin-fry
 
 We used [`salmon`](https://salmon.readthedocs.io/en/latest) and [`alevin-fry`](https://alevin-fry.readthedocs.io/en/latest/) to generate gene by cell counts matrices for all single-cell and single-nuclei samples.
-In brief, we utilized [selective alignment](#selective-alignment) to the [`splici` index](#reference-transcriptome-index) for all single-cell and single-nuclei samples.
+In brief, we utilized [selective alignment](#selective-alignment) to the [`splici` index](#reference-transcriptome-index).
+
+The only exception to this was single-nuclei samples generated using the probe-based GEM-X Flex platform from 10x Genomics.
+See [Mapping and quantification for GEM-X Flex samples](#mapping-and-quantification-for-gem-x-flex-samples) for more information.
 
 #### Reference transcriptome index
 
@@ -36,12 +39,23 @@ In contrast to Cell Ranger, `cr-like-em` keeps multi-mapped reads and invokes an
 
 3. With initial mapping to the `splici` index, `alevin-fry` quantification resulted in separate counts for spliced and unspliced transcripts, and an ambiguous count for reads compatible with either spliced or unspliced transcripts.
 
-### Post alevin-fry processing
-
 #### Combining counts from spliced cDNA and intronic regions
 
-For single-cell and single-nuclei samples, the reads from spliced cDNA and intronic regions are combined by gene to produce a gene by cell counts matrix.
+For single-cell and single-nuclei samples processed with `alevin-fry`, the reads from spliced cDNA and intronic regions are combined by gene to produce a gene by cell counts matrix.
 After combining read counts, values are rounded to integer values.
+The counts data from this step can be found in the `unfiltered` objects included with each library.
+
+### Mapping and quantification for GEM-X Flex samples
+
+Libraries that were generated using the GEM-X Flex technology from 10x Genomics were quantified with the [`cellranger multi` pipeline](https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/running-pipelines/cr-multi) within Cell Ranger.
+The `cellranger mkref` command was used to create a transcriptome reference index compatible with Cell Ranger.
+The probe set associated with the specific GEM-X Flex version used for library preparation was provided as input alongside the transcriptome reference index and FASTQ files to `cellranger multi`.
+ +If samples were multiplexed into a single library using GEM-X Flex, demultiplexing was performed as part of `cellranger multi` so that outputs were separated to have one gene by cell counts matrix for each sample. + +The gene by cell counts matrix output for each sample in the `raw_feature_bc_matrix` folder was saved to the `unfiltered` objects included with each library. + +### Post alignment processing #### Filtering cells diff --git a/docs/sce_file_contents.md b/docs/sce_file_contents.md index 14dd615..de7c9ee 100644 --- a/docs/sce_file_contents.md +++ b/docs/sce_file_contents.md @@ -142,16 +142,20 @@ metadata(sce) # experiment metadata | Item name | Contents | | ------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `salmon_version` | Version of `salmon` used for initial mapping | | `reference_index` | Transcriptome reference file used for mapping | -| `total_reads` | Total number of reads processed by `salmon` | -| `mapped_reads` | Number of reads successfully mapped | -| `mapping_tool` | Pipeline used for mapping and quantification (`alevin-fry` for all current data in ScPCA) | -| `alevinfry_version` | Version of `alevin-fry` used for mapping and quantification | -| `af_permit_type` | `alevin-fry generate-permit-list` method used for filtering cell barcodes | -| `af_resolution` | `alevin-fry quant` resolution mode used | -| `usa_mode` | Boolean indicating whether quantification was done using `alevin-fry` USA mode | -| `af_num_cells` | Number of cells reported by `alevin-fry` | +| `mapping_tool` | Pipeline used for mapping and quantification. `alevin-fry` for all tag-based experiments (e.g. 
10Xv2, 10Xv3) and `cellranger multi` for probe-based experiments (GEM-X Flex) | +| `salmon_version` | Version of `salmon` used for initial mapping. Only present if the `mapping_tool` is `alevin-fry` | +| `alevinfry_version` | Version of `alevin-fry` used for mapping and quantification. Only present if the `mapping_tool` is `alevin-fry` | +| `af_permit_type` | `alevin-fry generate-permit-list` method used for filtering cell barcodes. Only present if the `mapping_tool` is `alevin-fry` | +| `af_resolution` | `alevin-fry quant` resolution mode used. Only present if the `mapping_tool` is `alevin-fry` | +| `usa_mode` | Boolean indicating whether quantification was done using `alevin-fry` USA mode. Only present if the `mapping_tool` is `alevin-fry` | +| `total_reads` | Total number of reads processed by `salmon` or `cellranger multi` | +| `mapped_reads` | Number of reads successfully mapped. Only present if the `mapping_tool` is `alevin-fry` | +| `af_num_cells` | Number of cells reported by `alevin-fry`. Only present if the `mapping_tool` is `alevin-fry` | +| `cellranger_version` | Total number of reads processed by `salmon` or `cellranger multi` | +| `reference_probeset` | Version of probe set from 10x Genomics used for mapping and quantification. Only present if the `mapping_tool` is `cellranger multi` | +| `pct_mapped_reads` | Total percentage of reads mapped by `cellranger multi`. Only present if the `mapping_tool` is `cellranger multi` | +| `cellranger_num_cells` | Number of cells reported by `cellranger multi`. Only present if the `mapping_tool` is `cellranger multi` | | `tech_version` | A string indicating the technology and version used for the single-cell library, such as 10Xv2, 10Xv3, or 10Xv3.1 | | `assay_ontology_term_id` | A string indicating the [Experimental Factor Ontology](https://www.ebi.ac.uk/ols/ontologies/efo) term ID associated with the `tech_version` | | `seq_unit` | `cell` for single-cell samples or `nucleus` for single-nuclei samples |
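
Because most processing fields in the metadata tables are now conditional on `mapping_tool`, downstream code that reads them should branch on that field rather than assume the `alevin-fry` keys exist. The following is a minimal Python sketch, not part of any ScPCA tooling. It assumes the library metadata has already been loaded as a plain mapping (for example, the `metadata(sce)` list converted to a dictionary, or the `uns` dictionary of an `AnnData` object, which we assume carries the same keys), and that `mapping_tool` holds the literal strings `alevin-fry` or `cellranger multi`. The helper name `mapping_summary` and the `TOOL_SPECIFIC_FIELDS` table are hypothetical.

```python
from typing import Any, Dict, Mapping, Tuple

# Keys described as "Only present if the `mapping_tool` is ..." in the metadata tables.
# ASSUMPTION: `mapping_tool` is stored as exactly "alevin-fry" or "cellranger multi".
TOOL_SPECIFIC_FIELDS: Dict[str, Tuple[str, ...]] = {
    "alevin-fry": (
        "salmon_version",
        "alevinfry_version",
        "af_permit_type",
        "af_resolution",
        "usa_mode",
        "mapped_reads",
        "af_num_cells",
    ),
    "cellranger multi": (
        "cellranger_version",
        "reference_probeset",
        "pct_mapped_reads",
        "cellranger_num_cells",
    ),
}


def mapping_summary(meta: Mapping[str, Any]) -> Dict[str, Any]:
    """Collect the shared and tool-specific processing fields from `meta`.

    Hypothetical helper: `meta` stands in for library metadata loaded as a mapping.
    """
    tool = meta.get("mapping_tool")
    if tool not in TOOL_SPECIFIC_FIELDS:
        raise ValueError(f"unexpected mapping_tool: {tool!r}")
    # `total_reads` is documented for both pipelines.
    summary: Dict[str, Any] = {"mapping_tool": tool, "total_reads": meta.get("total_reads")}
    # Only look up the keys that apply to this tool; absent keys become None.
    for field in TOOL_SPECIFIC_FIELDS[tool]:
        summary[field] = meta.get(field)
    return summary


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    flex = {"mapping_tool": "cellranger multi", "total_reads": 1000, "pct_mapped_reads": 91.5}
    print(mapping_summary(flex))
```

Returning `None` for an absent key keeps the result's shape stable for each tool; a stricter variant could raise instead when a documented key is missing.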