22 changes: 13 additions & 9 deletions docs/merged_objects.md
@@ -152,16 +152,20 @@ Each such list will contain the following fields:
| `sample_id` | Sample ID in the form `SCPCS000000` |
| `library_id` | Library ID in the form `SCPCL000000` |
| `project_id` | Project ID in the form `SCPCP000000` |
| `salmon_version` | Version of `salmon` used for initial mapping |
| `reference_index` | Transcriptome reference file used for mapping |
| `total_reads` | Total number of reads processed by `salmon` |
| `mapped_reads` | Number of reads successfully mapped |
| `mapping_tool` | Pipeline used for mapping and quantification (`alevin-fry` for all current data in ScPCA) |
| `alevinfry_version` | Version of `alevin-fry` used for mapping and quantification |
| `af_permit_type` | `alevin-fry generate-permit-list` method used for filtering cell barcodes |
| `af_resolution` | `alevin-fry quant` resolution mode used |
| `usa_mode` | Boolean indicating whether quantification was done using `alevin-fry` USA mode |
| `af_num_cells` | Number of cells reported by `alevin-fry` |
| `mapping_tool` | Pipeline used for mapping and quantification. `alevin-fry` for all tag-based experiments (e.g. 10Xv2, 10Xv3) and `cellranger-multi` for probe-based experiments (GEM-X Flex) |
Member

I can neither confirm nor deny that I just spent 3 minutes being really confused why we don't set this for spaceranger too, only to remember we do not make SPEs. That's the other thing I work on (training materials).

| `salmon_version` | Version of `salmon` used for initial mapping. Only present if the `mapping_tool` is `alevin-fry` |
| `alevinfry_version` | Version of `alevin-fry` used for mapping and quantification. Only present if the `mapping_tool` is `alevin-fry` |
| `af_permit_type` | `alevin-fry generate-permit-list` method used for filtering cell barcodes. Only present if the `mapping_tool` is `alevin-fry` |
| `af_resolution` | `alevin-fry quant` resolution mode used. Only present if the `mapping_tool` is `alevin-fry` |
| `usa_mode` | Boolean indicating whether quantification was done using `alevin-fry` USA mode. Only present if the `mapping_tool` is `alevin-fry` |
| `total_reads` | Total number of reads processed by `salmon` or `cellranger multi` |
| `mapped_reads` | Number of reads successfully mapped. Only present if the `mapping_tool` is `alevin-fry` |
| `af_num_cells` | Number of cells reported by `alevin-fry`. Only present if the `mapping_tool` is `alevin-fry` |
| `cellranger_version` | Version of Cell Ranger used for mapping and quantification. Only present if the `mapping_tool` is `cellranger-multi` |
| `reference_probeset` | Version of probe set from 10x Genomics used for mapping and quantification. Only present if the `mapping_tool` is `cellranger-multi` |
| `pct_mapped_reads` | Total percentage of reads mapped by `cellranger-multi`. Only present if the `mapping_tool` is `cellranger-multi` |
| `cellranger_num_cells` | Number of cells reported by `cellranger multi`. Only present if the `mapping_tool` is `cellranger-multi` |
| `tech_version` | A string indicating the technology and version used for the single-cell library, such as 10Xv2, 10Xv3, or 10Xv3.1 |
| `assay_ontology_term_id` | A string indicating the [Experimental Factor Ontology](https://www.ebi.ac.uk/ols/ontologies/efo) term ID associated with the `tech_version` |
| `seq_unit` | `cell` for single-cell samples or `nucleus` for single-nucleus samples |
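
As a rough illustration of the conditional fields in the rows above, the following Python sketch checks which tool-specific fields are absent from a metadata record. The names `tool_specific_fields` and `missing_fields`, and the use of a plain dictionary to stand in for the metadata list, are assumptions made for this example only; they are not part of the ScPCA objects or of any library. Only fields that the table explicitly marks as "Only present if" are included.

```python
# Fields the table marks as present only for a given `mapping_tool` value.
ALEVIN_FRY_ONLY = [
    "salmon_version",
    "alevinfry_version",
    "af_permit_type",
    "af_resolution",
    "usa_mode",
    "mapped_reads",
    "af_num_cells",
]
CELLRANGER_MULTI_ONLY = [
    "reference_probeset",
    "pct_mapped_reads",
    "cellranger_num_cells",
]


def tool_specific_fields(mapping_tool):
    """Return the fields expected only for the given mapping tool."""
    if mapping_tool == "alevin-fry":
        return list(ALEVIN_FRY_ONLY)
    if mapping_tool == "cellranger-multi":
        return list(CELLRANGER_MULTI_ONLY)
    raise ValueError(f"Unrecognized mapping_tool: {mapping_tool!r}")


def missing_fields(metadata):
    """List the tool-specific fields that are absent from a metadata record."""
    expected = tool_specific_fields(metadata["mapping_tool"])
    return [field for field in expected if field not in metadata]


# Hypothetical record for a probe-based library; the values are placeholders.
example = {
    "mapping_tool": "cellranger-multi",
    "total_reads": 1000,
    "pct_mapped_reads": 90.0,
    "cellranger_num_cells": 10,
}
print(missing_fields(example))
```

A check like this may help when code written for `alevin-fry` libraries is reused on merged objects that also contain GEM-X Flex libraries.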
22 changes: 18 additions & 4 deletions docs/processing_information.md
@@ -5,7 +5,10 @@
### Mapping and quantification using alevin-fry

We used [`salmon`](https://salmon.readthedocs.io/en/latest) and [`alevin-fry`](https://alevin-fry.readthedocs.io/en/latest/) to generate gene by cell counts matrices for all single-cell and single-nuclei samples.
In brief, we utilized [selective alignment](#selective-alignment) to the [`splici` index](#reference-transcriptome-index) for all single-cell and single-nuclei samples.
In brief, we utilized [selective alignment](#selective-alignment) to the [`splici` index](#reference-transcriptome-index).

The only exception to this was single-nuclei samples generated using the probe-based GEM-X Flex platform from 10x Genomics.
See [Quantification for GEM-X Flex samples](#quantification-for-gem-x-flex-samples) for more information.
Member

Suggested change
See [Quantification for GEM-X Flex samples](#quantification-for-gem-x-flex-samples) for more information.
See [Quantification for GEM-X Flex samples](#mapping-and-quantification-for-gem-x-flex-samples) for more information.


#### Reference transcriptome index

@@ -36,12 +39,23 @@ In contrast to Cell Ranger, `cr-like-em` keeps multi-mapped reads and invokes an

3. With initial mapping to the `splici` index, `alevin-fry` quantification resulted in separate counts for spliced and unspliced transcripts, and an ambiguous count for reads compatible with either spliced or unspliced transcripts.

### Post alevin-fry processing

#### Combining counts from spliced cDNA and intronic regions

For single-cell and single-nuclei samples, the reads from spliced cDNA and intronic regions are combined by gene to produce a gene by cell counts matrix.
For single-cell and single-nuclei samples processed with `alevin-fry`, the reads from spliced cDNA and intronic regions are combined by gene to produce a gene by cell counts matrix.
After combining read counts, values are rounded to integer values.
The counts data from this step can be found in the `unfiltered` objects included with each library.
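
The combining and rounding step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline's implementation: `combine_counts` is a hypothetical helper, the two inputs are assumed to be gene by cell matrices (rows are genes, columns are cells) with identical gene and cell ordering, and the handling of ambiguous counts is not covered. Python's built-in `round` rounds exact halves to the nearest even integer; the rounding rule used by the pipeline is not stated here.

```python
def combine_counts(spliced, intronic):
    """Sum spliced and intronic counts per gene and cell, then round to integers.

    Both arguments are lists of rows (one row per gene, one value per cell).
    """
    if len(spliced) != len(intronic):
        raise ValueError("Matrices must have the same number of genes")
    combined = []
    for spliced_row, intronic_row in zip(spliced, intronic):
        if len(spliced_row) != len(intronic_row):
            raise ValueError("Matrices must have the same number of cells")
        combined.append(
            [round(s + i) for s, i in zip(spliced_row, intronic_row)]
        )
    return combined


# Toy example with two genes and two cells (made-up values).
spliced = [[1.25, 0.0], [3.0, 2.0]]
intronic = [[0.5, 0.0], [1.0, 0.25]]
print(combine_counts(spliced, intronic))  # prints [[2, 0], [4, 2]]
```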

### Mapping and quantification for GEM-X Flex samples

Libraries that were generated using the GEM-X Flex technology from 10x Genomics were quantified with the [`cellranger multi` pipeline](https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/running-pipelines/cr-multi) within Cell Ranger.
Member

Comment that applies throughout the PR: pick either `cellranger multi` or `cellranger-multi` and use it consistently.

The `cellranger mkref` command was used to create a transcriptome reference index compatible with Cell Ranger.
The probe set associated with the specific GEM-X Flex version used for library preparation was provided as input alongside the transcriptome reference index and FASTQ files to `cellranger multi`.

If samples were multiplexed into a single library using GEM-X Flex, demultiplexing was performed as part of `cellranger multi` so that outputs were separated to have one gene by cell counts matrix for each sample.

The gene by cell counts matrix output for each sample in the `raw_feature_bc_matrix` folder was saved to the `unfiltered` objects included with each library.

### Post alignment processing

#### Filtering cells
