Clarification Needed on Dataset Processing #4

@nicole1q

Hi @M0hammadL, @b-schubert, @ArcaneEmergence, @irene-bonapa, and @drEast,

I am trying to reproduce your study, "Multi-modal generative modeling for joint analysis of single-cell T cell receptor and gene expression data." However, I have run into problems while processing the datasets and would greatly appreciate your guidance.

  1. Regarding the Borcherding Dataset
    I recently downloaded the Utility v0.0.4 dataset from Zenodo, but I couldn’t locate the rna_combined.h5ad and tcrs_combined.csv files mentioned in preprocessing/Borcherding_preprocessing.ipynb.

When exploring the dataset, I found that:

- The h5ad files in utility/data/processedData/individualSeurat/h5ad/ contain only gene expression (GEX) information.
- The TCR annotations in utility/data/processedData/combinedDataSets/ use barcodes (e.g., barcode) that do not match those in the h5ad objects (e.g., BCT1.1_AAACCTGCAGATCGGA-1).

Could you clarify where to find the paired h5ad and TCR annotation files?
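
For reference, the barcode mismatch described above can be checked with a small sketch like the following. It is only an illustration under stated assumptions: the helper names `split_barcode` and `overlap_report` are hypothetical, the pattern assumes barcodes of the form `<sample>_<nucleotides>-<suffix>` (as in BCT1.1_AAACCTGCAGATCGGA-1), and the TCR-side barcodes in the example are made up rather than taken from the Zenodo files.

```python
import re

# Assumed layout: optional "<sample>_" prefix, a nucleotide core, optional "-<digits>" suffix.
# This pattern is a guess based on the single example barcode in this issue.
BARCODE_RE = re.compile(
    r"^(?:(?P<sample>.+)_)?(?P<core>[ACGTN]+)(?:-(?P<suffix>\d+))?$"
)


def split_barcode(barcode):
    """Return (sample, core); (None, barcode) if the barcode does not fit the assumed layout."""
    match = BARCODE_RE.match(barcode.strip())
    if match is None:
        return None, barcode
    return match.group("sample"), match.group("core")


def overlap_report(gex_barcodes, tcr_barcodes):
    """Count barcodes shared exactly and shared after stripping sample prefix and suffix."""
    gex_full, tcr_full = set(gex_barcodes), set(tcr_barcodes)
    gex_core = {split_barcode(b)[1] for b in gex_full}
    tcr_core = {split_barcode(b)[1] for b in tcr_full}
    return {
        "exact": len(gex_full & tcr_full),
        "core_only": len(gex_core & tcr_core),
    }


# Toy input: the first GEX barcode is the example from this issue, the rest are invented.
gex = ["BCT1.1_AAACCTGCAGATCGGA-1", "BCT1.1_AAACCTGGTCCGTTAA-1"]
tcr = ["AAACCTGCAGATCGGA-1", "TTTGGTTTCAGAGCTT-1"]
print(overlap_report(gex, tcr))
```

A large gap between the exact and core-only counts would suggest the two files differ only in prefix or suffix conventions. Core-only matches should be treated with caution, since the same nucleotide barcode can occur in more than one sample.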

  2. Regarding the COVID Dataset
    I downloaded the dataset from GSE171037, but encountered similar issues:

- The file GSE171037_TcellReversePT_integrated_Tcells.h5ad only contains GEX data.
- The raw data in GSE171037_RAW includes only orphan TCR information (e.g., 'orphan VDJ', 'orphan VJ') without paired TCR data.
- Additionally, I couldn’t find the Covid19_TAs_PBMCs.h5ad file mentioned in your work.

Could you please provide guidance on where to find the complete datasets or suggest an alternative approach?

Thank you very much for your time and assistance.
