Hi @snehamitra ,
Thank you for developing this great tool. I'm currently running the tutorials using the mitrasneha/scarlink:latest Docker image and have encountered a consistent issue when using the pre-computed data from Figshare.
Here is the workflow that led to the error:
1. I started by trying to run the chromatin_potential.ipynb notebook, which requires the human cortex dataset. However, that dataset is not provided with chromatin_potential.ipynb.
2. So I found the note in figure_1.ipynb which mentions: "The majority of data files and SCARlink models used in this study are available on Figshare...".
3. Based on this, I assumed that the human_cortex_all_out_10k dataset from Figshare was the correct pre-computed SCARlink dataset to use as input for chromatin_potential.ipynb. I downloaded it and tried to run chromatin_potential.ipynb (specifically the scp.create_object() step), which resulted in the following error:
```
KeyError                                  Traceback (most recent call last)
Cell In[20], line 1
----> 1 chrom_p = scp.create_object(dirname)

File /app/SCARlink/scarlink/src/chromatin_potential.py:53, in create_object(dirname, smooth_k, use_hvg, celltype_col, umap, lsi_file)
     51 out_dir = dirname + '/scarlink_out/'
     52 lsi = pandas.read_csv(lsi_file, sep='\t').values
---> 53 yp, yo = smooth_vals(out_dir, lsi, smooth_k)
...
File /app/SCARlink/scarlink/src/get_smoothed_pred_obs.py:75, in get_y_unscaled(dirname, genes, yp_file, yo_file, smooth_vals, nbrs, all_genes)
     73 f_genes = list(f['genes/'].keys())
     74 f.close()
---> 75 rm = read_model(dirname, out_file_name = coef_file.split('/')[-1])
...
File /app/SCARlink/scarlink/src/read_model.py:37, in read_model(out_dir, out_file_name, input_file_name, read_only)
     35 input_file_name = dirname + 'coassay_matrix.h5'
     36 gtf_file = f['genes'].attrs['gtf_file']
---> 37 norm_factor = f['genes'].attrs['norm_factor_scatac']
...
KeyError: "Unable to synchronously open attribute (can't locate attribute: 'norm_factor_scatac')"
```
4. After seeing this, I went back to the figure_1.ipynb notebook to check whether the Figshare datasets themselves had a problem.
5. I tried running the code from figure_1.ipynb (specifically the plot_compare_gsm_corr function) using both the human_cortex_all_out_10k and pbmc_all_out_10k datasets. (I also downloaded and unzipped fig_data.zip and scripts.zip.) Both attempts failed with the exact same KeyError at the exact same line in read_model.py. Here is the traceback for the pbmc_all_out_10k data:
```
KeyError                                  Traceback (most recent call last)
Cell In[22], line 5
----> 5 plot_compare_gsm_corr(pbmc_scarlink_out, pbmc_gsm_prefix, pbmc_out_prefix, "10x PBMC")

File /data/scripts/compare_corrs.py:172, in plot_compare_gsm_corr(...)
--> 172 df = make_gsm_scarlink_table(dirname, gsm_prefix, output_prefix, coassay_file=coassay_file, check_hvg=check_hvg)
...
File /data/scripts/compare_corrs.py:89, in make_gsm_scarlink_table(...)
---> 89 rm = read_model(dirname, out_file_name=coef_file.split('/')[-1], read_only=True)
...
File /app/SCARlink/scarlink/src/read_model.py:37, in read_model(...)
---> 37 norm_factor = f['genes'].attrs['norm_factor_scatac']
...
KeyError: "Unable to synchronously open attribute (can't locate attribute: 'norm_factor_scatac')"
```
It seems that any pre-computed model I download from Figshare (both human_cortex and pbmc) fails to load in the latest Docker image, because the read_model.py script requires a norm_factor_scatac attribute that does not exist in these files.
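In case it helps with debugging, here is a minimal h5py sketch showing how the attribute's presence can be checked without triggering the KeyError. It uses a toy stand-in file, since the exact layout of the Figshare files is only my assumption based on the traceback:

```python
import os
import tempfile

import h5py

# Toy stand-in for a SCARlink model file: a 'genes' group that carries a
# 'gtf_file' attribute but no 'norm_factor_scatac' attribute. This layout
# is only assumed for illustration, based on lines 36-37 of read_model.py.
path = os.path.join(tempfile.mkdtemp(), "toy_model.h5")
with h5py.File(path, "w") as f:
    genes = f.create_group("genes")
    genes.attrs["gtf_file"] = "genes.gtf"

with h5py.File(path, "r") as f:
    # List which attributes the group actually has.
    print(sorted(f["genes"].attrs))                    # ['gtf_file']
    # Direct indexing (as read_model.py does) raises KeyError when the
    # attribute is absent; attrs.get() returns None instead.
    print(f["genes"].attrs.get("norm_factor_scatac"))  # None
```

Running the equivalent check against the downloaded files would confirm whether norm_factor_scatac is genuinely missing or stored under a different name.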
This makes me wonder if there's a version mismatch between the latest image and the pre-computed data on Figshare, or if my assumption in step 3 was incorrect.
Any guidance you could provide on how to resolve this would be greatly appreciated.
Thank you!