Hi @snehamitra ,
Thank you for developing this great tool. I'm currently running the tutorials using the mitrasneha/scarlink:latest Docker image and have encountered a consistent issue when using the pre-computed data from Figshare.
Here is the workflow that led to the error:
1. I started by trying to run the chromatin_potential.ipynb notebook, which requires the human cortex dataset. However, that dataset is not provided with chromatin_potential.ipynb.
2. So I found the note in figure_1.ipynb which mentions: "The majority of data files and SCARlink models used in this study are available on Figshare...".
3. Based on this, I assumed that the human_cortex_all_out_10k dataset from Figshare was the correct pre-computed SCARlink dataset to use as input for chromatin_potential.ipynb. I downloaded it and tried to run chromatin_potential.ipynb (specifically the scp.create_object() step), which resulted in the following error:
```
KeyError                                  Traceback (most recent call last)
Cell In[20], line 1
----> 1 chrom_p = scp.create_object(dirname)

File /app/SCARlink/scarlink/src/chromatin_potential.py:53, in create_object(dirname, smooth_k, use_hvg, celltype_col, umap, lsi_file)
     51 out_dir = dirname + '/scarlink_out/'
     52 lsi = pandas.read_csv(lsi_file, sep='\t').values
---> 53 yp, yo = smooth_vals(out_dir, lsi, smooth_k)
...
File /app/SCARlink/scarlink/src/get_smoothed_pred_obs.py:75, in get_y_unscaled(dirname, genes, yp_file, yo_file, smooth_vals, nbrs, all_genes)
     73 f_genes = list(f['genes/'].keys())
     74 f.close()
---> 75 rm = read_model(dirname, out_file_name = coef_file.split('/')[-1])
...
File /app/SCARlink/scarlink/src/read_model.py:37, in read_model(out_dir, out_file_name, input_file_name, read_only)
     35 input_file_name = dirname + 'coassay_matrix.h5'
     36 gtf_file = f['genes'].attrs['gtf_file']
---> 37 norm_factor = f['genes'].attrs['norm_factor_scatac']
...
KeyError: "Unable to synchronously open attribute (can't locate attribute: 'norm_factor_scatac')"
```
4. After seeing this, I went back to the figure_1.ipynb notebook to check whether the Figshare datasets themselves had a problem.
5. I tried running the code from figure_1.ipynb (specifically the plot_compare_gsm_corr function) using both the human_cortex_all_out_10k and pbmc_all_out_10k datasets. (I also downloaded and unzipped fig_data.zip and scripts.zip.) Both attempts failed with the exact same KeyError at the exact same line in read_model.py. Here is the traceback for the pbmc_all_out_10k data:
```
KeyError                                  Traceback (most recent call last)
Cell In[22], line 5
----> 5 plot_compare_gsm_corr(pbmc_scarlink_out, pbmc_gsm_prefix, pbmc_out_prefix, "10x PBMC")

File /data/scripts/compare_corrs.py:172, in plot_compare_gsm_corr(...)
--> 172 df = make_gsm_scarlink_table(dirname, gsm_prefix, output_prefix, coassay_file=coassay_file, check_hvg=check_hvg)
...
File /data/scripts/compare_corrs.py:89, in make_gsm_scarlink_table(...)
---> 89 rm = read_model(dirname, out_file_name=coef_file.split('/')[-1], read_only=True)
...
File /app/SCARlink/scarlink/src/read_model.py:37, in read_model(...)
---> 37 norm_factor = f['genes'].attrs['norm_factor_scatac']
...
KeyError: "Unable to synchronously open attribute (can't locate attribute: 'norm_factor_scatac')"
```
It seems that any pre-computed model I download from Figshare (both human_cortex and pbmc) fails to load in the latest Docker image, because the read_model.py script requires a norm_factor_scatac attribute that does not exist in these files.
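In case it helps with debugging, here is a minimal h5py sketch showing how the attribute's presence can be checked without triggering the KeyError. It uses a toy stand-in file, since the exact layout of the Figshare files is only my assumption based on the traceback:

```python
import os
import tempfile

import h5py

# Toy stand-in for a SCARlink model file: a 'genes' group that carries a
# 'gtf_file' attribute but no 'norm_factor_scatac' attribute. This layout
# is only assumed for illustration, based on lines 36-37 of read_model.py.
path = os.path.join(tempfile.mkdtemp(), "toy_model.h5")
with h5py.File(path, "w") as f:
    genes = f.create_group("genes")
    genes.attrs["gtf_file"] = "genes.gtf"

with h5py.File(path, "r") as f:
    # List which attributes the group actually has.
    print(sorted(f["genes"].attrs))                    # ['gtf_file']
    # Direct indexing (as read_model.py does) raises KeyError when the
    # attribute is absent; attrs.get() returns None instead.
    print(f["genes"].attrs.get("norm_factor_scatac"))  # None
```

Running the equivalent check against the downloaded files would confirm whether norm_factor_scatac is genuinely missing or stored under a different name.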
This makes me wonder if there's a version mismatch between the latest image and the pre-computed data on Figshare, or if my assumption in step 3 was incorrect.
Any guidance you could provide on how to resolve this would be greatly appreciated.
Thank you!