
KeyError: 'norm_factor_scatac' when using Figshare data with latest Docker image #26

@allenphant


Hi @snehamitra ,

Thank you for developing this great tool. I'm currently running the tutorials using the mitrasneha/scarlink:latest Docker image and have encountered a consistent issue when using the pre-computed data from Figshare.

Here is the workflow that led to the error:

  1. I started by trying to run the chromatin_potential.ipynb notebook, which requires the human cortex dataset. However, that dataset is not provided with chromatin_potential.ipynb.

  2. I then found the note in figure_1.ipynb that says: "The majority of data files and SCARlink models used in this study are available on Figshare...".

  3. Based on this, I assumed that the pre-computed human_cortex_all_out_10k dataset on Figshare was the intended SCARlink input for chromatin_potential.ipynb. I downloaded it and ran the notebook, which failed at the scp.create_object() step with the following error:

KeyError                                  Traceback (most recent call last)
Cell In[20], line 1
----> 1 chrom_p = scp.create_object(dirname)

File /app/SCARlink/scarlink/src/chromatin_potential.py:53, in create_object(dirname, smooth_k, use_hvg, celltype_col, umap, lsi_file)
     51 out_dir = dirname + '/scarlink_out/'
     52 lsi = pandas.read_csv(lsi_file, sep='\t').values
---> 53 yp, yo = smooth_vals(out_dir, lsi, smooth_k)
...
File /app/SCARlink/scarlink/src/get_smoothed_pred_obs.py:75, in get_y_unscaled(dirname, genes, yp_file, yo_file, smooth_vals, nbrs, all_genes)
     73 f_genes = list(f['genes/'].keys())
     74 f.close()
---> 75 rm = read_model(dirname, out_file_name = coef_file.split('/')[-1])
...
File /app/SCARlink/scarlink/src/read_model.py:37, in read_model(out_dir, out_file_name, input_file_name, read_only)
     35 input_file_name = dirname + 'coassay_matrix.h5'
     36 gtf_file = f['genes'].attrs['gtf_file']
---> 37 norm_factor = f['genes'].attrs['norm_factor_scatac']
...
KeyError: "Unable to synchronously open attribute (can't locate attribute: 'norm_factor_scatac')"

  4. After seeing this, I went back to the figure_1.ipynb notebook to check whether the Figshare datasets themselves had a problem.

  5. I ran the code from figure_1.ipynb (specifically the plot_compare_gsm_corr function) on both the human_cortex_all_out_10k and pbmc_all_out_10k datasets, after also downloading and unzipping fig_data.zip and scripts.zip. Both attempts failed with the same KeyError at the same line in read_model.py. Here is the traceback for the pbmc_all_out_10k data:

KeyError                                  Traceback (most recent call last)
Cell In[22], line 5
----> 5 plot_compare_gsm_corr(pbmc_scarlink_out, pbmc_gsm_prefix, pbmc_out_prefix, "10x PBMC")

File /data/scripts/compare_corrs.py:172, in plot_compare_gsm_corr(...)
--> 172     df = make_gsm_scarlink_table(dirname, gsm_prefix, output_prefix, coassay_file=coassay_file, check_hvg=check_hvg)
...
File /data/scripts/compare_corrs.py:89, in make_gsm_scarlink_table(...)
---> 89     rm = read_model(dirname, out_file_name=coef_file.split('/')[-1], read_only=True)
...
File /app/SCARlink/scarlink/src/read_model.py:37, in read_model(...)
---> 37 norm_factor = f['genes'].attrs['norm_factor_scatac']
...
KeyError: "Unable to synchronously open attribute (can't locate attribute: 'norm_factor_scatac')"

It seems that every pre-computed model I have downloaded from Figshare (both human_cortex and pbmc) fails to load in the latest Docker image, apparently because read_model.py requires a norm_factor_scatac attribute that these files do not contain.
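To confirm that the attribute is missing from every downloaded model file, and not just from one, a check along these lines could be run inside the container. This is a minimal sketch, assuming h5py is installed in the image (read_model.py accesses the files h5py-style via f['genes'].attrs[...]) and that the model files sit directly in dirname + '/scarlink_out/', as create_object() builds that path. The helper names missing_attrs and scan_model_dir are hypothetical; the two attribute names are the ones read at lines 36-37 of read_model.py in the traceback.

```python
import glob
import os
from collections.abc import Iterable, Mapping

# Attributes that read_model.py reads from the 'genes' group (per the traceback).
REQUIRED_GENE_ATTRS = ("gtf_file", "norm_factor_scatac")


def missing_attrs(attrs: Mapping, required: Iterable[str] = REQUIRED_GENE_ATTRS) -> list[str]:
    """Return the required attribute names that are absent from an attrs mapping."""
    return [name for name in required if name not in attrs]


def scan_model_dir(out_dir: str) -> dict[str, list[str]]:
    """Map each HDF5 file in out_dir to the required 'genes' attributes it lacks.

    Hypothetical helper; assumes h5py is available in the container.
    """
    import h5py  # imported lazily so missing_attrs() works without h5py

    report = {}
    for path in sorted(glob.glob(os.path.join(out_dir, "*"))):
        # Skip directories and anything that is not an HDF5 file.
        if not os.path.isfile(path) or not h5py.is_hdf5(path):
            continue
        with h5py.File(path, "r") as f:
            if "genes" not in f:
                report[path] = ["<no 'genes' group>"]
                continue
            report[path] = missing_attrs(f["genes"].attrs)
    return report
```

For example, scan_model_dir(dirname + '/scarlink_out/') would return an empty list for each file if both attributes were present; a list containing 'norm_factor_scatac' for every file would match the KeyError above.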

This makes me wonder if there's a version mismatch between the latest image and the pre-computed data on Figshare, or if my assumption in step 3 was incorrect.
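One way to test the version-mismatch hypothesis would be to produce a model file by running SCARlink in the same latest image on any small dataset, then compare the attribute names of its genes group with those of a Figshare file. The sketch below assumes h5py is installed and that both files have a top-level genes group; diff_attr_keys and diff_genes_attrs are hypothetical helper names.

```python
from collections.abc import Mapping


def diff_attr_keys(figshare_attrs: Mapping, fresh_attrs: Mapping) -> dict[str, list[str]]:
    """Return attribute names present in only one of two attrs mappings."""
    old_keys, new_keys = set(figshare_attrs), set(fresh_attrs)
    return {
        "only_in_figshare": sorted(old_keys - new_keys),
        "only_in_fresh": sorted(new_keys - old_keys),
    }


def diff_genes_attrs(figshare_file: str, fresh_file: str) -> dict[str, list[str]]:
    """Compare the 'genes' group attributes of two model files (hypothetical helper)."""
    import h5py  # lazy import so diff_attr_keys() is usable without h5py

    with h5py.File(figshare_file, "r") as old, h5py.File(fresh_file, "r") as new:
        return diff_attr_keys(old["genes"].attrs, new["genes"].attrs)
```

If only_in_fresh listed norm_factor_scatac, that would suggest the model format changed after the Figshare files were generated, rather than a problem with the downloads themselves.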

Any guidance you could provide on how to resolve this would be greatly appreciated.

Thank you!
