
Output of Higashi main

ruochiz edited this page Sep 5, 2021 · 20 revisions

The output path, naming convention, etc., are defined in the configuration file (see details in "configuration of parameters"). For instance, all output of Higashi is stored under the temp_dir specified in the configuration file. See the tutorials for examples of how to use these output files.

Embedding vectors


The single-cell embeddings are saved with the name /embed/{embedding_name}_0_origin.npy, where the row order is consistent with the cell_id column of the input file data.txt.

Besides the cell embeddings, the embeddings for the genomic bins are also saved with the name {embedding_name}_{id}_origin.npy.

  • The {embedding_name} is the parameter in the configuration file.
  • The {id} starts at 0 and ends at the number of chromosomes contained in the training data. 0 corresponds to the cell embeddings; 1 and above correspond to the embeddings of the bins from each chromosome.

The embeddings can be loaded with the standard np.load(xxx).
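As a minimal sketch of the naming convention above, the helpers below build the embedding paths and load the cell and bin embeddings. The function names embedding_path and load_embeddings are hypothetical (they are not part of Higashi), and the sketch assumes the embed directory sits directly under temp_dir.

```python
import os

import numpy as np


def embedding_path(temp_dir, embedding_name, idx):
    # Assumption: embeddings live in <temp_dir>/embed, following the
    # /embed/{embedding_name}_{id}_origin.npy naming described above.
    return os.path.join(temp_dir, "embed", "%s_%d_origin.npy" % (embedding_name, idx))


def load_embeddings(temp_dir, embedding_name, num_chroms):
    # id 0 holds the cell embeddings (rows follow the cell_id order).
    cell_embed = np.load(embedding_path(temp_dir, embedding_name, 0))
    # ids 1..num_chroms hold the bin embeddings, one array per chromosome.
    bin_embeds = [
        np.load(embedding_path(temp_dir, embedding_name, i))
        for i in range(1, num_chroms + 1)
    ]
    return cell_embed, bin_embeds
```

Here num_chroms would be the number of chromosomes in the training data, and temp_dir and embedding_name would be the values from the configuration file.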

Imputed contact maps

[Figure: imputation showcase (figs/imputation_showcase.png)]

The imputed matrices are saved with the name {chrom_name}_{embedding_name}_nbr_{k}_impute.hdf5.

  • The {chrom_name} is the chromosome name of the imputed maps.
  • The {embedding_name} is the parameter in the configuration file.
  • The {k} can be either 0 or the {neighbor_num} parameter specified in the configuration file. When {k}=0, it represents the imputation results without using any neighboring cell information.
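The file name pattern above can be sketched as a small helper. The function names are hypothetical and only illustrate how the three placeholders combine; neighbor_num stands for the value of the {neighbor_num} parameter in the configuration file.

```python
def imputed_file_name(chrom_name, embedding_name, k):
    # Follows the pattern {chrom_name}_{embedding_name}_nbr_{k}_impute.hdf5.
    return "%s_%s_nbr_%d_impute.hdf5" % (chrom_name, embedding_name, k)


def imputed_file_names(chrom_name, embedding_name, neighbor_num):
    # k = 0: imputation without neighboring cell information.
    # k = neighbor_num: imputation that uses neighboring cells.
    return {
        "no_neighbors": imputed_file_name(chrom_name, embedding_name, 0),
        "with_neighbors": imputed_file_name(chrom_name, embedding_name, neighbor_num),
    }
```

For example, imputed_file_names("chr1", "demo", 5) yields the two file names one would expect to find for chr1 under these assumptions.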

Each imputed matrix is stored as an hdf5 file with the following structure:

.
├── coordinates (vector size of k x 2)
├── cell_0 (vector size of k)
├── cell_1
├── ...
└── cell_N

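The contact matrix of a cell can be reconstructed by writing the values of its cell_* vector into the matrix entries listed in coordinates. Note that k in the structure above is the number of stored entries per cell, not the {k} in the file name. For instance: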

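import os

import h5py
import numpy as np
from tqdm import trange

# temp_dir, chrom and embedding_name are assumed to be defined already,
# matching the values used in the configuration file.
with h5py.File(os.path.join(temp_dir, "%s_%s_nbr_0_impute.hdf5" % (chrom, embedding_name)), "r") as impute_f:
    # Read the (k x 2) coordinate table into memory once.
    coordinates = np.array(impute_f['coordinates']).astype('int')
    xs, ys = coordinates[:, 0], coordinates[:, 1]
    size = int(np.max(ys)) + 1
    # Every key except 'coordinates' corresponds to one cell.
    cell_list = trange(len(list(impute_f.keys())) - 1)
    for i in cell_list:
        m1 = np.zeros((size, size))
        proba = np.array(impute_f["cell_%d" % i])
        m1[xs, ys] += proba
        # Mirror the filled entries to obtain the symmetric map of cell i
        # (any diagonal entries are doubled by this step).
        m1 = m1 + m1.T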
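To make the mapping from coordinates to matrix entries concrete, here is a toy example that uses only numpy and made-up values. The helper vector_to_matrix is hypothetical and mirrors the loop body above; it assumes the coordinates list off-diagonal entries from one triangle of the matrix only.

```python
import numpy as np


def vector_to_matrix(coordinates, values, size=None):
    # coordinates: (k x 2) array of (row, column) bin indices.
    # values: length-k vector of imputed values for one cell.
    coordinates = np.asarray(coordinates, dtype=int)
    xs, ys = coordinates[:, 0], coordinates[:, 1]
    if size is None:
        size = int(np.max(ys)) + 1
    m = np.zeros((size, size))
    m[xs, ys] = values
    # Mirror the filled triangle to obtain a symmetric contact map.
    return m + m.T


# Made-up toy data: three stored entries in a 3 x 3 map.
toy_coordinates = np.array([[0, 1], [0, 2], [1, 2]])
toy_values = np.array([0.5, 0.2, 0.9])
toy_matrix = vector_to_matrix(toy_coordinates, toy_values)
```

In this toy case, the value 0.5 ends up at both (0, 1) and (1, 0), and the diagonal stays zero because no diagonal coordinate is listed.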

Transform imputation results to .cool files

We provide a script that can select groups of cells, merge their imputation results, and save them as .cool files. Detailed documentation of the .cool format can be found at https://cooler.readthedocs.io/en/latest/schema.html.

To do so, run the following command:

python Merge2Cool.py [-c CONFIG] [-o OUTPUT] [-l LIST_PATH] [-t LIST_TYPE] [-n] 

optional arguments:
-n, --neighbor        Create .cool files for imputed maps with neighboring cell information utilized.

required arguments:
-c CONFIG             The path to the configuration JSON file that you created in the step.
-o OUTPUT             The path and prefix of the output .cool file names (example: ./output/test).
-l LIST_PATH          The path to a list. The file format for this list can be either .txt or .npy. You 
                      can specify which groups of cells to merge and output in two ways:
                      1. The list contains the `cell_id` of interest, e.g. [1,2,10,20,135,..,]. In this 
                      case, the imputation results of these cells are selected, merged, and saved 
                      in a file named {OUTPUT}.cool.
                      2. The list contains the group information, e.g. [GM12878, K562, ..., K562]. 
                      In this case, the imputation results of the cells in each group are selected, 
                      merged, and saved in files such as {OUTPUT}_GM12878.cool, etc.
                      If this parameter is not passed, the program creates the merged imputed contact 
                      maps of all cells by default.
-t {selected, group}  `selected` represents the first way of specifying cells of interest, while `group` 
                      represents the second way.

With this script, one can create an individual .cool file for each single cell by inputting a list of the form [cell_0, cell_1, cell_2, ..., cell_N] and specifying the {LIST_TYPE} as group.
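The per-cell case can be sketched as follows. Both helper functions and the file names (per_cell_groups.npy, config.JSON) are hypothetical; only the flags -c, -o, -l, -t, and -n come from the usage text above. Whether Merge2Cool.py accepts a string array saved this way has not been verified here, so treat this as an illustration of the idea rather than a tested recipe.

```python
import numpy as np


def per_cell_group_labels(num_cells):
    # One distinct group label per cell, so each cell ends up in its own file.
    return np.array(["cell_%d" % i for i in range(num_cells)])


def merge2cool_command(config, output, list_path=None, list_type=None, neighbor=False):
    # Assemble the command line from the documented flags.
    cmd = ["python", "Merge2Cool.py", "-c", config, "-o", output]
    if list_path is not None:
        cmd += ["-l", list_path, "-t", list_type]
    if neighbor:
        cmd.append("-n")
    return cmd


labels = per_cell_group_labels(3)
# To write the list to disk (hypothetical file name):
# np.save("per_cell_groups.npy", labels)
cmd = merge2cool_command("config.JSON", "./output/test", "per_cell_groups.npy", "group")
```

The resulting cmd list could be passed to subprocess.run, or joined with spaces and run in a shell from the directory that contains Merge2Cool.py.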
