Output of Higashi main
The output path, naming convention, etc., are defined in the configuration file (see details in "configuration of parameters").
For instance, all output of Higashi is stored in the `temp_dir` specified in the configuration file.
See the tutorials for examples of how to use these output files.
The single-cell embeddings are saved as `/embed/{embedding_name}_0_origin.npy`, where the row order is consistent with the `cell_id` column of the input file `data.txt`.
Besides the cell embeddings, the embeddings of the genomic bins are also saved, as `{embedding_name}_{id}_origin.npy`.
- `{embedding_name}` is the parameter in the configuration file.
- `{id}` runs from 0 to the number of chromosomes contained in the training data. 0 corresponds to the cell embeddings; 1 onwards correspond to the bin embeddings of each chromosome.
The embeddings can be loaded with the standard `np.load(xxx)`.
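A minimal sketch for loading all embeddings at once. It assumes that the bin embedding files sit in the same `embed/` folder under `temp_dir` as the cell embeddings, and that `n_chrom` is the number of chromosomes used in training. The helpers `embedding_path` and `load_embeddings` are hypothetical names for illustration, not part of Higashi.

```python
import os

import numpy as np


def embedding_path(temp_dir, embedding_name, idx):
    """Path of {embedding_name}_{idx}_origin.npy.

    Assumption: every embedding file is stored in temp_dir/embed/.
    """
    return os.path.join(temp_dir, "embed", "%s_%d_origin.npy" % (embedding_name, idx))


def load_embeddings(temp_dir, embedding_name, n_chrom):
    """Load the cell embeddings (id 0) and the bin embeddings (ids 1..n_chrom)."""
    cell_embed = np.load(embedding_path(temp_dir, embedding_name, 0))
    # one array of bin embeddings per chromosome, in id order
    bin_embeds = [
        np.load(embedding_path(temp_dir, embedding_name, idx))
        for idx in range(1, n_chrom + 1)
    ]
    return cell_embed, bin_embeds
```

The rows of the returned cell embedding array follow the `cell_id` order described above, so they can be joined back to per-cell metadata by position.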
The imputed matrices are saved with the name `{chrom_name}_{embedding_name}_nbr_{k}_impute.hdf5`.
- `{chrom_name}` is the chromosome name of the imputed maps.
- `{embedding_name}` is the parameter in the configuration file.
- `{k}` can be either 0 or the `{neighbor_num}` parameter specified in the configuration file. When `{k}`=0, the file contains the imputation results obtained without using any neighboring-cell information.
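The naming rule above can be captured in a small helper. This is a hypothetical sketch: it assumes the configuration file has been parsed into a dict (e.g. with `json.load`) that contains the keys `temp_dir`, `embedding_name` and `neighbor_num`, and that the imputed files are written directly into `temp_dir`, as in the reading example on this page.

```python
import os


def impute_path(config, chrom_name, use_neighbors=False):
    """Return the path of {chrom_name}_{embedding_name}_nbr_{k}_impute.hdf5.

    Assumes config is the parsed configuration JSON with the keys
    'temp_dir', 'embedding_name' and (if use_neighbors) 'neighbor_num'.
    k is 0 without neighboring-cell information, else neighbor_num.
    """
    k = config["neighbor_num"] if use_neighbors else 0
    filename = "%s_%s_nbr_%d_impute.hdf5" % (chrom_name, config["embedding_name"], k)
    return os.path.join(config["temp_dir"], filename)
```

For example, with `neighbor_num` set to 5, `impute_path(config, "chr1", use_neighbors=True)` points at `chr1_{embedding_name}_nbr_5_impute.hdf5` inside `temp_dir`.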
The imputed matrix is stored as an hdf5 file with the following structure (here k is the number of stored matrix entries, not the `{k}` in the file name):
```
.
├── coordinates (array of size k x 2)
├── cell_0 (vector of size k)
├── cell_1
├── ...
└── cell_N
```
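To check that a file follows this layout, a sketch like the following can be used. It assumes the per-cell datasets are named `cell_<i>`, as in the reading example on this page, and accepts either an open `h5py.File` or a plain dict with the same keys. `describe_impute_file` is a hypothetical helper, not a Higashi function.

```python
import numpy as np


def describe_impute_file(impute_f):
    """Return (k, n_cells) for an opened *_impute.hdf5 file or an equivalent dict.

    Checks that 'coordinates' is k x 2 and that every 'cell_<i>'
    vector has length k. np.shape reads .shape, so h5py datasets
    are not loaded into memory.
    """
    k, ncol = np.shape(impute_f["coordinates"])
    if ncol != 2:
        raise ValueError("coordinates should have 2 columns, got %d" % ncol)
    cell_keys = [key for key in impute_f.keys() if key.startswith("cell_")]
    for key in cell_keys:
        if np.shape(impute_f[key]) != (k,):
            raise ValueError("%s has shape %s, expected (%d,)" % (key, np.shape(impute_f[key]), k))
    return k, len(cell_keys)
```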
The dense matrix of a cell can be generated by placing the values of its vector `cell_*` at the corresponding entries given by `coordinates`.
For instance:
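```python
import os

import h5py
import numpy as np
from tqdm import trange

# temp_dir, chrom and embedding_name must be set to match your configuration file
with h5py.File(os.path.join(temp_dir, "%s_%s_nbr_0_impute.hdf5" % (chrom, embedding_name)), "r") as impute_f:
    coordinates = impute_f['coordinates']
    xs, ys = coordinates[:, 0], coordinates[:, 1]
    size = int(np.max(ys)) + 1
    # every key except 'coordinates' holds one cell
    cell_list = trange(len(list(impute_f.keys())) - 1)
    m1 = np.zeros((size, size))
    for i in cell_list:
        m1 *= 0.0
        proba = np.array(impute_f["cell_%d" % i])
        # place the values of cell i at their (x, y) coordinates
        m1[xs.astype('int'), ys.astype('int')] += proba
        # make the matrix symmetric; m1 now holds the imputed map of cell i
        m1 = m1 + m1.T
```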
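The scatter step inside the loop can also be written as a standalone function that needs no hdf5 file, which is convenient for testing. This is a sketch, not Higashi code: `vector_to_matrix` is a hypothetical name, and it assumes only one triangle of the matrix is stored, with each coordinate pair appearing once. Unlike `m1 + m1.T` above, it writes each value to both (x, y) and (y, x), so diagonal entries are not doubled.

```python
import numpy as np


def vector_to_matrix(coordinates, values, size=None):
    """Scatter one cell's imputed vector into a dense symmetric matrix.

    coordinates: (k, 2) array of (x, y) bin indices
    values:      (k,) array of imputed values for one cell
    size:        matrix dimension; defaults to the largest index + 1
    """
    coords = np.asarray(coordinates).astype(int)
    values = np.asarray(values, dtype=float)
    xs, ys = coords[:, 0], coords[:, 1]
    if size is None:
        size = int(coords.max()) + 1
    mat = np.zeros((size, size))
    # fill the stored entries and their mirror image; on the diagonal
    # (x == y) both assignments hit the same entry, so nothing is doubled
    mat[xs, ys] = values
    mat[ys, xs] = values
    return mat
```

With an open file, a call would look like `vector_to_matrix(impute_f["coordinates"][:], impute_f["cell_0"][:])`.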
We provide a script, `Merge2Cool.py`, that can select groups of cells, merge their imputation results and save them as .cool files. Detailed documentation of the .cool format can be found at https://cooler.readthedocs.io/en/latest/schema.html.
To do so, run the following command:
```
python Merge2Cool.py [-c CONFIG] [-o OUTPUT] [-l LIST_PATH] [-t LIST_TYPE] [-n]
```

```
optional arguments:
  -n, --neighbor        Create .cool files for imputed maps with neighboring cell information utilized.

required arguments:
  -c CONFIG             The path to the configuration JSON file that you created in the step.
  -o OUTPUT             The path and prefix of the output cool names. (example: ./output/test)
  -l LIST_PATH          The path to a list. The file format for this list can be either .txt or .npy. You
                        can specify what groups of cells you want to merge and output in two ways:
                        1. The list contains the `cell_id` of interest, e.g. [1,2,10,20,135,..,]. When
                        doing so, the imputation results of these cells would be selected, merged and saved
                        in a file named {OUTPUT}.cool.
                        2. The list contains the group information, e.g. [GM12878, K562, ..., K562].
                        When doing so, the imputation results of cells from each group would be selected,
                        merged and saved in files such as {OUTPUT}_GM12878.cool, etc.
                        3. If this parameter is not passed, the program will create the merged imputed contact maps of all cells by default.
  -t {selected, group}  `selected` represents the first way of specifying cells of interest, while `group`
                        represents the second way.
```
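What "selected, merged" means in the two list modes can be illustrated at the level of the imputed vectors. This is a sketch only: this page does not say whether `Merge2Cool.py` sums or averages the selected cells, so the sketch simply sums them, and it assumes the i-th entry of a group list belongs to the cell with `cell_id` i. `merge_cells` and `merge_by_group` are hypothetical helpers that accept an open `h5py.File` or any dict with `cell_<i>` keys.

```python
import numpy as np


def merge_cells(impute_f, cell_ids):
    """Sum the imputed vectors of the selected cells (the `selected` mode).

    Assumption: cells are summed; Merge2Cool.py may aggregate differently.
    """
    return np.sum([np.asarray(impute_f["cell_%d" % i], dtype=float) for i in cell_ids], axis=0)


def merge_by_group(impute_f, labels):
    """Merge cells per group label (the `group` mode).

    labels[i] is assumed to be the group of the cell with cell_id i.
    Returns {label: merged vector}.
    """
    members = {}
    for cell_id, label in enumerate(labels):
        members.setdefault(label, []).append(cell_id)
    return {label: merge_cells(impute_f, ids) for label, ids in members.items()}
```

A merged vector shares the `coordinates` of the file, so it can be turned into a dense matrix with the same scatter step as in the reading example.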
With this script, one can create an individual .cool file for each single cell by passing a list that goes [cell_0, cell_1, cell_2, ..., cell_N] and specifying the {LIST_TYPE} as `group`.
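A sketch of preparing list files as .npy arrays with NumPy: a `selected` list of cell ids, a `group` list, and the one-label-per-cell list described above. The file names and the helper `write_list_files` are hypothetical, and it is an assumption that `Merge2Cool.py` reads these .npy files as plain integer or string arrays.

```python
import os

import numpy as np


def write_list_files(out_dir, n_cells, selected_ids, group_labels):
    """Write example list files for Merge2Cool.py's -l option.

    Returns {list kind: written path}.
    Assumption: Merge2Cool.py accepts plain int/str .npy arrays.
    """
    paths = {
        "selected": os.path.join(out_dir, "selected_cells.npy"),
        "group": os.path.join(out_dir, "cell_groups.npy"),
        "per_cell": os.path.join(out_dir, "per_cell.npy"),
    }
    # first way (-t selected): the cell_ids of interest
    np.save(paths["selected"], np.asarray(selected_ids, dtype=int))
    # second way (-t group): one group label per cell
    np.save(paths["group"], np.asarray(group_labels, dtype=str))
    # one group per cell, [cell_0, cell_1, ...], also used with -t group
    np.save(paths["per_cell"], np.array(["cell_%d" % i for i in range(n_cells)]))
    return paths
```

The per-cell file would then be passed as, for example, `python Merge2Cool.py -c config.json -o ./output/test -l per_cell.npy -t group`.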