In this part of the workshop, we focus on comparative genomics. Specifically, we use whole genome alignment (WGA) to investigate changes in the genomic regions that interest us, and we explore how to identify potential regulatory elements in intergenic regions. All of this is done in the context of the radiation of the neotropical Heliconius butterflies.
Below is a breakdown of the required dependencies.
- Kent toolkit. All of the required binaries are available pre-compiled on the utility page. The tools needed for this tutorial are wigToBigWig, gtfToGenePred, and genePredToBed.
- Bedtools & Samtools
- Progressive Cactus. With it, you should also find precompiled binaries for HAL tools.
- gffread from the Cufflinks package.
- Phylogenetic Analysis with Space/Time Models (PHAST) package.
- IGV or any other genome browser you like, to have a look at all the tracks you will generate.
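Before starting, it can help to confirm that the command-line tools listed above are reachable from your PATH. The following is a minimal sketch, not part of the original tutorial: the three Kent tool names and gffread come from the list above, while the executable names `bedtools`, `samtools`, `cactus`, `halStats`, `phastCons`, and `phyloFit` are assumptions about how those packages are usually installed and may differ on your system.

```python
import shutil

# Executable names grouped by package. Only the Kent tools and gffread are
# named in the tutorial; the other executable names are assumptions about a
# typical installation and may need adjusting.
REQUIRED_TOOLS = {
    "Kent toolkit": ["wigToBigWig", "gtfToGenePred", "genePredToBed"],
    "Bedtools": ["bedtools"],
    "Samtools": ["samtools"],
    "Progressive Cactus": ["cactus"],
    "HAL tools": ["halStats"],
    "gffread": ["gffread"],
    "PHAST": ["phastCons", "phyloFit"],
}


def find_missing(tools):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def report(required=REQUIRED_TOOLS):
    """Build one status line per package: 'ok' or the missing executables."""
    lines = []
    for package, tools in required.items():
        missing = find_missing(tools)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        lines.append(f"{package}: {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Running the script prints one line per package. Any package reported as missing either needs to be installed or have its directory added to your PATH. IGV is left out because it is typically launched as a desktop application rather than called from the command line.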
Figure. i) The dated species phylogeny built from the concatenated single-copy orthologous groups (scOGs) from all sequenced Heliconiinae and outgroups, using a combination of Maximum Likelihood and Bayesian Inference. The branch color represents the number of substitutions per site per 100 My for that specific branch. Species names in bold indicate species with chromosome- or sub-chromosome-level assemblies, asterisks indicate genomes assembled in this study, and "C" indicates curated assemblies; ii) genome assembly size, with the TE fractions in red; iii) BUSCO profiles for each species, where blue indicates the fraction of complete single-copy genes; iv) bar plots showing total gene counts partitioned according to their orthology profiles, from Nymphalids to lineage-restricted and clade-specific genes. From Cicconardi et al. (2023).
The tutorial is divided into three sections:
- Part I: Whole Genome Alignment
- Part II: HAL tools and Alignment Manipulation
- Part III: Identification of Conserved Regions
Otherwise, you can download it at this link.