Manuscript

Smruti Panda edited this page Feb 17, 2024 · 32 revisions

Oligo undergo evolutionary pressure

Introduction

Background

Material and Methods

The E. coli Long-Term Evolution Experiment (LTEE) initiated in 1988 involved two strains of Escherichia coli, Ara− and Ara+, establishing twelve populations labeled Ara−1 to Ara−6 and Ara+1 to Ara+6. Genetic modifications, including a marker mutation for l-arabinose growth, were introduced. Daily dilutions allowed approximately 6.67 generations per day, with samples collected every 500 generations for genetic analysis. Additionally, a Mutation-Accumulation Experiment (MAE) was conducted with 15 lines starting from a mutated LTEE strain. Genome sequencing utilized high-throughput technologies, and breseq software predicted mutations, validated against the ancestral genome. Phylogenetic trees were constructed using the Jukes–Cantor model. Parallel evolution analyses examined genetic changes in non-mutator lineages, quantifying parallelism and ranking genes through statistical methods. The conclusion highlighted the investigation of genetic patterns, statistical testing for reliability, and the study of hypermutation-specific effects. (Copied from Nancy)

In the Yeast LTEE, strains MJM361, MJM335, and MJM102 were cultured in three environments using 96-well plates. Daily dilutions and a Biomek FXp robot were employed, with contamination monitored and addressed. Fitness assays involved reference strains, mixed populations, and flow cytometry. Cross-contamination and population loss were screened through drug resistance tests. Challenges in measuring ancestral fitness were addressed by estimating it from populations without nonsynonymous mutations. The conclusion emphasized the study of evolving yeast populations, addressing challenges, and providing insights into adaptive changes through reliable fitness measurements using reference strains and corrections for experimental variations. (Copied from Nancy)

Write about process of obtaining snp file

We acquire the SNP file, which lists the mutations in each sample at specific positions together with the reference and alternate alleles (the alternate allele is the mutated nucleotide at the given position). We then extract reference and alternate flanks, each of length (2n+1) and centered on the mutation position, where 'n' is the length of both the upstream and the downstream flank. The two flanks are identical except at the mutation position: the reference flank retains the nucleotide given in the reference, while the alternate flank carries the variant allele.

Next, we generate sliding windows of length 'k' with a step size of 1 over both the reference and the alternate flank. Only windows that contain the mutation position are kept, so there are 'k' sliding windows for each unique mutation. For every sliding window, we compute the probabilistic frequency of the k-mer in both the reference and the alternate window, based on the reference FASTA file (GCF_000017985.1_ASM1798v1_genomic.fna for E. coli).

For each sliding window of a unique mutation, we calculate the log-likelihood ratio of the probabilistic frequency of the alternate k-mer against that of the reference k-mer and refer to it as the "kmer gain." Each unique mutation therefore has 'k' log-likelihood gains, one per sliding window. The accumulated gain is the sum of the log-likelihood gains across all sliding windows of a unique mutation.
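The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the manuscript: the function names are hypothetical, the "probabilistic frequency" is assumed to be the relative k-mer count in the reference with a pseudocount of 1 (so unseen k-mers do not produce log(0)), the natural logarithm is assumed, and the reference is treated as a single linear string (multiple FASTA records, ambiguous bases, and the reverse strand are ignored).

```python
import math
from collections import Counter


def count_kmers(genome, k):
    """Count every overlapping k-mer in the reference sequence."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def extract_flanks(genome, pos, alt, n):
    """Return (reference, alternate) flanks of length 2n+1 centered on pos (0-based)."""
    if pos - n < 0 or pos + n >= len(genome):
        raise ValueError("flank runs past the end of the sequence")
    ref_flank = genome[pos - n:pos + n + 1]
    # Same upstream and downstream context; only the central base differs.
    alt_flank = ref_flank[:n] + alt + ref_flank[n + 1:]
    return ref_flank, alt_flank


def kmer_gains(ref_flank, alt_flank, counts, k, pseudocount=1.0):
    """Log-likelihood ratio (alt vs. ref) for each of the k windows covering the mutation."""
    n = len(ref_flank) // 2  # index of the mutation inside the flank
    if n < k - 1:
        raise ValueError("flank length n must be at least k - 1")
    # Assumed smoothing: add a pseudocount to each of the 4**k possible k-mers.
    total = sum(counts.values()) + pseudocount * 4 ** k
    gains = []
    for start in range(n - k + 1, n + 1):  # step size 1, k windows in total
        p_ref = (counts[ref_flank[start:start + k]] + pseudocount) / total
        p_alt = (counts[alt_flank[start:start + k]] + pseudocount) / total
        gains.append(math.log(p_alt / p_ref))
    return gains


# Toy example (made-up sequence, not the E. coli reference).
genome = "ACGTACGTACGTACGT"
counts = count_kmers(genome, 2)
ref_flank, alt_flank = extract_flanks(genome, pos=5, alt="A", n=2)
gains = kmer_gains(ref_flank, alt_flank, counts, k=2)
accumulated_gain = sum(gains)
print(ref_flank, alt_flank, gains, accumulated_gain)
```

In this toy case the mutation replaces frequent k-mers with k-mers that never occur in the sequence, so every kmer gain and the accumulated gain are negative.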

E. coli

To demonstrate the variation in mutation counts between mutator and non-mutator populations, Figure 1A has been included.

To investigate the median fitness of mutators and non-mutators over 50,000 generations (Wiser, Ribeck, and Lenski 2013), six samples were pooled for each population. Figure 1B shows line plots of the fitness trajectories, with error bars representing the standard deviation of the fitness estimates. The composite trapezoidal rule was applied to compute the Area Under the Curve (AUC) of each trajectory.
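For reference, the composite trapezoidal rule sums the areas of the trapezoids between consecutive time points. The sketch below is a plain-Python illustration; the function name and the generation and fitness values are made up for the example and are not taken from the LTEE data.

```python
def trapezoid_auc(x, y):
    """Composite trapezoidal rule: sum of (x[i+1] - x[i]) * (y[i] + y[i+1]) / 2."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least two points")
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
        for i in range(len(x) - 1)
    )


# Illustrative values only (not measured fitness data).
generations = [0, 500, 1000, 2000]
fitness = [1.0, 1.2, 1.3, 1.4]
auc = trapezoid_auc(generations, fitness)
print(auc)
```

Because the rule weights each interval by its width, unevenly spaced sampling generations are handled without resampling.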

Figure 1C shows line plots of the median LLR scores associated with mutations for mutators, non-mutators, and their combined populations (Good et al., 2017). The shaded area indicates the 95% confidence interval, derived through random sampling.
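The text does not specify the random sampling scheme. One common choice is a percentile bootstrap, sketched below as an assumption rather than as the procedure actually used; the function name, the number of resamples, and the example scores are all hypothetical.

```python
import random
import statistics


def bootstrap_median_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median (assumed scheme)."""
    rng = random.Random(seed)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower_index = int(n_resamples * alpha / 2)
    upper_index = n_resamples - 1 - lower_index
    return medians[lower_index], medians[upper_index]


# Illustrative LLR scores for one generation (made-up numbers).
llr_scores = [0.1 * i - 1.0 for i in range(21)]
ci_low, ci_high = bootstrap_median_ci(llr_scores)
print(statistics.median(llr_scores), ci_low, ci_high)
```

Repeating this per generation would give the lower and upper edges of a shaded band around the median line.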

Figure 1D was created to visually and statistically analyze the distribution of median LLR scores over generations. It is a binarized version of Figure 1C, presenting boxplots of the median LLR scores for two equal halves of approximately 30,000 generations each. The one-sided Mann-Whitney U test was applied to assess the statistical significance of any observed difference between the two halves. The comparison shows how median LLR scores change over time in mutators and in non-mutators.
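As an illustration of the test, the sketch below implements a one-sided Mann-Whitney U test with the standard library only. It is not the code used for the manuscript: it assumes the alternative "the first sample tends to be larger", and it uses the normal approximation with tie and continuity corrections, which is appropriate for moderately large samples. In practice a library routine such as SciPy's `scipy.stats.mannwhitneyu(x, y, alternative="greater")` would normally be used. The example values are made up.

```python
import math


def mann_whitney_u_greater(x, y):
    """One-sided Mann-Whitney U test, H1: values in x tend to exceed values in y.

    Returns (U statistic of x, p-value from the normal approximation).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for t in range(i, j):
            ranks[t] = midrank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence either way
        return u_x, 1.0
    z = (u_x - mean_u - 0.5) / math.sqrt(var_u)  # continuity correction
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return u_x, p_value


# Illustrative medians for the later and earlier halves (made-up numbers).
later_half = [5, 6, 7, 8, 9]
earlier_half = [1, 2, 3, 4]
u_stat, p_value = mann_whitney_u_greater(later_half, earlier_half)
print(u_stat, p_value)
```

For very small samples an exact p-value is preferable to the normal approximation shown here.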

To examine the association between median fitness and median LLR scores across generations, three scatter plots were generated. Figure 1E illustrates this association for the mutator group, using a color gradient to represent generation chronology. Figure 1F and Figure 1G depict the same association for the non-mutator group and the combined mutator and non-mutator groups, respectively.

To visually assess and analyze the performance differences between mutators and non-mutators, Figure 1H was generated. This figure includes a box plot illustrating the accumulated gain for each population, providing a graphical representation for a comparative evaluation of their respective performances. P-values are derived through the one-sided Mann-Whitney U test to assess the statistical significance of the accumulated gain between the mutator and non-mutator groups.

To depict the relationship between the median of accumulated gain and generation, Figure 1J was created. This line plot includes shaded regions to indicate the 95% confidence interval around the median for both mutator and non-mutator populations, as well as the overall dataset.

Yeast

To demonstrate the impact of increasing generations on accumulated gain in the Yeast experiment, Figure 2(B) employs a heatmap of accumulated gain. Rows denote various populations, columns represent generations, and the color gradient deepens with higher accumulated gain, visually highlighting the data trend.

To illustrate the correlation between accumulated gain and increasing generation in the Yeast experiment, Figure 2(C) was plotted. This figure shows a median line plot of accumulated gain against generation, with a shaded region indicating the 95% confidence interval.

Results

  1. RNase III rnc gene
  2. ClinVar data

Frequency of K-mer

Null model

The null model estimates the gain that would be expected if mutations occurred at random. Under random mutation, the alternate and reference k-mers have similar frequencies most of the time. This is consistent with the skewed Gaussian distribution of k-mer frequencies in the FCGR: most k-mers are drawn from near the peak of that distribution, where frequencies are similar. A random mutation therefore has a low chance of producing a high gain.
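A null distribution of this kind could be simulated roughly as follows. This is a hedged sketch, not the authors' null model: it assumes uniformly random positions and uniformly random substitutions, smoothed k-mer counts with a pseudocount of 1, natural logarithms, and a randomly generated toy sequence in place of a real genome. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def accumulated_gain(genome, pos, alt, k, counts, pseudocount=1.0):
    """Sum of log(alt k-mer frequency / ref k-mer frequency) over the k windows covering pos."""
    gain = 0.0
    for start in range(pos - k + 1, pos + 1):
        ref_kmer = genome[start:start + k]
        offset = pos - start  # position of the mutated base inside the window
        alt_kmer = ref_kmer[:offset] + alt + ref_kmer[offset + 1:]
        # The shared normalizing total cancels in the ratio of frequencies.
        gain += math.log(
            (counts[alt_kmer] + pseudocount) / (counts[ref_kmer] + pseudocount)
        )
    return gain


def null_gains(genome, k, n_mutations, seed=0):
    """Accumulated gains for substitutions placed uniformly at random (assumed null)."""
    rng = random.Random(seed)
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    gains = []
    for _ in range(n_mutations):
        # Keep every window fully inside the sequence.
        pos = rng.randrange(k - 1, len(genome) - k + 1)
        alt = rng.choice([base for base in "ACGT" if base != genome[pos]])
        gains.append(accumulated_gain(genome, pos, alt, k, counts))
    return gains


# Toy sequence generated at random (not a real genome).
seq_rng = random.Random(1)
toy_genome = "".join(seq_rng.choice("ACGT") for _ in range(2000))
simulated = null_gains(toy_genome, k=3, n_mutations=200)
print(min(simulated), sorted(simulated)[len(simulated) // 2], max(simulated))
```

Comparing observed accumulated gains against such simulated values indicates how unusual a high gain would be under random mutation.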

E. coli LTEE:

[Figure 1]

The graphical representation in Figure 1A clearly indicates a significantly higher number of mutations in the mutator population compared to the non-mutator population.

Figure 1B provides insight into the median fitness trajectories of mutators and non-mutators over 50,000 generations. Notably, mutators exhibit a slightly greater Area Under the Curve (AUC) in their fitness trajectory compared to non-mutators.

Figure 1C reveals distinct patterns in median LLR scores. The mutator group shows a consistent upward trend, while the non-mutator group exhibits only a minor increase towards the end of the trajectory, with substantial intermittent fluctuations. When mutators and non-mutators are combined, median LLR scores rise steadily, primarily because of the much higher mutation frequency in mutators.

As shown in Figure 1D, the increase in median LLR scores among non-mutators appears weaker than among mutators. Comparing the two halves (generations <=30K and >30K), the one-sided Mann-Whitney U test yielded p-values of 5.11e-16 for the mutator group and 1.96e-1 for the non-mutator group. The difference between the two halves is therefore statistically significant in the mutator group (P<0.05), whereas the non-mutator group shows no statistically significant difference.

As depicted in Figure 1E, a robust correlation (0.728) was observed between median fitness and median LLR scores within mutator populations. However, this linear relationship was not evident for non-mutators, as illustrated in Figure 1F. In Figure 1G, where mutator populations predominantly contribute to overall mutation, the combined population also exhibits a substantial correlation (0.802) between median fitness and median LLR scores.

Figure 1H shows that every mutator population achieves a higher median accumulated gain than the non-mutator populations. The one-sided p-value from the Mann-Whitney U test is 4.06e-27, indicating a highly significant difference in accumulated gain between the mutator and non-mutator groups.

As shown in Figure 1J, the mutator group consistently increases its median accumulated gain, while the non-mutator group shows fluctuating scores. When mutators and non-mutators are combined, there is a steady rise in median accumulated gain. This overall increase is primarily driven by the higher mutation frequency in mutators.

Yeast LTEE

[Figure 2]

According to the observations in Figure 2(B), there is a general trend of increasing accumulated gain with each successive generation in the Yeast experiment.

In Figure 2(C), the line plot reveals that the median accumulated gain tends to increase as the generation advances in the Yeast experiment.