
Algorithm

abhi1238 edited this page May 18, 2023 · 18 revisions

Chaos Game Frequency Matrix Representation

  • We installed the kaos R package.
  • Obtained 1000 reference and 1000 AMR E. coli genomes (.fa).
  • For each .fa file, we do the following:
  • Read one .fa file at a time and replace any character other than A, T, G, and C with a random 20-character string of A/T/G/C (alternatively, try replacing it with a single random character).
  • Create the FCGR for a specific k (consider normalizing by N - k + 1, the number of k-mers in a sequence of length N, and check whether that makes sense).
  • Sum the FCGRs across all genomes in each category (Ref and AMR).
  • Flatten each summed matrix into a vector and plot the two categories against each other as a scatter plot.
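The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the kaos implementation: the corner layout (A=(0,0), C=(0,1), G=(1,1), T=(1,0), Jeffrey's original CGR convention) is an assumption, and kaos may place the bases differently, which permutes the cells without changing the counts. Each non-ACGT character is padded with `pad_len` random bases (20 here, or 1 for the single-character alternative), and `normalize=True` divides by N - k + 1.

```python
import random

# Assumed corner layout (Jeffrey's CGR convention); kaos may differ.
# A different layout only permutes FCGR cells, the counts stay the same.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def clean_sequence(seq, pad_len=20, rng=None):
    """Replace each non-ACGT character with a random ACGT string of length pad_len."""
    rng = rng or random.Random()
    parts = []
    for ch in seq.upper():
        if ch in CORNERS:
            parts.append(ch)
        else:
            parts.append("".join(rng.choice("ACGT") for _ in range(pad_len)))
    return "".join(parts)


def fcgr(seq, k, normalize=False):
    """Return a 2^k x 2^k FCGR matrix (list of rows) of k-mer counts.

    Row index comes from the CGR y coordinate (row 0 = bottom of the plot),
    column index from the x coordinate. The sequence must be ACGT only.
    """
    size = 2 ** k
    mat = [[0.0] * size for _ in range(size)]
    n_kmers = max(len(seq) - k + 1, 0)  # N - k + 1 windows
    for start in range(n_kmers):
        row = col = 0
        for i, base in enumerate(seq[start:start + k]):
            cx, cy = CORNERS[base]
            # The last base of the k-mer moves the CGR point most, so it
            # sets the most significant bit of the cell index.
            col += cx << i
            row += cy << i
        mat[row][col] += 1
    if normalize and n_kmers:
        mat = [[v / n_kmers for v in r] for r in mat]
    return mat


def sum_fcgrs(mats):
    """Element-wise sum of equally sized FCGR matrices (one category)."""
    size = len(mats[0])
    return [[sum(m[r][c] for m in mats) for c in range(size)] for r in range(size)]


def flatten(mat):
    """Row-major vector form of a matrix, for scatter plots."""
    return [v for row in mat for v in row]
```

For example, with hypothetical lists `ref_genomes` and `amr_genomes` of sequence strings, `sum_fcgrs([fcgr(clean_sequence(s), k) for s in ref_genomes])` gives the Ref total, and `zip(flatten(ref_total), flatten(amr_total))` gives one scatter point per k-mer.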

Chaos game probability plots for different k-mer lengths across classes

  • Install the kaos library (https://cran.r-project.org/web/packages/kaos/index.html).
  • Read 150 FASTA samples from each category (Susceptible, AMR, and Reference).
  • Compute the FCGR matrix for each sample and normalize it so its cells sum to 1.
  • Average the matrices of all samples in each category (Susceptible, AMR, and Reference).
  • Plot the probabilities of AMR vs Ref, Susceptible vs Ref, and AMR vs Susceptible for different k-mer lengths (inter-group).
  • Plot the probabilities of AMR vs AMR, Susceptible vs Susceptible, and Ref vs Ref for different k-mer lengths (intra-group).
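The normalize, average, and compare steps can be sketched as below, assuming each FCGR is already available as a list of rows (for example from kaos, exported to Python). The page does not say how the intra-group comparisons were formed; `intra_group_pairs` is a hypothetical helper that compares the mean of the first half of a category's samples with the mean of the second half, which is only one possible choice.

```python
def normalize_to_one(mat):
    """Scale an FCGR so its cells sum to 1 (an empirical k-mer distribution)."""
    total = sum(sum(row) for row in mat)
    if total == 0:
        return [list(row) for row in mat]
    return [[v / total for v in row] for row in mat]


def category_mean(mats):
    """Cell-wise average of equally sized matrices from one category."""
    n, size = len(mats), len(mats[0])
    return [[sum(m[r][c] for m in mats) / n for c in range(size)] for r in range(size)]


def scatter_pairs(mat_x, mat_y):
    """(x, y) points, one per k-mer cell, for a category-vs-category scatter."""
    return [(x, y) for row_x, row_y in zip(mat_x, mat_y) for x, y in zip(row_x, row_y)]


def intra_group_pairs(mats):
    """Hypothetical intra-group comparison: first half vs second half of one category.

    Needs at least two matrices so that both halves are non-empty.
    """
    half = len(mats) // 2
    first = category_mean([normalize_to_one(m) for m in mats[:half]])
    second = category_mean([normalize_to_one(m) for m in mats[half:]])
    return scatter_pairs(first, second)
```

An inter-group comparison such as AMR vs Ref would then be `scatter_pairs(category_mean([normalize_to_one(m) for m in amr_mats]), category_mean([normalize_to_one(m) for m in ref_mats]))`, where `amr_mats` and `ref_mats` are hypothetical lists of FCGR matrices for one k. Points close to the diagonal indicate k-mers with similar probability in both groups.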

[Figure: inter- and intra-group probability plots (image, 2023-05-16)]

Chaos game plots of absolute differences for different k-mer lengths across classes (10 samples)

  • Install the kaos library (https://cran.r-project.org/web/packages/kaos/index.html).
  • Read 10 FASTA samples from each category (Susceptible, AMR, and Reference).
  • Compute the FCGR matrix for each sample (do not normalize to 1).
  • Average the matrices of all samples in each category (Susceptible, AMR, and Reference).
  • Plot the top 100 k-mers with the highest absolute difference between AMR and Susceptible frequencies (for varying k-mer lengths).
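Because each FCGR cell holds the count of exactly one k-mer, the top-100 ranking can be sketched with k-mer-keyed dictionaries instead of matrix positions, which keeps the k-mer labels for plotting. This is a minimal sketch, assuming cleaned sequences; windows containing non-ACGT characters are simply ignored here, and ties are broken alphabetically, both of which are illustrative choices rather than what the page specifies.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Raw (unnormalized) k-mer counts, i.e. the values of the FCGR cells."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Report all 4^k ACGT k-mers, including those with zero count.
    return {"".join(p): counts.get("".join(p), 0) for p in product("ACGT", repeat=k)}


def mean_counts(count_dicts):
    """Average count per k-mer over the samples of one category."""
    n = len(count_dicts)
    return {km: sum(d[km] for d in count_dicts) / n for km in count_dicts[0]}


def top_abs_diff(mean_a, mean_b, n=100):
    """The n k-mers with the largest |mean_a - mean_b|, largest first."""
    diffs = {km: abs(mean_a[km] - mean_b[km]) for km in mean_a}
    return sorted(diffs.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

With hypothetical sample lists `amr_seqs` and `sus_seqs`, `top_abs_diff(mean_counts([kmer_counts(s, k) for s in amr_seqs]), mean_counts([kmer_counts(s, k) for s in sus_seqs]))` returns `(k-mer, difference)` pairs ready for a bar plot. For k < 4 there are fewer than 100 k-mers, so the whole list is returned.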

[Figure: top-100 k-mer absolute-difference plots (image, 2023-05-18)]

Materials

  1. Chaos game and genome
  2. First paper (1990) - https://academic.oup.com/nar/article-abstract/18/8/2163/2383530?redirectedFrom=fulltext
  3. Blog post - https://towardsdatascience.com/chaos-game-representation-of-a-genetic-sequence-4681f1a67e14
  4. Bioinformatics paper (2001) - https://academic.oup.com/bioinformatics/article/17/5/429/277428?login=true
  5. Kaos R package paper (2019) - https://academic.oup.com/bioinformatics/article/36/1/272/5521624
  6. Nice review (2021) - https://www.sciencedirect.com/science/article/pii/S2001037021004736
  7. Chaos game as a data structure - https://almob.biomedcentral.com/articles/10.1186/1748-7188-7-10
  8. Alignment-free sequence analysis Wikipedia page (includes information-theoretic methods) - https://en.wikipedia.org/wiki/Alignment-free_sequence_analysis#cite_note-52
  9. Excellent application in AMR - https://academic.oup.com/bioinformatics/article/38/2/325/6382301
  10. Global alignment - https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-243