Algorithm
abhi1238 edited this page May 18, 2023
·
18 revisions
- We installed the kaos R library
- We collected 1000 reference and 1000 AMR E. coli genomes (.fa)
- For each of the .fa files, we do the following
- We read one .fa file at a time and replace every character other than A, T, G, and C with a random 20-length string of ATGC (alternatively, you can try replacing it with a single random character)
- Create the FCGR (frequency chaos game representation) for a specific k-mer length K (you may consider normalizing by N-K+1, the number of K-mers in a sequence of length N, and see if that makes sense)
- Sum the FCGRs across all genomes within each category (Ref or AMR)
- Flatten the two summed matrices into vectors and plot one against the other as a scatter plot
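The steps above can be sketched in plain Python, without kaos, to make the bookkeeping explicit. This is only an illustration under stated assumptions: the function names (`sanitize`, `kmer_cell`, `fcgr`, `sum_grids`, `flatten`) are hypothetical, and the corner layout (A bottom-left, C top-left, G top-right, T bottom-right) is one common convention that may not match the orientation kaos uses.

```python
import random

# Assumed corner layout (x, y); kaos may orient the matrix differently.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def sanitize(seq, fill_len=20, rng=None):
    """Replace every non-ACGT character with fill_len random bases."""
    rng = rng or random.Random()
    out = []
    for ch in seq.upper():
        if ch in CORNERS:
            out.append(ch)
        else:
            out.append("".join(rng.choice("ACGT") for _ in range(fill_len)))
    return "".join(out)


def kmer_cell(kmer):
    """Map a k-mer to its (x, y) cell in a 2^k x 2^k grid.

    The last base of the k-mer selects the coarsest quadrant, as in the
    chaos game iteration, so it contributes the most significant bit.
    """
    x = y = 0
    for j, base in enumerate(kmer):
        bx, by = CORNERS[base]
        x |= bx << j
        y |= by << j
    return x, y


def fcgr(seq, k, normalize=False):
    """Count k-mers into a grid; optionally divide by N - k + 1."""
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    n_kmers = len(seq) - k + 1
    for i in range(max(n_kmers, 0)):
        x, y = kmer_cell(seq[i:i + k])
        grid[y][x] += 1
    if normalize and n_kmers > 0:
        grid = [[v / n_kmers for v in row] for row in grid]
    return grid


def sum_grids(grids):
    """Element-wise sum of equally sized grids (one per genome)."""
    size = len(grids[0])
    return [[sum(g[r][c] for g in grids) for c in range(size)]
            for r in range(size)]


def flatten(grid):
    """Row-major vector form of a grid, e.g. for a scatter plot."""
    return [v for row in grid for v in row]
```

With one summed grid per category, `flatten` gives two equal-length vectors whose paired entries are the scatter-plot points. `sanitize` expects sequence text only, so FASTA header lines have to be stripped beforehand.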
- Install the kaos library (https://cran.r-project.org/web/packages/kaos/index.html)
- Read 150 FASTA samples from each category (Susceptible, AMR, and Reference).
- Find the FCGR matrix for each sample and normalise it so that its entries sum to 1.
- Find the average over all samples for each category (Susceptible, AMR, and Reference).
- Plot the k-mer probabilities of AMR vs Ref, Susceptible vs Ref, and AMR vs Susceptible for different k-mer lengths (inter-group).
- Plot the k-mer probabilities of AMR vs AMR, Susceptible vs Susceptible, and Ref vs Ref for different k-mer lengths (intra-group).
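A minimal sketch of the normalise-then-average procedure above, using k-mer dictionaries instead of FCGR matrices (the two hold the same counts, just indexed differently). All function names are hypothetical. The notes do not say how the intra-group comparison is built; the `split_halves` helper assumes it compares the averages of two disjoint halves of the same category, which is a guess.

```python
from collections import Counter
from itertools import product


def kmer_probabilities(seq, k):
    """k-mer frequencies of an ACGT-only sequence, normalised to sum to 1."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return {m: counts[m] / total for m in kmers}


def mean_profile(profiles):
    """Average a list of per-sample probability profiles for one category."""
    n = len(profiles)
    return {m: sum(p[m] for p in profiles) / n for m in profiles[0]}


def scatter_points(profile_a, profile_b):
    """Pair up the two category means, one (x, y) point per k-mer."""
    return [(profile_a[m], profile_b[m]) for m in sorted(profile_a)]


def split_halves(samples):
    """Assumed intra-group setup: split one category into two halves."""
    mid = len(samples) // 2
    return samples[:mid], samples[mid:]
```

For an inter-group plot, `scatter_points(mean_profile(amr_profiles), mean_profile(ref_profiles))` yields the points; for an intra-group plot under the assumption above, the two halves of a single category are averaged and paired the same way. Points that fall close to the diagonal indicate k-mers with similar probability in both groups.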
- Install the kaos library (https://cran.r-project.org/web/packages/kaos/index.html)
- Read 10 FASTA samples from each category (Susceptible, AMR, and Reference).
- Find the FCGR matrix for each sample (do not normalise to 1).
- Find the average over all samples for each category (Susceptible, AMR, and Reference).
- Plot the top 100 k-mers with the highest absolute difference in frequency between AMR and Susceptible (for varying k-mer length).
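The ranking step in this variant can be sketched as follows, assuming raw (unnormalised) k-mer counts averaged per category. The function names are hypothetical, and ties are broken alphabetically only to keep the output deterministic.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Raw k-mer counts for one sample (no normalisation)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def mean_counts(counters):
    """Average the raw counts over all samples of one category."""
    total = Counter()
    for c in counters:
        total.update(c)
    return {kmer: v / len(counters) for kmer, v in total.items()}


def top_differences(mean_a, mean_b, n=100):
    """The n k-mers with the largest absolute difference in mean count."""
    kmers = set(mean_a) | set(mean_b)
    diffs = {m: abs(mean_a.get(m, 0) - mean_b.get(m, 0)) for m in kmers}
    return sorted(diffs.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Toy sequences, invented for illustration only.
amr = [kmer_counts(s, 2) for s in ["AAAA", "AAAT"]]
sus = [kmer_counts(s, 2) for s in ["TTTT", "TTTA"]]
top = top_differences(mean_counts(amr), mean_counts(sus), n=2)
```

Because the counts are not normalised, longer genomes contribute larger absolute differences, so this ranking is only comparable across samples of similar length.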
- Chaos game and genome
- First paper - https://academic.oup.com/nar/article-abstract/18/8/2163/2383530?redirectedFrom=fulltext
- blog - https://towardsdatascience.com/chaos-game-representation-of-a-genetic-sequence-4681f1a67e14
- Bioinformatics paper (2001) - https://academic.oup.com/bioinformatics/article/17/5/429/277428?login=true
- Kaos R package paper (2019) - https://academic.oup.com/bioinformatics/article/36/1/272/5521624
- Nice review (2021) - https://www.sciencedirect.com/science/article/pii/S2001037021004736
- Data structure chaos game - https://almob.biomedcentral.com/articles/10.1186/1748-7188-7-10
- Alignment-free genome comparison wiki page (includes information-theoretic methods) - https://en.wikipedia.org/wiki/Alignment-free_sequence_analysis#cite_note-52
- Excellent application in AMR - https://academic.oup.com/bioinformatics/article/38/2/325/6382301
- Global alignment - https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-243