Update week2.md

wletsou · Apr 26, 2023 · 531e38c
1 parent a40fad9
commit 531e38c
Showing 1 changed file with 2 additions and 2 deletions.
diff --git a/_pages/week2.md b/_pages/week2.md
@@ -237,9 +237,9 @@ From (3a)&ndash;(3c), the fraction of the genome shared IBD is\\[r=\frac{2\pi_2+
 
 #### KING ####
 
-KING computes both the probability \\(\pi_0\\) that two relatives share 0 alleles IBD as well as the coefficient of relatedness \\(\phi=\frac{r}{2}\\), defined as the probability that two alleles taken one from each relative are IBD at a locus (the maximum probability is \\(\frac{1}{2}\\) because there is a 50% chance that the alleles chosen come from different parents).&nbsp; The idea is to compare the counts \\(X\\) and \\(Y\\) of the alternative alleles that two individuals each have at a genetic locus.&nbsp; If the two individuals are from the same population, the allele expected counts and variances are \\(\mathbb{E}\left(X\right)=\mathbb{E}\left(Y\right)=2p\\) and \\(\sigma_X=\mathbb{E}\left(X^2\right)-\mathbb{E}\left(X\right)^2=\\(\mathbb{E}\left(Y^2\right)-\mathbb{E}\left(Y\right)^2=2p\left(1-p\right)\\).&nbsp; Thus the expected value of the difference \\(\mathbb{E}\left(X^2-Y^2\right)=\mathbb{E}\left(X^2\right)+\mathbb{E}\left(Y^2\right)-2\mathbb{E}\left(XY\right)\\) is \\[\frac{\mathbb{E}\left(X^2-Y^2\right)}{\sigma_X^2+\sigma_Y^2}=1-\frac{\sigma_{XY}}{\sigma_X\simga_Y}=1-r,\tag{5}\\]where \\(\sigma_{XY}=\mathbb{E}\left(XY\right)-\mathbb{E}\left(X\right)\mathbb{E}\left(Y\right)\\) is the covariance of the genotype counts and \\(r=2\phi\\) is the genetic correlation between two individuals, also interprettted as the amount of the genome they share IBD.
+KING computes both the probability \\(\pi_0\\) that two relatives share 0 alleles IBD and the kinship coefficient \\(\phi=\frac{r}{2}\\), defined as the probability that two alleles, one drawn from each relative, are IBD at a locus (the maximum probability is \\(\frac{1}{2}\\) because there is a 50% chance that the alleles chosen come from different parents).&nbsp; The idea is to compare the counts \\(X\\) and \\(Y\\) of the alternative allele that two individuals each carry at a genetic locus.&nbsp; If the two individuals are from the same population, in which the alternative allele has frequency \\(p\\), the expected values and variances of the allele counts are \\(\mathbb{E}\left(X\right)=\mathbb{E}\left(Y\right)=2p\\) and \\(\sigma^2_X=\mathbb{E}\left(X^2\right)-\mathbb{E}\left(X\right)^2=\mathbb{E}\left(Y^2\right)-\mathbb{E}\left(Y\right)^2=\sigma^2_Y=2p\left(1-p\right)\\).&nbsp; Thus the expected squared difference \\(\mathbb{E}\left(\left(X-Y\right)^2\right)=\mathbb{E}\left(X^2\right)+\mathbb{E}\left(Y^2\right)-2\mathbb{E}\left(XY\right)\\), normalized by the total variance, is\\[\frac{\mathbb{E}\left(\left(X-Y\right)^2\right)}{\sigma_X^2+\sigma_Y^2}=1-\frac{\sigma_{XY}}{\sigma_X\sigma_Y}=1-r,\tag{5}\\]where \\(\sigma_{XY}=\mathbb{E}\left(XY\right)-\mathbb{E}\left(X\right)\mathbb{E}\left(Y\right)\\) is the covariance of the genotype counts and \\(r=2\phi\\) is the genetic correlation between two individuals, also interpreted as the fraction of the genome shared IBD.
 
-KING estimates \\(\phi\\) from \\(\mathbb{E}\left(X^2-Y^2\right)\\) by counting the number \\(N\\) of loci at which two individuals are heterozygous \\(Aa,Aa\\) or opposite homozygous \\(AA,aa\\) and the total number of alleles at which each individual is heterozygous \\(Aa\\) using:\\[\hat{\phi_{ij}}=\frac{N_{Aa,Aa}-2N_{AA,aa}}{N_{Aa}^{\left(i\right)}+N_{Aa}^{\left(j\right)}}\tag{6}\\]From (6) it can be seen that shared heterozygous sites increase the estimated relatedness and unshared homozygous sites decrease relatedness.&nbsp;  Eq. (6) is called a "robust" estimator because it measures relatedness in a purely pairwise fashion; it does not rely on population estimates of allele frequencies.&nbsp; However, if the individuals are not of the same genetic background, the allele frequency \\(p\\) is not well-defined at Eq. (5) does not hold, leading Eq. (6) to produce negative estimates; this feature is not a problem, as it helps us to distinguish different ancestries within a single population.
+KING estimates \\(\phi\\) from Eq. (5) by counting the number \\(N\\) of loci at which two individuals are both heterozygous \\(Aa,Aa\\) or opposite homozygous \\(AA,aa\\), as well as the total number of loci at which each individual is heterozygous \\(Aa\\), using the equation\\[\hat{\phi}_{ij}=\frac{N_{Aa,Aa}-2N_{AA,aa}}{N_{Aa}^{\left(i\right)}+N_{Aa}^{\left(j\right)}}.\tag{6}\\]From (6) it can be seen that shared heterozygous sites increase the estimated relatedness, and that opposite homozygous sites decrease it.&nbsp; Eq. (6) is called a "robust" estimator because it measures relatedness in a purely pairwise fashion; it does not rely on population estimates of allele frequencies.&nbsp; However, if the individuals are not of the same genetic background, the allele frequency \\(p\\) is not well-defined and Eq. (5) does not hold, leading to negative estimates of \\(\phi\\); this feature is not necessarily a problem, as it helps us to distinguish different ancestries within a single population.
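
To make the counting in Eq. (6) concrete, here is a minimal Python sketch. It is not KING's or any library's implementation. It assumes genotypes are coded as alternative-allele counts 0, 1, or 2 with no missing values, and the function name `king_kinship` is hypothetical.

```python
def king_kinship(g_i, g_j):
    """Estimate the kinship coefficient with Eq. (6).

    g_i, g_j: equal-length sequences of alternative-allele counts
    (0 = aa, 1 = Aa, 2 = AA). This coding is an assumption of the sketch.
    """
    if len(g_i) != len(g_j):
        raise ValueError("genotype vectors must have the same length")

    n_het_het = 0  # N_{Aa,Aa}: loci where both individuals are heterozygous
    n_opp_hom = 0  # N_{AA,aa}: loci where they are opposite homozygotes
    n_het_i = 0    # N_{Aa}^{(i)}: heterozygous loci in individual i
    n_het_j = 0    # N_{Aa}^{(j)}: heterozygous loci in individual j

    for a, b in zip(g_i, g_j):
        if a == 1:
            n_het_i += 1
        if b == 1:
            n_het_j += 1
        if a == 1 and b == 1:
            n_het_het += 1
        elif {a, b} == {0, 2}:
            n_opp_hom += 1

    denom = n_het_i + n_het_j
    if denom == 0:
        raise ValueError("no heterozygous loci; Eq. (6) is undefined")
    return (n_het_het - 2 * n_opp_hom) / denom


# An individual compared with itself reaches the maximum of 1/2.
same = [0, 1, 2, 1, 1, 0]
print(king_kinship(same, same))

# Many opposite homozygotes drive the estimate negative.
a = [1, 1, 0, 2, 1, 0]
b = [1, 0, 2, 0, 1, 2]
print(king_kinship(a, b))
```

In the second pair there are two shared heterozygous loci, three opposite-homozygous loci, and five heterozygous genotypes in total, so the estimate is (2 - 6)/5, a negative value of the kind described above for individuals of different genetic backgrounds.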
 
 To run KING we need only a gds object and a set of SNPs.&nbsp;  We will use the LD-pruned set <kbd>pruned</kbd> we computed above and the <kbd>genofile</kbd> containing simulated haplotypes from CHB, YRI, and CEU individuals.&nbsp;  Running