diff --git a/_pages/week2.md b/_pages/week2.md
index 3244c5945251..5bbf76ad23b2 100644
--- a/_pages/week2.md
+++ b/_pages/week2.md
@@ -237,7 +237,7 @@ From (3a)–(3c), the fraction of the genome shared IBD is\\[r=\frac{2\pi_2+
 #### KING ####
-KING computes both the probability \\(\pi_0\\) that two relatives share 0 alleles IBD as well as the coefficient of relatedness \\(\phi=\frac{r}{2}\\), defined as the probability that two alleles taken one from each relative are IBD at a locus (the maximum probability is \\(\frac{1}{2}\\) because there is a 50% chance that the alleles chosen come from different parents).  The idea is to compare the counts \\(X\\) and \\(Y\\) of the alternative alleles that two individuals each have at a genetic locus.  If the two individuals are from the same population, the expected values and variances of the allele counts are \\(\mathbb{E}\left(X\right)=\mathbb{E}\left(Y\right)=2p\\) and \\(\sigma^2_X=\mathbb{E}\left(X^2\right)-\mathbb{E}\left(X\right)^2=\\(\mathbb{E}\left(Y^2\right)-\mathbb{E}\left(Y\right)^2=2p\left(1-p\right)\\).  Thus the expected value of the difference \\(\mathbb{E}\left(X^2-Y^2\right)=\mathbb{E}\left(X^2\right)+\mathbb{E}\left(Y^2\right)-2\mathbb{E}\left(XY\right)\\) is\\[\frac{\mathbb{E}\left(X^2-Y^2\right)}{\sigma_X^2+\sigma_Y^2}=1-\frac{\sigma_{XY}}{\sigma_X\sigma_Y}=1-r,\tag{5}\\]where \\(\sigma_{XY}=\mathbb{E}\left(XY\right)-\mathbb{E}\left(X\right)\mathbb{E}\left(Y\right)\\) is the covariance of the genotype counts and \\(r=2\phi\\) is the genetic correlation between two individuals, also interprettted as the amount of the genome shared IBD.
+KING computes both the probability \\(\pi_0\\) that two relatives share 0 alleles IBD and the kinship coefficient \\(\phi=\frac{r}{2}\\), defined as the probability that two alleles taken one from each relative are IBD at a locus (the maximum probability is \\(\frac{1}{2}\\) because there is a 50% chance that the alleles chosen come from different parents).  
The idea is to compare the counts \\(X\\) and \\(Y\\) of the alternative alleles that two individuals each have at a genetic locus.  If the two individuals are from the same population, the expected values and variances of the allele counts are \\(\mathbb{E}\left(X\right)=\mathbb{E}\left(Y\right)=2p\\) and \\(\sigma_X^2=\mathbb{E}\left(X^2\right)-\mathbb{E}\left(X\right)^2=\mathbb{E}\left(Y^2\right)-\mathbb{E}\left(Y\right)^2=\sigma_Y^2=2p\left(1-p\right)\\).  Thus the expected squared difference \\(\mathbb{E}\left(\left(X-Y\right)^2\right)=\mathbb{E}\left(X^2\right)+\mathbb{E}\left(Y^2\right)-2\mathbb{E}\left(XY\right)\\), normalized by the total variance, is\\[\frac{\mathbb{E}\left(\left(X-Y\right)^2\right)}{\sigma_X^2+\sigma_Y^2}=1-\frac{\sigma_{XY}}{\sigma_X\sigma_Y}=1-r,\tag{5}\\]where \\(\sigma_{XY}=\mathbb{E}\left(XY\right)-\mathbb{E}\left(X\right)\mathbb{E}\left(Y\right)\\) is the covariance of the genotype counts and \\(r=2\phi\\) is the genetic correlation between two individuals, also interpreted as the fraction of the genome shared IBD.  KING estimates \\(\phi\\) from Eq. (5) by counting the numbers \\(N\\) of loci at which the two individuals are both heterozygous \\(Aa,Aa\\) or opposite homozygous \\(AA,aa\\), as well as the number of loci at which each individual is heterozygous \\(Aa\\), using the equation:\\[\hat{\phi}_{ij}=\frac{N_{Aa,Aa}-2N_{AA,aa}}{N_{Aa}^{\left(i\right)}+N_{Aa}^{\left(j\right)}}.\tag{6}\\]From (6) it can be seen that shared heterozygous sites increase the estimated relatedness, and that opposite homozygous sites decrease it.  Eq. (6) is called a "robust" estimator because it measures relatedness in a purely pairwise fashion; it does not rely on population estimates of allele frequencies.  However, if the individuals are not of the same genetic background, the allele frequency \\(p\\) is not well-defined and Eq.
(5) does not hold, which can produce negative estimates of \\(\phi\\); this behavior is not necessarily a problem, as it helps us to distinguish individuals of different ancestries within a single sample.
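
As a rough illustration of Eq. (6), the counting can be sketched in a few lines of Python.  This is a minimal sketch, not KING's actual implementation: the function names, the 0/1/2 genotype encoding, and the toy genotype vectors are all assumptions made for the example, and missing genotypes are not handled.  The second function rewrites the same estimate in the form suggested by Eq. (5), \\(\frac{1}{2}-\frac{\sum\left(X-Y\right)^2}{2\left(N_{Aa}^{\left(i\right)}+N_{Aa}^{\left(j\right)}\right)}\\), under the assumption that the heterozygote counts stand in for the summed variances \\(2p\left(1-p\right)\\).

```python
from typing import Sequence


def king_kinship(g_i: Sequence[int], g_j: Sequence[int]) -> float:
    """Estimate the kinship coefficient phi via Eq. (6).

    g_i, g_j: alternative-allele counts (0, 1, or 2) at the same loci.
    Hypothetical helper for illustration; not part of the KING software.
    """
    if len(g_i) != len(g_j):
        raise ValueError("genotype vectors must cover the same loci")

    pairs = list(zip(g_i, g_j))
    # Loci where both individuals are heterozygous (Aa, Aa).
    n_het_het = sum(1 for x, y in pairs if x == 1 and y == 1)
    # Loci where the individuals are opposite homozygotes (AA, aa).
    n_opp_hom = sum(1 for x, y in pairs if abs(x - y) == 2)
    # Heterozygous loci counted separately for each individual.
    n_het_i = sum(1 for x in g_i if x == 1)
    n_het_j = sum(1 for y in g_j if y == 1)

    denom = n_het_i + n_het_j
    if denom == 0:
        raise ValueError("no heterozygous loci; estimate is undefined")
    return (n_het_het - 2 * n_opp_hom) / denom


def king_kinship_from_distance(g_i: Sequence[int], g_j: Sequence[int]) -> float:
    """Same estimate written as 1/2 - sum((X - Y)^2) / (2 * (N_i + N_j))."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(g_i, g_j))
    n_het = sum(1 for x in g_i if x == 1) + sum(1 for y in g_j if y == 1)
    return 0.5 - sq_dist / (2 * n_het)


# Toy genotypes, made up purely for illustration.
a = [0, 1, 2, 1, 1, 0, 2, 1]
b = [0, 1, 2, 1, 0, 2, 2, 1]

print(king_kinship(a, b))                # (3 - 2*1) / (4 + 3) = 1/7
print(king_kinship_from_distance(a, b))  # 1/2 - 5/14 = 1/7
print(king_kinship(a, a))                # identical genotypes give 0.5
```

In the toy data, the single opposite-homozygous locus cancels two of the three shared heterozygous loci, which mirrors the observation that shared heterozygous sites raise the estimate and opposite homozygous sites lower it.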