From 237ec73cc24b3ff7d89d410e6ae686f7e567aaaa Mon Sep 17 00:00:00 2001
From: wletsou <104658829+wletsou@users.noreply.github.com>
Date: Wed, 26 Apr 2023 18:52:17 -0400
Subject: [PATCH] Update week2.md
---
_pages/week2.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/_pages/week2.md b/_pages/week2.md
index 5bbf76ad23b2..b83a17574737 100644
--- a/_pages/week2.md
+++ b/_pages/week2.md
@@ -237,9 +237,9 @@ From (3a)–(3c), the fraction of the genome shared IBD is\\[r=\frac{2\pi_2+
#### KING ####
-KING computes both the probability \\(\pi_0\\) that two relatives share 0 alleles IBD as well as the coefficient of relatedness \\(\phi=\frac{r}{2}\\), defined as the probability that two alleles taken one from each relative are IBD at a locus (the maximum probability is \\(\frac{1}{2}\\) because there is a 50% chance that the alleles chosen come from different parents). The idea is to compare the counts \\(X\\) and \\(Y\\) of the alternative alleles that two individuals each have at a genetic locus. If the two individuals are from the same population, the expected values and variances of the allele counts are \\(\mathbb{E}\left(X\right)=\mathbb{E}\left(Y\right)=2p\\) and \\(\sigma_X^2=\mathbb{E}\left(X^2\right)-\mathbb{E}\left(X\right)^2=\\(\mathbb{E}\left(Y^2\right)-\mathbb{E}\left(Y\right)^2=2p\left(1-p\right)\\). Thus the expected value of the difference \\(\mathbb{E}\left(X^2-Y^2\right)=\mathbb{E}\left(X^2\right)+\mathbb{E}\left(Y^2\right)-2\mathbb{E}\left(XY\right)\\) is\\[\frac{\mathbb{E}\left(X^2-Y^2\right)}{\sigma_X^2+\sigma_Y^2}=1-\frac{\sigma_{XY}}{\sigma_X\sigma_Y}=1-r,\tag{5}\\]where \\(\sigma_{XY}=\mathbb{E}\left(XY\right)-\mathbb{E}\left(X\right)\mathbb{E}\left(Y\right)\\) is the covariance of the genotype counts and \\(r=2\phi\\) is the genetic correlation between two individuals, also interprettted as the amount of the genome shared IBD.
+KING computes both the probability \\(\pi_0\\) that two relatives share 0 alleles IBD and the kinship coefficient \\(\phi=\frac{r}{2}\\), defined as the probability that two alleles taken one from each relative are IBD at a locus (the maximum probability is \\(\frac{1}{2}\\) because there is a 50% chance that the alleles chosen come from different parents). The idea is to compare the counts \\(X\\) and \\(Y\\) of the alternative allele that two individuals each carry at a genetic locus. If the pair is from a single ancestral population in which the alternative allele has frequency \\(p\\), the expected values and variances of the allele counts are \\(\mathbb{E}\left(X\right)=\mathbb{E}\left(Y\right)=2p\\) and \\(\sigma_X^2=\mathbb{E}\left(X^2\right)-\mathbb{E}\left(X\right)^2=\mathbb{E}\left(Y^2\right)-\mathbb{E}\left(Y\right)^2=2p\left(1-p\right)\\). Thus the expected squared difference \\(\mathbb{E}\left(\left(X-Y\right)^2\right)=\mathbb{E}\left(X^2\right)+\mathbb{E}\left(Y^2\right)-2\mathbb{E}\left(XY\right)\\), divided by the total variance, is\\[\frac{\mathbb{E}\left(\left(X-Y\right)^2\right)}{\sigma_X^2+\sigma_Y^2}=1-\frac{\sigma_{XY}}{\sigma_X\sigma_Y}=1-r,\tag{5}\\]where \\(\sigma_{XY}=\mathbb{E}\left(XY\right)-\mathbb{E}\left(X\right)\mathbb{E}\left(Y\right)\\) is the covariance of the genotype counts and \\(r=2\phi\\) is the genetic correlation between the two individuals, which can be interpreted as the fraction of the genome shared IBD.
-KING estimates \\(\phi\\) from Eq. (5) by counting the number \\(N\\) of loci at which two individuals are heterozygous \\(Aa,Aa\\) or opposite homozygous \\(AA,aa\\), as well as the total number of alleles at which each individual is heterozygous \\(Aa\\), using the equation:\\[\hat{\phi_{ij}}=\frac{N_{Aa,Aa}-2N_{AA,aa}}{N_{Aa}^{\left(i\right)}+N_{Aa}^{\left(j\right)}}.\tag{6}\\]From (6) it can be seen that shared heterozygous sites increase the estimated relatedness, and that unshared homozygous sites decrease relatedness. Eq. (6) is called a "robust" estimator because it measures relatedness in a purely pairwise fashion; it does not rely on population estimates of allele frequencies. However, if the individuals are not of the same genetic background, the allele frequency \\(p\\) is not well-defined and Eq. (5) does not hold, leading to negative estimates of \\(\phi\\); this feature is not necessarily a problem, as it helps us to distinguish different ancestries within a single population.
+KING estimates \\(\phi\\) from Eq. (5) by counting the number \\(N\\) of loci at which two individuals are both heterozygous \\(Aa,Aa\\) or opposite homozygous \\(AA,aa\\), as well as the total number of loci at which each individual is heterozygous \\(Aa\\):\\[\hat{\phi}_{ij}=\frac{N_{Aa,Aa}-2N_{AA,aa}}{N_{Aa}^{\left(i\right)}+N_{Aa}^{\left(j\right)}}.\tag{6}\\]From (6) it can be seen that shared heterozygous sites increase the estimated relatedness and that opposite homozygous sites decrease it. Eq. (6) is called a "robust" estimator because it measures relatedness in a purely pairwise fashion: it does not rely on population estimates of allele frequencies. However, if the individuals are not of the same genetic background, the allele frequency \\(p\\) is not well-defined and Eq. (5) does not hold, which can lead to negative estimates of \\(\phi\\); this feature is not necessarily a problem, as it helps us to distinguish different ancestries within a single population.
To run KING we need only a gds object and a set of SNPs. We will use the LD-pruned set pruned we computed above and the genofile containing simulated haplotypes from CHB, YRI, and CEU individuals. Running