readme and beagle
SamGurr committed Nov 22, 2024
1 parent 5e680cc commit ba24dd3
Showing 1 changed file with 9 additions and 9 deletions.
18 changes: 9 additions & 9 deletions HPC_analysis/output/Popgen/angsd/README.md
@@ -52,7 +52,7 @@ vcftools --vcf raw_snps.vcf.gz -- site-mean-depth --out mean_depth.txt

## Step 1: Filter by depth & quality

-* **objective**: use ```--minDP``` and ```--minGQ``` in ```vcftools`` to filter genotypes with depth < 10 and genotype quality < 20, respectively
+* **objective**: use ```--minDP``` and ```--minGQ``` in ```vcftools``` to filter genotypes with depth < 10 and genotype quality < 20, respectively

*..in R*
```
@@ -68,7 +68,7 @@ vcftools --gzvcf raw_snps.vcf.gz --out out.1 --minDP 10 --minGQ 20 --recode --re

## Step 2: Filter monomorphic sites

-* **objective**: use ```--maf``` in ```vcftools`` to filter out sites that were made monomorphic by the previous filter
+* **objective**: use ```--maf``` in ```vcftools``` to filter out sites that were made monomorphic by the previous filter

*..in R*
```
@@ -85,7 +85,7 @@ vcftools --vcf out.1.recode.vcf --maf 0.001 --out out.2 --recode --recode-INFO-a

## Step 3: Identify individuals with missing data

-* **objective**: use ```--max-missing``` in ```vcftools`` to remove individuals with more than 50% missing data
+* **objective**: use ```--max-missing``` in ```vcftools``` to remove individuals with more than 50% missing data

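* think of the ```--max-missing``` value as a minimum call rate rather than a maximum missingness: vcftools defines it between 0 and 1, where 0 allows sites that are completely missing and 1 allows no missing data, so the larger the number the more strict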
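
The sketch below mimics the site-level rule behind ```--max-missing``` on a toy VCF, assuming the definition in the vcftools manual (0 allows sites that are completely missing, 1 allows no missing data). The genotypes, the `threshold` variable, and the `awk` filter are made up for illustration; it is a way to reason about the threshold, not a replacement for vcftools.

```shell
# Toy VCF: 3 sites x 4 individuals; "./." marks a missing genotype (hypothetical data).
toy_vcf=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3\tind4\n' > "$toy_vcf"
printf 'chr1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\t0/0\t1/1\t0/1\n' >> "$toy_vcf"   # call rate 4/4
printf 'chr1\t200\t.\tC\tT\t.\t.\t.\tGT\t0/1\t./.\t1/1\t0/0\n' >> "$toy_vcf"   # call rate 3/4
printf 'chr1\t300\t.\tG\tA\t.\t.\t.\tGT\t./.\t./.\t./.\t0/1\n' >> "$toy_vcf"   # call rate 1/4

# Assumed reading of --max-missing: keep a site when its call rate reaches the threshold.
threshold=0.5

kept_sites=$(awk -F'\t' -v thr="$threshold" '
  /^#/ { next }                        # skip header lines
  {
    n = NF - 9; called = 0             # sample columns start at field 10
    for (i = 10; i <= NF; i++) {
      split($i, f, ":")                # the genotype is the first FORMAT subfield
      if (f[1] != "./." && f[1] != ".") called++
    }
    if (called / n >= thr) print $2    # print the position of each kept site
  }' "$toy_vcf")

echo "$kept_sites"
rm -f "$toy_vcf"
```

Raising `threshold` toward 1 drops more sites in this toy example, which is the direction of strictness to keep in mind when choosing the value.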

@@ -102,7 +102,7 @@ vcftools --vcf out.2.recode.vcf --out out.3 --max-missing 0.5 --recode --recode-
```


-* **objective**: use ```--missing-indv``` in ```vcftools`` to output an imiss file
+* **objective**: use ```--missing-indv``` in ```vcftools``` to output an imiss file

*..in R*
```
@@ -152,7 +152,7 @@ write_delim(miss_70, "remove.3.inds", col_names = FALSE) # write the indivualds

## Step 4: Filter individuals with missing data

-* **objective**: use ```--remove``` in ```vcftools`` to remove individuals with >70% missing data
+* **objective**: use ```--remove``` in ```vcftools``` to remove individuals with >70% missing data

* navigate back into the bash session with vcftools and create out.4 as the VCF with the individuals removed;
**note**, 18 individuals were removed
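
As a rough shell-only alternative to building the removal list in R, the sketch below picks individuals above a missingness cutoff from an imiss-style table. The table is toy data using the column layout vcftools writes for ```--missing-indv``` (INDV, N_DATA, N_GENOTYPES_FILTERED, N_MISS, F_MISS); the individual names and counts are hypothetical.

```shell
# Toy .imiss table (hypothetical individuals and counts).
imiss=$(mktemp)
printf 'INDV\tN_DATA\tN_GENOTYPES_FILTERED\tN_MISS\tF_MISS\n' > "$imiss"
printf 'ind1\t1000\t0\t100\t0.10\n' >> "$imiss"
printf 'ind2\t1000\t0\t850\t0.85\n' >> "$imiss"
printf 'ind3\t1000\t0\t720\t0.72\n' >> "$imiss"

# Skip the header, then keep the ID of every individual whose F_MISS exceeds the cutoff.
to_remove=$(awk -F'\t' -v cutoff=0.7 'NR > 1 && $5 > cutoff { print $1 }' "$imiss")

echo "$to_remove"
rm -f "$imiss"
```

Redirecting that output to a file such as `remove.3.inds` gives one ID per line, which is the form ```--remove``` reads.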
@@ -215,7 +215,7 @@ vcftools --vcf out.4.vcf --out out.5 --exclude-positions remove.4.sites --recode

## Step 5: Investigate sites with missing data

-* **objective**: use ```--max-missing``` in ```vcftools`` to remove sites with more than 75% missing data
+* **objective**: use ```--max-missing``` in ```vcftools``` to remove sites with more than 75% missing data

*..in R*
```
@@ -238,7 +238,7 @@ vcftools --vcf out.4.recode.vcf --out out.5 --max-missing 0.75 --recode --recode

## Step 5.2: Build file in R to omit based on missingness

-* **objective**: use ```--missing-indv``` in ```vcftools`` to calculate individual missingness - this is done again since the pool of sites is lower
+* **objective**: use ```--missing-indv``` in ```vcftools``` to calculate individual missingness - this is done again since the pool of sites is lower

* **note** - this creates .imiss and .log files, does not overwrite the 5.recode

@@ -284,7 +284,7 @@ write_delim(miss_60, "remove.5.inds", col_names = FALSE) # write out

## Step 6: Filter individuals with missing data

-* **objective**: use ```--remove``` in ```vcftools`` to remove the individuals with more than 60% missing genotype
+* **objective**: use ```--remove``` in ```vcftools``` to remove the individuals with more than 60% missing genotype

*..in R*
```
@@ -304,7 +304,7 @@ vcftools --vcf out.5.recode.vcf --out out.6 --remove remove.5.inds --recode --re

## Step 6.2: Check for duplicates and relatedness filter

-* **objective**: use ```--relatedness``` in ```vcftools`` to check for duplicate individuals using relatedness
+* **objective**: use ```--relatedness``` in ```vcftools``` to check for duplicate individuals using relatedness
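
One possible way to screen the relatedness output for duplicates from the shell is sketched below. It assumes the three-column layout vcftools writes for ```--relatedness``` (INDV1, INDV2, RELATEDNESS_AJK), where an individual compared with itself is expected to score close to 1. The toy values and the 0.9 cutoff are assumptions for illustration, not values taken from this analysis.

```shell
# Toy .relatedness table (hypothetical pairs and scores).
rel=$(mktemp)
printf 'INDV1\tINDV2\tRELATEDNESS_AJK\n' > "$rel"
printf 'ind1\tind1\t1.02\n' >> "$rel"
printf 'ind1\tind2\t0.97\n' >> "$rel"
printf 'ind1\tind3\t-0.01\n' >> "$rel"
printf 'ind2\tind3\t0.03\n' >> "$rel"

# Flag pairs of different individuals whose score is above the assumed cutoff;
# self-comparisons are skipped because they are expected to be near 1 anyway.
dup_pairs=$(awk -F'\t' -v cutoff=0.9 'NR > 1 && $1 != $2 && $3 > cutoff { print $1 "," $2 }' "$rel")

echo "$dup_pairs"
rm -f "$rel"
```

Any flagged pair would still need a manual check before one member is dropped, since the cutoff here is arbitrary.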


*..in R*