WIP Breaking down the PCA user story #4

Draft: wants to merge 1 commit into master

jerowe commented Jul 19, 2020

@alimanfoo @dharhas @daletovar

I went through the Tour of scikit-allel blog post and the example notebook and broke the analysis down into steps.

Here's an overview:

  1. Looked at distributions to give us an overview of the Variant Attributes in 001-Exploratory-Statistics-Variant-Attributes
  2. Looked at distributions to give us an overview of Variant Quality in 002-Exploratory-Statistics-Variant-Quality
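As a rough illustration of what these exploratory notebooks look at, here is a minimal NumPy sketch that summarizes the distribution of two per-variant attributes. It assumes the attributes (here depth `DP` and quality `QUAL`) have already been extracted into arrays with one value per variant; the `summarize` helper and the toy values are hypothetical and are not taken from the notebooks.

```python
import numpy as np

def summarize(name, values, bins=5):
    """Print basic distribution statistics for one per-variant attribute."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # drop variants where the attribute is missing
    counts, edges = np.histogram(values, bins=bins)
    p5, median, p95 = np.percentile(values, [5, 50, 95])
    print(f"{name}: n={values.size} p5={p5:.1f} median={median:.1f} p95={p95:.1f}")
    for count, lo, hi in zip(counts, edges[:-1], edges[1:]):
        print(f"  {lo:7.1f} to {hi:7.1f}: {count}")
    return counts, edges

# Toy per-variant attributes (made-up values, one entry per variant).
dp = np.array([12, 35, 40, 41, 38, 90, 44, np.nan, 39, 37])
qual = np.array([20.0, 55.5, 60.1, 58.3, 3.2, 61.0, 59.9, 57.4, 62.2, 45.0])

dp_counts, dp_edges = summarize("DP", dp)
qual_counts, qual_edges = summarize("QUAL", qual)
```

The tails of distributions like these are typically where the thresholds for the later filtering steps are read from.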

Once we've done the initial exploratory analysis, we want to start removing variants that are of poor quality (or fail some other metric) and variants that don't give us any information for segregating our populations.

  1. Filtered variants based on Variant Attributes and Variant Quality in 003-Filter-Variants
  2. Filtered based on Sample Missingness and Sample Heterozygosity, and removed variants that don't segregate our populations, in 004-Exploratory-Statistics-Sample-QC.
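The variant filtering step can be sketched as building a boolean mask over variants and using it to subset the genotype calls. This assumes a NumPy genotype array laid out as (variants, samples, ploidy) with -1 for missing alleles, the same layout scikit-allel's `GenotypeArray` uses; the `variant_filter_mask` helper and its thresholds (`min_qual`, `min_dp`, `max_dp`) are hypothetical placeholders for values chosen from the exploratory distributions.

```python
import numpy as np

def variant_filter_mask(qual, dp, min_qual=30.0, min_dp=10, max_dp=100):
    """Return a boolean mask that is True for variants passing all thresholds."""
    qual = np.asarray(qual, dtype=float)
    dp = np.asarray(dp, dtype=float)
    # Comparisons with NaN evaluate to False, so variants missing either attribute are dropped.
    return (qual >= min_qual) & (dp >= min_dp) & (dp <= max_dp)

# Toy genotype calls: 4 variants x 3 diploid samples, -1 marks a missing allele.
genotypes = np.array([
    [[0, 0], [0, 1], [1, 1]],
    [[0, 0], [0, 0], [0, 0]],
    [[0, 1], [-1, -1], [1, 1]],
    [[1, 1], [0, 1], [0, 0]],
])
qual = np.array([50.0, 60.0, 12.0, 45.0])  # hypothetical per-variant quality
dp = np.array([30, 40, 35, 150])           # hypothetical per-variant depth

mask = variant_filter_mask(qual, dp)
filtered = genotypes[mask]  # keep only the rows (variants) that pass

print(mask.tolist())   # [True, True, False, False]
print(filtered.shape)  # (2, 3, 2)
```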

All along the way, we have continually filtered our genotypes based on findings from the exploratory statistics and QC.
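The sample QC step and the idea of filtering continually can be sketched the same way: compute per-sample missingness and heterozygosity on the variants that survived earlier filters, drop failing samples, then recompute which variants still segregate, since removing samples can leave a variant monomorphic. This is plain NumPy under the same (variants, samples, ploidy) layout and assumes diploid calls; the helper names and thresholds are hypothetical. scikit-allel has built-in counterparts such as `GenotypeArray.is_missing()`, `GenotypeArray.is_het()` and `AlleleCountsArray.is_segregating()`.

```python
import numpy as np

def sample_qc(genotypes, max_missing=0.25, max_het=0.6):
    """Boolean mask over samples passing missingness and heterozygosity limits (diploid)."""
    called = (genotypes >= 0).all(axis=2)  # (variants, samples): both alleles called
    het = called & (genotypes[:, :, 0] != genotypes[:, :, 1])
    missing_rate = 1.0 - called.mean(axis=0)
    n_called = called.sum(axis=0)
    # Heterozygosity is measured over called genotypes only; samples with no calls get 0.0.
    het_rate = np.divide(het.sum(axis=0), n_called,
                         out=np.zeros(n_called.shape), where=n_called > 0)
    return (missing_rate <= max_missing) & (het_rate <= max_het)

def is_segregating(genotypes):
    """True for variants where more than one distinct allele is observed."""
    flat = genotypes.reshape(genotypes.shape[0], -1)
    present = flat >= 0
    hi = np.where(present, flat, -1).max(axis=1)
    lo = np.where(present, flat, np.iinfo(flat.dtype).max).min(axis=1)
    return present.any(axis=1) & (hi != lo)

# Toy calls: 5 variants x 4 diploid samples, -1 marks a missing allele.
genotypes = np.array([
    [[0, 0], [0, 1], [-1, -1], [1, 1]],
    [[0, 0], [0, 1], [-1, -1], [0, 0]],
    [[0, 1], [0, 1], [-1, -1], [1, 1]],
    [[0, 0], [0, 1], [0, 0], [0, 1]],
    [[1, 1], [0, 1], [-1, -1], [0, 0]],
])
# Hypothetical output of an earlier variant-level filter.
variant_mask = np.array([True, True, True, True, False])

g = genotypes[variant_mask]
seg_before = is_segregating(g)
samples_ok = sample_qc(g)
g = g[:, samples_ok]          # drop samples failing QC
seg_after = is_segregating(g)
g = g[seg_after]              # drop variants that no longer segregate

print(seg_before.tolist())  # [True, True, True, True]
print(samples_ok.tolist())  # [True, False, False, True]
print(seg_after.tolist())   # [True, False, True, True]
print(g.shape)              # (3, 2, 2)
```

In the toy data the second variant segregates before sample QC but not after, which is why the segregation check is repeated on the filtered subset.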

daletovar (Collaborator)

Thanks for putting this together, @jerowe. Unless you have anything more that you'd like to add I think we can merge this.

jeromekelleher (Collaborator)

This looks good to me, but I'd rather wait for @alimanfoo's input before merging. Is keeping this open for a while going to block people's progress?

jerowe (Author) commented Jul 22, 2020

@jeromekelleher no we're fine. This is a WIP.

jerowe changed the title from "Breaking down the PCA user story" to "WIP Breaking down the PCA user story" on Jul 22, 2020

jerowe (Author) commented Jul 30, 2020

Now that I have the Xarray format straight in my head, I want to convert these notebooks to use the Genotype Call Dataset.

jerowe marked this pull request as draft on July 30, 2020 14:55