Contigs missing #22
Comments
UCSC is really annoying since it doesn't have coherent releases (e.g., with a release number). If you know what should be matched together then please submit a PR.
Hey @dpryan79, what is your approach to identifying which contig from UCSC matches that in Ensembl?
NCBI hosts a file that has a variety of chromosome naming system suggestions, so I use that and compare the chromosome lengths to ensure they match.
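The approach described above (take NCBI's naming suggestions, then confirm that the lengths agree) can be sketched in Python. This is a minimal sketch, not the maintainer's actual script. It assumes an NCBI assembly report with tab-separated columns named Sequence-Name, GenBank-Accn, Sequence-Length and UCSC-style-name (column names on the last '#' line), plus two dictionaries of contig lengths, for example read from a UCSC chrom.sizes file and an Ensembl FASTA index. The helper names and the order in which Ensembl candidate names are tried are hypothetical.

```python
def parse_assembly_report(text):
    """Parse an NCBI-style assembly report into a list of dicts.

    Assumption: the report is tab-separated, comment lines start with '#',
    and the last comment line before the data holds the column names.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Keep overwriting; the final '#' line is taken as the header.
            header = line.lstrip("#").strip().split("\t")
            continue
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def build_ucsc_to_ensembl(report_rows, ucsc_sizes, ensembl_sizes):
    """Map UCSC contig names to Ensembl names, trusting only length-matched pairs.

    ucsc_sizes / ensembl_sizes: dict of contig name -> length in bp.
    """
    mapping = {}
    for row in report_rows:
        ucsc = row.get("UCSC-style-name", "na")
        if ucsc == "na" or ucsc not in ucsc_sizes:
            continue
        length = int(row["Sequence-Length"])
        # Skip the suggestion if UCSC's length disagrees with NCBI's.
        if ucsc_sizes[ucsc] != length:
            continue
        # Hypothetical candidate order: NCBI sequence name, then GenBank accession.
        for candidate in (row.get("Sequence-Name"), row.get("GenBank-Accn")):
            if candidate in ensembl_sizes and ensembl_sizes[candidate] == length:
                mapping[ucsc] = candidate
                break
    return mapping
```

Requiring the length to match on both sides is what guards against a naming suggestion pointing at the wrong sequence. A contig with no length-matched candidate is simply left out of the mapping, so newly added patch contigs show up as gaps rather than as wrong mappings.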
Note that there are patch contigs added over time, so these have to be updated every year or two.
The latest UCSC patch is patch 12. Does the NCBI file contain the contigs from these patches?
Quite likely, yes.
You may want to check out the As an aside, @dpryan79 any interest in a second repo with
@nh13 Either side-by-side or a subdirectory would work IMO.
The GRCh38_UCSC2ensembl.txt file is missing contig mappings from the hg38 (UCSC) side. When using this file to remap UCSC contigs to Ensembl names, the remapping fails because of these missing contigs.
For example, chr10_KN196480v1_fix, chr10_KQ090021v1_fix, chr11_KN196481v1_fix, etc. are all present in the file being remapped, but these contigs are not in GRCh38_UCSC2ensembl.txt. I am not aware of other files that may be missing updated contigs, but there may be a few.
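To see which contigs a given file would trip over before remapping it, one can list the names that do not appear in the mapping file. This is a minimal Python sketch, assuming GRCh38_UCSC2ensembl.txt is tab-separated with the UCSC name in the first column and the Ensembl name in the second; the helpers load_mapping and find_unmapped are hypothetical and not part of the repository.

```python
def load_mapping(text):
    """Read a UCSC -> Ensembl mapping (assumed two tab-separated columns)."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumption: a line with only one column means "no Ensembl equivalent".
        mapping[fields[0]] = fields[1] if len(fields) > 1 else ""
    return mapping


def find_unmapped(contigs, mapping):
    """Return contig names absent from the mapping, in first-seen order, without duplicates."""
    seen = set()
    missing = []
    for name in contigs:
        if name not in mapping and name not in seen:
            seen.add(name)
            missing.append(name)
    return missing
```

The contig names could come from, for instance, the first column of a BED file or the sequence names in a BAM header; only names absent from the mapping's first column are reported.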
Could you update the GRCh38_UCSC2ensembl.txt file, and potentially other files that are missing updated contigs?