Commit
Merge pull request #118 from Roleren/master
Added GTF support for refseq/genbank
HajkD authored Feb 3, 2025
2 parents d930209 + 5d77549 commit 21aa0a2
Showing 21 changed files with 92 additions and 25 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: biomartr
Title: Genomic Data Retrieval
-Version: 1.0.9
+Version: 1.0.10
Authors@R: c(person("Hajk-Georg", "Drost",
role = c("aut", "cre"),
email = "[email protected]",
7 changes: 7 additions & 0 deletions NEWS.md
@@ -1,3 +1,10 @@
+# [biomartr 1.0.10](https://github.com/ropensci/biomartr/releases/tag/v1.0.10)
+
+### New features
+
+#### GTF support for refseq and genbank
+- Since refseq and genbank now support gtf, we allow it in getGFF/getGTF
+
# [biomartr 1.0.9](https://github.com/ropensci/biomartr/releases/tag/v1.0.9)

### New features
12 changes: 8 additions & 4 deletions R/Refseq_Genbank_ftp_generics.R
@@ -82,9 +82,11 @@ ftp_url_refseq_genbank <- function(assembly, type) {
 
   if (type == "genome") {
     url <- paste0(stem_url, "_genomic.fna.gz")
-  } else if (type == "gff") {
+  } else if (type %in% c("gff", "gff3")) {
     url <- paste0(stem_url, "_genomic.gff.gz")
-  } else if (type == "cds") {
+  } else if (type == "gtf") {
+    url <- paste0(stem_url, "_genomic.gtf.gz")
+  }else if (type == "cds") {
     url <- paste0(stem_url, "_cds_from_genomic.fna.gz")
   } else if (type == "rna") {
     url <- paste0(stem_url, "_rna_from_genomic.fna.gz")
@@ -106,9 +108,11 @@ local_path_refseq_genbank <- function(path, local.org, db, type) {
 
   if (type == "genome") {
     local_file <- paste0(local_file, "_genomic_", db, ".fna.gz")
-  } else if (type == "gff") {
+  } else if (type %in% c("gff", "gff3")) {
     local_file <- paste0(local_file, "_genomic_", db, ".gff.gz")
-  } else if (type == "cds") {
+  } else if (type == "gtf") {
+    local_file <- paste0(local_file, "_genomic_", db, ".gtf.gz")
+  }else if (type == "cds") {
     local_file <- paste0(local_file, "_cds_from_genomic_", db, ".fna.gz")
   } else if (type == "rna") {
     local_file <- paste0(local_file, "_rna_from_genomic_", db, ".fna.gz")
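The two hunks in R/Refseq_Genbank_ftp_generics.R map a requested `type` to a file suffix through an `if`/`else if` chain. As a rough, self-contained illustration of that mapping (not the package's implementation), the same dispatch can be sketched with `switch()`. The helper name `annotation_suffix` and the stem URL are hypothetical and do not appear in biomartr; only the suffix strings are taken from the diff.

```r
# Hypothetical helper, not part of biomartr: maps a requested type to the
# remote file suffix shown in the ftp_url_refseq_genbank() hunk.
annotation_suffix <- function(type) {
  switch(type,
         genome = "_genomic.fna.gz",
         gff = ,                      # falls through to the gff3 entry
         gff3 = "_genomic.gff.gz",
         gtf = "_genomic.gtf.gz",
         cds = "_cds_from_genomic.fna.gz",
         rna = "_rna_from_genomic.fna.gz",
         # Unnamed last argument is the default: reached only if nothing matched.
         stop("Unsupported type: ", type, call. = FALSE))
}

# The stem below is a made-up placeholder, not a real assembly path.
stem_url <- "https://example.org/GCF_000000000.1_demo"

# Build one URL per annotation format; "gff" and "gff3" resolve to the same file.
urls <- vapply(c("gff", "gff3", "gtf"),
               function(t) paste0(stem_url, annotation_suffix(t)),
               character(1))
print(urls)
```

Unlike an `if`/`else if` chain that silently leaves the result unset for an unknown `type`, this sketch fails loudly; whether the package wants that behaviour is a design choice for the maintainers.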
4 changes: 3 additions & 1 deletion R/getBioSet.R
@@ -105,7 +105,9 @@ getBioSet <- function(db = "refseq",
 #' }
 #' @param type biological sequence type. (alternatives are: genome, gff, cds,
 #' rna, proteome, assembly_stats, repeat_masker, collection (all the others))
-#' @param reference a logical value indicating whether or not a genome shall be downloaded if it isn't marked in the database as either a reference genome or a representative genome.
+#' @param reference a logical value indicating whether or not a genome shall be a candidate for download
+#' if it isn't marked in the database as either a reference genome or a representative genome.
+#' This is helpful if you don't want to allow "partial genomes" etc.
 #' @param release a numeric, the database release version of ENSEMBL (\code{db = "ensembl"}). Default is \code{release = NULL} meaning
 #' that the most recent database version is used. \code{release = 75} for human would give the stable
 #' GRCh37 release in ensembl. Value must be > 46, since ensembl did not structure their data
12 changes: 7 additions & 5 deletions R/getGFF.R
@@ -62,7 +62,7 @@ getGFF <- function(db = "refseq", organism, reference = FALSE,
 
   if (is.element(db, c("refseq", "genbank"))) {
     info <- get_file_refseq_genbank(db, organism, reference, skip_bacteria,
-                                    release, gunzip, path, type = "gff")
+                                    release, gunzip, path, type = format)
     return(refseq_genbank_download_post_processing(info, organism, db, path,
                                                    gunzip,
                                                    remove_annotation_outliers,
@@ -103,15 +105,17 @@ getGFF <- function(db = "refseq", organism, reference = FALSE,
 #' @export
 getGTF <-
   function(db = "ensembl",
-           organism,
+           organism, reference = FALSE,
            remove_annotation_outliers = FALSE,
            path = file.path("ensembl", "annotation"),
            release = NULL,
            mute_citation = FALSE) {
-    if (!is.element(db, c("ensembl")))
-      stop( "Please select one of the available data bases: db = 'ensembl'.", call. = FALSE)
-    getGFF(db = "ensembl",
+    if (!is.element(db, c("ensembl", "refseq", "genbank")))
+      stop( "Please select one of the available data bases: db = 'ensembl',",
+            "'refseq', 'genbank'.", call. = FALSE)
+    getGFF(db = db,
            organism,
+           reference = reference,
            remove_annotation_outliers = remove_annotation_outliers,
            path = path,
            release = release,
The following generated files are not rendered by default:

4 changes: 3 additions & 1 deletion man/getBio.Rd
4 changes: 3 additions & 1 deletion man/getBioSet.Rd
4 changes: 3 additions & 1 deletion man/getCDS.Rd
4 changes: 3 additions & 1 deletion man/getCDSSet.Rd
4 changes: 3 additions & 1 deletion man/getCollection.Rd
4 changes: 3 additions & 1 deletion man/getCollectionSet.Rd
7 changes: 6 additions & 1 deletion man/getGFF.Rd
4 changes: 3 additions & 1 deletion man/getGFFSet.Rd
5 changes: 5 additions & 0 deletions man/getGTF.Rd
4 changes: 3 additions & 1 deletion man/getGenome.Rd
4 changes: 3 additions & 1 deletion man/getGenomeSet.Rd
4 changes: 3 additions & 1 deletion man/getProteome.Rd
4 changes: 3 additions & 1 deletion man/getProteomeSet.Rd
4 changes: 3 additions & 1 deletion man/getRNA.Rd
4 changes: 3 additions & 1 deletion man/getRNASet.Rd

16 changes: 16 additions & 0 deletions tests/testthat/test-getGFF.R
@@ -44,6 +44,22 @@ test_that("The getGFF() interface works properly for NCBI Genbank (repeating com
     expect_equal(out1, out2)
 })
 
+test_that("The getGFF() interface works properly for NCBI RefSeq (GTF format)",{
+
+    skip_on_cran()
+    skip_on_travis()
+    # test proper download from refseq
+    out1 <- getGFF( db = "refseq",
+                    organism = "Saccharomyces cerevisiae",
+                    path = tempdir(), mute_citation = TRUE, format = "gtf")
+
+    out2 <- getGFF( db = "refseq",
+                    organism = "Saccharomyces cerevisiae",
+                    path = tempdir(), mute_citation = TRUE, format = "gtf")
+    expect_equal(out1, out2)
+})
+
+
 
 test_that("The getGFF() interface works properly for Ensembl (repeating command)",{
     skip_on_cran()