Update to MSigDB 7.2
igordot committed Oct 2, 2020
1 parent c1d62bc commit 8452596
Showing 12 changed files with 197 additions and 40 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: msigdbr
Type: Package
Title: MSigDB Gene Sets for Multiple Organisms in a Tidy Data Format
Version: 7.1.1.9001
Version: 7.2.1
Authors@R: person("Igor", "Dolgalev", email = "[email protected]", role = c("aut", "cre"))
Description: Provides the 'Molecular Signatures Database' (MSigDB) gene sets
typically used with the 'Gene Set Enrichment Analysis' (GSEA) software
3 changes: 2 additions & 1 deletion NEWS.md
@@ -1,5 +1,6 @@
# msigdbr (development version)
# msigdbr 7.2.1

* Based on MSigDB v7.2 release.
* Added more annotation fields to the returned gene sets.
* Added `msigdbr_species()` as an alternative to `msigdbr_show_species()`.
* Added `msigdbr_collections()`.
16 changes: 13 additions & 3 deletions R/functions.R
@@ -92,15 +92,25 @@ msigdbr <- function(species = "Homo sapiens", category = NULL, subcategory = NUL
if (length(category) > 1) {
stop("please specify only one category at a time")
}
genesets_subset <- filter(genesets_subset, .data$gs_cat == category)
if (category %in% genesets_subset$gs_cat) {
genesets_subset <- filter(genesets_subset, .data$gs_cat == category)
} else {
stop("unknown category")
}
}

# filter by sub-category
# filter by sub-category (with and without colon)
if (is.character(subcategory)) {
if (length(subcategory) > 1) {
stop("please specify only one subcategory at a time")
}
genesets_subset <- filter(genesets_subset, .data$gs_subcat == subcategory)
if (subcategory %in% genesets_subset$gs_subcat) {
genesets_subset <- filter(genesets_subset, .data$gs_subcat == subcategory)
} else if (subcategory %in% gsub(".*:", "", genesets_subset$gs_subcat)){
genesets_subset <- filter(genesets_subset, gsub(".*:", "", .data$gs_subcat) == subcategory)
} else {
stop("unknown subcategory")
}
}

# combine gene sets and genes
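
The sub-category fallback in this hunk can be tried in isolation. The following is a minimal base R sketch, not the package's code: `genesets` is a made-up stand-in for the internal gene set table, `match_subcategory()` is a hypothetical helper, and base subsetting replaces `dplyr::filter()`.

```r
# Hypothetical stand-in for the package's internal gene set table.
genesets <- data.frame(
  gs_name = c("GOBP_A", "GOCC_B", "KEGG_C", "HALLMARK_D"),
  gs_cat = c("C5", "C5", "C2", "H"),
  gs_subcat = c("GO:BP", "GO:CC", "CP:KEGG", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the three branches of the diff:
# exact match, match after dropping the prefix up to the last colon, or error.
match_subcategory <- function(tbl, subcategory) {
  stripped <- gsub(".*:", "", tbl$gs_subcat)
  if (subcategory %in% tbl$gs_subcat) {
    tbl[tbl$gs_subcat == subcategory, , drop = FALSE]
  } else if (subcategory %in% stripped) {
    tbl[stripped == subcategory, , drop = FALSE]
  } else {
    stop("unknown subcategory")
  }
}

match_subcategory(genesets, "GO:BP")$gs_name # full label
match_subcategory(genesets, "BP")$gs_name    # label without the "GO:" prefix
```

The exact match is tried first, so a full label such as `GO:BP` keeps working. Only when that fails is the text after the last colon compared, since the pattern `.*:` is greedy.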
Binary file modified R/sysdata.rda
Binary file not shown.
16 changes: 5 additions & 11 deletions cran-comments.md
@@ -1,6 +1,6 @@
## Test environments
* local R installation, R 4.0.0
* ubuntu 16.04 (on travis-ci), R 4.0.0
* local R installation, R 4.0.2
* ubuntu 16.04 (on travis-ci), R 4.0.2
* win-builder (devel)

## R CMD check results
@@ -9,13 +9,7 @@

## revdepcheck results

We checked 5 reverse dependencies, comparing R CMD check results across CRAN and dev versions of this package.
We checked 7 reverse dependencies (3 from CRAN + 4 from BioConductor), comparing R CMD check results across CRAN and dev versions of this package.

* We saw 0 new problems
* We failed to check 0 packages

## Resubmission

This is a resubmission. In this version I have:

* Addressed the elapsed time notes.
* We saw 0 new problems
* We failed to check 0 packages
19 changes: 10 additions & 9 deletions data-raw/msigdbr-prepare.R
@@ -11,7 +11,7 @@ library(usethis)
# Import MSigDB gene sets -------------------------------------------------

# Define MSigDB download variables
msigdb_version = "7.1"
msigdb_version = "7.2"
msigdb_url_base = "https://data.broadinstitute.org/gsea-msigdb/msigdb/release"
msigdb_zip_url = glue("{msigdb_url_base}/{msigdb_version}/msigdb_v{msigdb_version}_files_to_download_locally.zip")
msigdb_dir = glue("msigdb_v{msigdb_version}_files_to_download_locally")
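
To see what the version bump changes in the download step, the templates above can be expanded by hand. This is a sketch that swaps `glue()` for base R `sprintf()` so it runs without extra packages; the variable names are taken from the script, and it makes no claim about whether the URL is still reachable.

```r
# Same variables as in data-raw/msigdbr-prepare.R, built with sprintf()
# instead of glue() (an assumption made only to keep the sketch base R).
msigdb_version <- "7.2"
msigdb_url_base <- "https://data.broadinstitute.org/gsea-msigdb/msigdb/release"
msigdb_dir <- sprintf("msigdb_v%s_files_to_download_locally", msigdb_version)
msigdb_zip_url <- sprintf("%s/%s/%s.zip", msigdb_url_base, msigdb_version, msigdb_dir)

print(msigdb_dir)
print(msigdb_zip_url)
```

Because both the directory name and the zip URL are derived from `msigdb_version`, changing that one string is enough to point the script at the new release.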
@@ -96,6 +96,7 @@ human_tbl =
distinct() %>%
mutate(
species_name = "Homo sapiens",
species_common_name = "human",
entrez_gene = human_entrez_gene,
gene_symbol = human_gene_symbol
)
@@ -130,7 +131,7 @@ msigdbr_orthologs =
species_id = ortholog_species,
entrez_gene = ortholog_species_entrez_gene,
gene_symbol = ortholog_species_symbol,
sources = support
ortholog_sources = support
) %>%
filter(
human_entrez_gene != "-",
@@ -140,15 +141,15 @@ msigdbr_orthologs =
mutate(
human_entrez_gene = as.integer(human_entrez_gene),
entrez_gene = as.integer(entrez_gene),
num_sources = str_count(sources, ",") + 1
num_ortholog_sources = str_count(ortholog_sources, ",") + 1
) %>%
filter(
human_entrez_gene %in% msigdb_entrez_genes,
num_sources > 2
num_ortholog_sources > 2
)

# List the number of supporting sources
table(msigdbr_orthologs$num_sources, useNA = "ifany")
table(msigdbr_orthologs$num_ortholog_sources, useNA = "ifany")
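
The renamed `num_ortholog_sources` column is a comma count plus one, followed by a `> 2` filter. A minimal base R sketch of that arithmetic follows; the `support` strings are invented for illustration, `count_sources()` is a hypothetical helper, and `gregexpr()` stands in for `stringr::str_count()`.

```r
# Hypothetical helper: number of comma-separated entries in each string,
# i.e. the number of commas plus one.
count_sources <- function(x) {
  lengths(regmatches(x, gregexpr(",", x, fixed = TRUE))) + 1L
}

# Invented support strings; real values come from the ortholog table.
support <- c("Ensembl", "Ensembl,HGNC", "Ensembl,HGNC,NCBI", "Ensembl,HGNC,NCBI,OMA")
num_ortholog_sources <- count_sources(support)

# Keep only pairs supported by more than two sources, as in the filter above.
keep <- support[num_ortholog_sources > 2]

table(num_ortholog_sources, useNA = "ifany")
```

A string without any comma counts as one source, so the filter drops pairs backed by one or two sources.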

# Names and IDs of common species
species_tbl =
@@ -173,9 +174,8 @@ msigdbr_orthologs = inner_join(species_tbl, msigdbr_orthologs, by = "species_id"
# For each human gene, only keep the best ortholog (found in the most databases)
msigdbr_orthologs =
msigdbr_orthologs %>%
select(-species_id) %>%
group_by(human_entrez_gene, species_name) %>%
top_n(1, num_sources) %>%
top_n(1, num_ortholog_sources) %>%
ungroup()
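
The "best ortholog" step keeps, for each human gene and species, the rows with the highest source count. Below is a base R sketch of the same idea, using `ave()` in place of `group_by()` and `top_n()`; the `orthologs` table is made up for illustration.

```r
# Invented example table; column names follow the script.
orthologs <- data.frame(
  human_entrez_gene = c(1L, 1L, 1L, 2L, 2L),
  species_name = c("Mus musculus", "Mus musculus", "Rattus norvegicus",
                   "Mus musculus", "Mus musculus"),
  entrez_gene = c(101L, 102L, 201L, 103L, 104L),
  num_ortholog_sources = c(5L, 3L, 4L, 6L, 6L),
  stringsAsFactors = FALSE
)

# One grouping key per (human gene, species) pair.
key <- paste(orthologs$human_entrez_gene, orthologs$species_name, sep = "|")

# Keep rows whose source count equals the maximum within their group.
group_max <- ave(orthologs$num_ortholog_sources, key, FUN = max)
best <- orthologs[orthologs$num_ortholog_sources == group_max, ]

best$entrez_gene
```

Like `top_n()`, this keeps ties: human gene 2 retains both mouse candidates here, which is why the script has a separate step afterwards for genes with many orthologs.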

# For each human gene, ignore ortholog pairs with many orthologs
@@ -191,7 +191,8 @@ msigdbr_orthologs =
bind_rows(human_tbl) %>%
select(
human_entrez_gene, human_gene_symbol,
species_name, species_common_name, entrez_gene, gene_symbol, sources, num_sources
species_name, species_common_name, entrez_gene, gene_symbol,
ortholog_sources, num_ortholog_sources
) %>%
arrange(human_gene_symbol, human_entrez_gene, species_name) %>%
distinct()
@@ -203,7 +204,7 @@ hcop %>%
summarize(n_distinct(ortholog_species_symbol))
msigdbr_orthologs %>%
group_by(species_name) %>%
summarize(n_distinct(human_gene_symbol), n_distinct(gene_symbol), max(num_sources))
summarize(n_distinct(human_gene_symbol), n_distinct(gene_symbol), max(num_ortholog_sources))

# Prepare package ---------------------------------------------------------

24 changes: 17 additions & 7 deletions revdep/README.md
@@ -2,23 +2,33 @@

|field |value |
|:--------|:----------------------------|
|version |R version 4.0.0 (2020-04-24) |
|version |R version 4.0.2 (2020-06-22) |
|os |macOS Mojave 10.14.6 |
|system |x86_64, darwin17.0 |
|ui |RStudio |
|language |(EN) |
|collate |en_US.UTF-8 |
|ctype |en_US.UTF-8 |
|tz |America/New_York |
|date |2020-05-12 |
|date |2020-10-02 |

# Dependencies

|package |old |new |Δ |
|:----------|:-----|:-----|:--|
|msigdbr |7.0.1 |7.1.1 |* |
|tidyselect |NA |1.0.0 |* |
|vctrs |NA |0.2.4 |* |
|package |old |new |Δ |
|:-------|:-----|:-----|:--|
|msigdbr |7.1.1 |7.2.1 |* |

# Revdeps

## Failed to check (1)

|package |version |error |warning |note |
|:--------|:-------|:-----|:-------|:----|
|tidybulk |? | | | |

## New problems (1)

|package |version |error |warning |note |
|:--------------------------|:-------|:-----|:-------|:----|
|[hypeR](problems.md#hyper) |1.4.0 | |__+1__ |2 |

7 changes: 7 additions & 0 deletions revdep/cran.md
@@ -0,0 +1,7 @@
## revdepcheck results

We checked 7 reverse dependencies (3 from CRAN + 4 from BioConductor), comparing R CMD check results across CRAN and dev versions of this package.

* We saw 0 new problems
* We failed to check 0 packages

46 changes: 45 additions & 1 deletion revdep/failures.md
@@ -1 +1,45 @@
*Wow, no problems at all. :)*
# tidybulk

<details>

* Version:
* GitHub: https://github.com/igordot/msigdbr
* Source code: NA
* Number of recursive dependencies: 0

</details>

## Error before installation

### Devel

```
There is a binary version available but the source version is later:
binary source needs_compilation
RSQLite 2.2.0 2.2.1 TRUE
Binaries will be installed
installing the source packages ‘EGSEAdata’, ‘hgu133a.db’, ‘hgu133plus2.db’, ‘KEGGdzPathwaysGEO’, ‘org.Mm.eg.db’, ‘org.Rn.eg.db’
```
### CRAN

```
There is a binary version available but the source version is later:
binary source needs_compilation
RSQLite 2.2.0 2.2.1 TRUE
Binaries will be installed
installing the source packages ‘EGSEAdata’, ‘hgu133a.db’, ‘hgu133plus2.db’, ‘KEGGdzPathwaysGEO’, ‘org.Mm.eg.db’, ‘org.Rn.eg.db’
```
69 changes: 68 additions & 1 deletion revdep/problems.md
@@ -1 +1,68 @@
*Wow, no problems at all. :)*
# hypeR

<details>

* Version: 1.4.0
* GitHub: https://github.com/montilab/hypeR
* Source code: https://github.com/cran/hypeR
* Date/Publication: 2020-04-27
* Number of recursive dependencies: 111

Run `revdep_details(, "hypeR")` for more info

</details>

## Newly broken

* checking examples ... WARNING
```
Found the following significant warnings:
Warning: 'msigdbr::msigdbr_show_species' is deprecated.
Warning: 'msigdbr::msigdbr_show_species' is deprecated.
Warning: 'msigdbr::msigdbr_show_species' is deprecated.
Warning: 'msigdbr::msigdbr_show_species' is deprecated.
Warning: 'msigdbr::msigdbr_show_species' is deprecated.
Warning: 'msigdbr::msigdbr_show_species' is deprecated.
Warning: 'msigdbr::msigdbr_show_species' is deprecated.
Warning: 'msigdbr::msigdbr_show_species' is deprecated.
Warning: 'msigdbr::msigdbr_show_species' is deprecated.
Warning: 'msigdbr::msigdbr_show_species' is deprecated.
Deprecated functions may be defunct as soon as of the next release of
R.
See ?Deprecated.
```
## In both
* checking R code for possible problems ... NOTE
```
...
‘is’
hyp_to_table: no visible global function definition for ‘is’
hyp_to_table: no visible global function definition for
‘packageVersion’
hyp_to_table: no visible global function definition for ‘write.table’
hypeR: no visible global function definition for ‘is’
msigdb_available: no visible binding for global variable ‘gs_cat’
msigdb_available: no visible binding for global variable ‘gs_subcat’
msigdb_download: no visible binding for global variable ‘gs_name’
msigdb_download: no visible binding for global variable ‘gene_symbol’
msigdb_download: no visible binding for global variable ‘.’
msigdb_version: no visible global function definition for
‘packageVersion’
Undefined global functions or variables:
. fdr from gene_symbol gs_cat gs_name gs_subcat is label
packageVersion pval significance size to write.table x y
Consider adding
importFrom("methods", "is")
importFrom("utils", "packageVersion", "write.table")
to your NAMESPACE file (and ensure that your DESCRIPTION Imports field
contains 'methods').
```
* checking for unstated dependencies in vignettes ... NOTE
```
'library' or 'require' call not declared from: ‘tidyverse’
```