Skip to content

Commit

Permalink
Collects players info from fbref.com but misses the age column due to…
Browse files Browse the repository at this point in the history
… it being inside a <nobr> tag.

Added manual age calculation instead of scraping it.

Collects players info from fbref.com but misses the age column due to it being inside a <nobr> tag.

add test

add to vignette, bump to 6.7.0001

rename to fb player info

Added manual age calculation instead of scraping it.
  • Loading branch information
aymennasri authored and tonyelhabr committed Feb 10, 2025
1 parent 75a463f commit f6ce454
Show file tree
Hide file tree
Showing 7 changed files with 177 additions and 6 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Type: Package
Package: worldfootballR
Title: Extract and Clean World Football (Soccer) Data
Version: 0.6.7
Version: 0.6.7.0001
Authors@R: c(
person("Jason", "Zivkovic", , "[email protected]", role = c("aut", "cre", "cph")),
person("Tony", "ElHabr", , "[email protected]", role = "ctb"),
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ export(fb_match_shooting)
export(fb_match_summary)
export(fb_match_urls)
export(fb_player_goal_logs)
export(fb_player_info)
export(fb_player_match_logs)
export(fb_player_scouting_report)
export(fb_player_season_stats)
Expand Down
13 changes: 8 additions & 5 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,16 +1,19 @@
# worldfootballR 0.6.7
# worldfootballR (development version)

### Bugs

### Breaking Changes

### Improvements

* `tm_expiring_contracts()` returns additional `date_of_birth` column. (0.6.6.0001) [#355](https://github.com/JaseZiv/worldfootballR/issues/397)
* `tm_player_bio()` now includes three new columns. `picture_url` containing the URL of the player's picture from Transfermarkt, `squad_number` containing the current worn squad number of the player and `player_id` that that contains the player ID on Transfermarkt.
* `get_player_info()` added. ([#47](https://github.com/JaseZiv/worldfootballR/issues/47))
* `tm_get_suspensions()` and `tm_get_risk_of_suspensions()` added. ([#411](https://github.com/JaseZiv/worldfootballR/issues/411))
* `tm_get_suspensions()` and `tm_get_risk_of_suspensions()` added. (0.6.7.0000) ([#411](https://github.com/JaseZiv/worldfootballR/issues/411))
* `fb_player_info()` added. (0.6.7.0001) ([#47](https://github.com/JaseZiv/worldfootballR/issues/47))

***

# worldfootballR 0.6.7

* `tm_expiring_contracts()` returns additional `date_of_birth` column. (0.6.6.0001) [#355](https://github.com/JaseZiv/worldfootballR/issues/397)

***

Expand Down
116 changes: 116 additions & 0 deletions R/fb_player_info.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
#' Get Player Attributes
#'
#' Return a data frame of a player's info
#'
#' @param player_urls the URL(s) of the player(s)
#' @param time_pause the wait time (in seconds) between page loads
#'
#' @return A data frame
#' @export
#'
#' @export
#'
#' @examples
#' \dontrun{
#' try({
#' player_urls <- c(
#' "https://fbref.com/en/players/d70ce98e/", # Messi
#' "https://fbref.com/en/players/3515d404/" # JWP
#' )
#' fb_player_info(player_urls)
#' })
#' }
fb_player_info <- function(player_urls, time_pause = 3) {

get_each_player_info <- function(url) {
pb$tick()

Sys.sleep(time_pause)

page <- xml2::read_html(url)

full_name <- page %>% rvest::html_node("h1") %>% rvest::html_text2() %>% stringr::str_trim()

position_node <- page %>% rvest::html_node(xpath = "//p[contains(., 'Position:')]")
position_footed <- position_node %>% rvest::html_text2() %>% stringr::str_trim()
position <- stringr::str_split(position_footed, "", 2)[[1]][1] %>%
stringr::str_remove("Position:") %>% stringr::str_trim()
footed <- stringr::str_split(position_footed, "", 2)[[1]][2] %>%
stringr::str_remove("Footed:") %>% stringr::str_trim()

height_weight <- page %>% rvest::html_node(xpath = "//p[contains(., 'cm')]") %>% rvest::html_text2()
height <- stringr::str_extract(height_weight, "\\d+cm")
weight <- stringr::str_extract(height_weight, "\\d+kg")

birth_date <- page %>% rvest::html_node("#necro-birth") %>% rvest::html_attr("data-birth")

# Calculates age from birth_date
age <- NA
birth_date_clean <- .replace_empty_na(birth_date)

if (!is.na(birth_date_clean)) {
birth_date_date <- as.Date(birth_date_clean)
today <- Sys.Date()

# Calculates years
years <- as.integer(format(today, "%Y")) - as.integer(format(birth_date_date, "%Y"))
current_year_birthday <- as.Date(paste0(format(today, "%Y"), "-", format(birth_date_date, "%m-%d")))

# Handles invalid dates (e.g., February 29 in non-leap years)
if (is.na(current_year_birthday)) {
current_year_birthday <- as.Date(paste0(format(today, "%Y"), "-03-01"))
}

# Adjusts years and find last valid birthday
if (current_year_birthday > today) {
years <- years - 1
last_birthday <- as.Date(paste0(as.integer(format(today, "%Y")) - 1, "-", format(birth_date_date, "%m-%d")))
# Handles invalid adjusted dates
if (is.na(last_birthday)) {
last_birthday <- as.Date(paste0(as.integer(format(today, "%Y")) - 1, "-03-01"))
}
} else {
last_birthday <- current_year_birthday
}

# Calculates days since last birthday
days <- as.integer(difftime(today, last_birthday, units = "days"))
age <- paste0(years, "-", days, "d")
}

birth_place <- page %>% rvest::html_node(xpath = "//p[contains(., 'Born:')]//span[contains(., 'in ')]") %>%
rvest::html_text2() %>% stringr::str_remove("^in ") %>% stringr::str_trim()

national_team <- page %>% rvest::html_node(xpath = "//p[contains(., 'National Team:')]/a") %>% rvest::html_text2()
club <- page %>% rvest::html_node(xpath = "//p[contains(., 'Club:')]/a") %>% rvest::html_text2()

wages <- page %>% rvest::html_node(".important.poptip") %>% rvest::html_text2() %>% stringr::str_trim()

twitter <- page %>% rvest::html_node(xpath = "//p[contains(., 'Twitter:')]/a") %>% rvest::html_text2()
instagram <- page %>% rvest::html_node(xpath = "//p[contains(., 'Instagram:')]/a") %>% rvest::html_text2()

data.frame(
full_name = .replace_empty_na(full_name),
position = .replace_empty_na(position),
footed = .replace_empty_na(footed),
height = .replace_empty_na(height),
weight = .replace_empty_na(weight),
birth_date = .replace_empty_na(birth_date),
age = .replace_empty_na(age),
birth_place = .replace_empty_na(birth_place),
national_team = .replace_empty_na(national_team),
club = .replace_empty_na(club),
wages = .replace_empty_na(wages),
twitter = .replace_empty_na(twitter),
instagram = .replace_empty_na(instagram),
stringsAsFactors = FALSE
)
}

# create the progress bar with a progress function.
pb <- progress::progress_bar$new(total = length(player_urls))

player_urls %>%
purrr::map_df(get_each_player_info)

}
30 changes: 30 additions & 0 deletions man/fb_player_info.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 8 additions & 0 deletions tests/testthat/test-fbref.R
Original file line number Diff line number Diff line change
Expand Up @@ -390,6 +390,14 @@ test_that("fb_team_goal_logs() works", {
expect_false(nrow(granada_goal_log) == 0)
})

Sys.sleep(3)
test_that("fb_player_info() works", {
testthat::skip_on_cran()
jwp <- fb_player_info(player_url = "https://fbref.com/en/players/3515d404/James-Ward-Prowse")
expect_type(jwp, "list")
expect_true(nrow(jwp) == 1)
})

Sys.sleep(3)
test_that("fb_player_goal_logs() works", {
testthat::skip_on_cran()
Expand Down
13 changes: 13 additions & 0 deletions vignettes/extract-fbref-data.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -460,6 +460,19 @@ This section will cover the functions available to aid in the extraction of play

The examples provided below in a lot of cases have the actual url (player or team) passed to them, however the suite of fbref helper functions outlined in [this helpers vignette](https://jaseziv.github.io/worldfootballR/articles/extract-helper-urls.html) could also be used.

### Get Player Info

Use the `fb_player_info()` function.

```{r fb_player_info, eval=FALSE}
player_urls <- c(
"https://fbref.com/en/players/d70ce98e/", # Messi
"https://fbref.com/en/players/3515d404/" # JWP
)
messi_and_jwp_info <- fb_player_info(player_urls)
dplyr::glimpse(messi_and_jwp_info)
```


### Get Player Scouting Report

Expand Down

0 comments on commit f6ce454

Please sign in to comment.