Merge pull request #6 from iramler/main
adding new data
ryurko authored Jun 24, 2024
2 parents 28c7521 + db64e26 commit 6244323
Showing 11 changed files with 6,339 additions and 780 deletions.
74 changes: 74 additions & 0 deletions _prep/olympic_swimming/init-olympic_swimming.qmd
@@ -0,0 +1,74 @@
---
title: "Prep File"
author: "Brendan Karadenes"
format: html
---

```{r}
library(tidyverse)
library(here)
# csv downloaded from Kaggle
# https://www.kaggle.com/datasets/datasciencedonut/olympic-swimming-1912-to-2020
swimming <- read.csv(here("Olympic_Swimming_Results_1912to2020.csv"))
```

# Cleaning Data Set

```{r}
swimming <- swimming %>%
  # Shorten the column names produced by read.csv()
  rename(Relay = Relay., dist_m = Distance..in.meters.) %>%
  # Results that are not plain numbers become NA and are dropped below
  mutate(Results = as.numeric(Results)) %>%
  filter(!is.na(Results))
```

```{r}
swimming$dist_m <- gsub("m", "", swimming$dist_m)
```

```{r}
swimming <- swimming %>%
  mutate(dist_m = as.numeric(dist_m))
```
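
Both `Results` and `dist_m` go through `as.numeric()`, which turns any string that is not a plain number into `NA` (with a warning). A quick check of how many values are lost this way can be useful before filtering. The sketch below is only an illustration: the strings are made up rather than taken from the Kaggle file, and `count_coercion_na` is a hypothetical helper, not part of the prep file.

```r
# Count values that are present as text but become NA under as.numeric().
# `count_coercion_na` is an illustrative helper, not part of the prep file.
count_coercion_na <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  sum(is.na(converted) & !is.na(x))
}

# Made-up strings that a results column might contain.
example_values <- c("100", "52.38", "4x100", "1:02.5", NA)
n_lost_examples <- count_coercion_na(example_values)

# Strip the trailing "m" the same way the prep file does, then count losses.
distances <- gsub("m", "", c("100m", "200m", "4x100"))
n_lost_distances <- count_coercion_na(distances)
```

Values that were already missing are not counted, so the result reflects only what the conversion itself discards.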

```{r}
swimming <- swimming %>%
  select(-Relay)
```

# Filtering for 100m races

```{r}
swimming <- swimming %>%
  filter(dist_m == 100)
```

# Add time period variable

```{r}
swimming <- swimming %>%
  # Years before 1924 match neither condition, so time_period is NA for them
  mutate(time_period = case_when(
    Year >= 1924 & Year < 1976 ~ "early",
    Year >= 1976 & Year <= 2020 ~ "recent"
  ))
```

# Writing cleaned data as csv

```{r}
write.csv(swimming, "olympic_swimming.csv", row.names = FALSE)
```









65 changes: 32 additions & 33 deletions _prep/rowing_olympic_medals/init-rowing_olympic_medals.qmd
@@ -15,7 +15,7 @@ library(readr)
 Find the dataset on [this link](https://www.kaggle.com/datasets/heesoo37/120-years-of-olympic-history-athletes-and-results?resource=download) and download it.

 ```{r}
-athletes_df <- read_csv("medals.csv")
+athletes_df <- read_csv("athlete_events.csv")
 ```

 # Set up rowing_df
@@ -24,47 +24,46 @@ athletes_df <- read_csv("medals.csv")
 rowing_df <- athletes_df |> filter(Sport == "Rowing")
 ```

-# Set up medals
-
-```{r}
-country_event <- rowing_df |> group_by(Event, NOC) |> summarise()
-```
-
-```{r}
-write_excel_csv(country_event, "country_event.csv")
-```
-
 ```{r}
-events <- rowing_df |> group_by(Event) |> summarise()
+rowing_df <- rowing_df |> filter(!is.na(Medal))
 ```

-```{r}
-write_excel_csv(events, "events.csv")
-```
+# Set up so that one athlete for each team represents the team

 ```{r}
-# `Number of Athletes` was edited manually in an excel file and then read back in a csv file
-events_athletes <- read_csv("events_athletes.csv")
+country_event <- rowing_df |>
+  group_by(Year, Event, Medal, NOC) |>
+  select(Year, Event, NOC, Medal) |>
+  slice(1) |>
+  arrange(Year, Event, Medal)
 ```

 ```{r}
-events_noc_athletes <- left_join(country_event, events_athletes, by = "Event")
-```
-
-```{r}
-rowingsum <- rowing_df |> mutate(n_medal= case_when(Medal == "Gold" ~ 1, Medal == "Silver" ~ 1, Medal == "Bronze" ~ 1, is.na(Medal) ~ 0)) |> mutate(n_point= case_when(Medal == "Gold" ~ 3, Medal == "Silver" ~ 2, Medal == "Bronze" ~ 1, is.na(Medal) ~ 0)) |> group_by(NOC) |> summarise(total_medals = sum(n_medal), total_points = sum(n_point))
-```
-
-```{r}
-complete <- left_join(events_noc_athletes, rowingsum)
-```
-
-# Finalized medals
-
-```{r}
-medals <- complete |> mutate(total_medals = total_medals/`Number of Athletes`, total_points = total_points/`Number of Athletes`)
+country_medals <- country_event |>
+  mutate(n_point = case_when(
+           Medal == "Gold" ~ 3,
+           Medal == "Silver" ~ 2,
+           Medal == "Bronze" ~ 1),
+         n_gold = case_when(
+           Medal == "Gold" ~ 1,
+           Medal == "Silver" ~ 0,
+           Medal == "Bronze" ~ 0),
+         n_silver = case_when(
+           Medal == "Gold" ~ 0,
+           Medal == "Silver" ~ 1,
+           Medal == "Bronze" ~ 0),
+         n_bronze = case_when(
+           Medal == "Gold" ~ 0,
+           Medal == "Silver" ~ 0,
+           Medal == "Bronze" ~ 1)) |>
+  group_by(NOC) |>
+  summarise(total_medals = n(),
+            total_points = sum(n_point),
+            total_gold = sum(n_gold),
+            total_silver = sum(n_silver),
+            total_bronze = sum(n_bronze))
 ```

 ```{r}
-write_excel_csv(medals, "medals.csv")
+write_csv(country_medals, "rowing_medals.csv")
 ```
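
The rowing prep code counts each team medal once (one row per year, event, medal, and NOC) and weights medals 3, 2, and 1 for gold, silver, and bronze. A minimal sketch of that idea follows, under clearly stated assumptions: the rows and the NOC codes `AAA` and `BBB` are invented, and base R functions stand in for the dplyr verbs used in the prep file.

```r
# Invented athlete-level rows: two rowers share one gold, others win alone.
athletes <- data.frame(
  Year  = c(2016, 2016, 2016, 2012),
  Event = c("Double Sculls", "Double Sculls", "Single Sculls", "Single Sculls"),
  NOC   = c("AAA", "AAA", "BBB", "AAA"),
  Medal = c("Gold", "Gold", "Bronze", "Silver"),
  stringsAsFactors = FALSE
)

# Keep one row per team medal so crew members are not counted separately.
team_medals <- unique(athletes[, c("Year", "Event", "NOC", "Medal")])

# Weight medals: gold = 3, silver = 2, bronze = 1.
point_values <- c(Gold = 3, Silver = 2, Bronze = 1)
team_medals$n_point <- unname(point_values[team_medals$Medal])

# Total medals and total points per country.
total_medals <- table(team_medals$NOC)
total_points <- tapply(team_medals$n_point, team_medals$NOC, sum)
```

Without the de-duplication step, the shared gold would be counted twice for `AAA`, which is the over-counting the one-row-per-team step avoids.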
63 changes: 63 additions & 0 deletions cricket/cricket_data_repo.qmd
@@ -0,0 +1,63 @@
---
title: "Exploring Logistic Regression Through Cricket"
author: "Vivian Johnson"
date: "June 10, 2024"

categories:
- Multiple Logistic Regression
---

## Motivation

Cricket is a game watched and played by billions of people across the world. Second in global popularity only to football (soccer), it is extremely popular in South Asia, Australia, Africa, and Europe.

### The Rules of Cricket

[Cricket](https://www.youtube.com/watch?v=yPXAzgwwo0A){target="_blank"} is played on a rectangular pitch inside an oval boundary. At each end of the pitch stands a **wicket**, made up of three vertical wooden "stumps" with two wooden bails (small blocks of wood) resting on top of the stumps. There are two teams of 11 players each: a batting team and a bowling team.

**The bowling team:** The bowler bounces a small leather ball toward the batter. The bowler's goal is to get the batter out, for example by knocking the bails off the stumps.

**The batting team:** The goal of the batting team is to score as many runs as possible while keeping the bowler from knocking the bails off the stumps. The batting team has two batters on the field at a time. After a batter hits the ball, the two batters may attempt to switch ends. Each batter continues until they get out, at which point a teammate replaces them.

**Scoring Runs:** Each time the two batters switch ends without getting out, one run is scored. For example, if you hit the ball and run to the other wicket and back without getting out, your team scores two runs. If the batter hits the ball and it reaches the oval boundary after bouncing, it counts as four runs. If the batter hits the ball over the oval boundary on the fly, it counts as six runs.
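
As a rough illustration of the scoring rules, a team's total can be split into runs from boundaries and runs made by running between the wickets. The numbers below are invented for the example, and extras are ignored.

```r
# Invented innings summary (not taken from the Asia Cup data).
fours <- 12            # hits that reached the boundary after bouncing
sixes <- 3             # hits that cleared the boundary on the fly
runs_by_running <- 95  # runs from the batters switching ends

# Each four is worth 4 runs and each six is worth 6 runs.
boundary_runs <- 4 * fours + 6 * sixes
total_runs <- boundary_runs + runs_by_running
```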

**Outs:** Below are a few common ways in which a batter can get out:

- If the batter hits the ball and it is caught in the air (caught out)
- If a player on the bowling team throws the ball and it knocks the bails off the stumps before the batter can cross the line while trying to score a run (run out)
- If the bowler's delivery gets past the batter and knocks a bail off the wicket (bowled)
- If the batter gets hit in the legs by a ball that would have hit the wicket (leg before wicket). Note: if the ball is ruled not to have been on course to hit the wicket, the batter is not out.

The duration of cricket matches can range from hours to days, depending on the format being played, and scores are often high because each batter usually scores many runs over multiple deliveries before getting out.

## Data

The `asia_cup` data set includes data from each cricket match played in all Asia Cup Tournaments from 1984 (the first one) to 2022. The Asia Cup is a tournament that now takes place every two years, alternating host cities in different countries throughout Asia.

| Variable | Description |
|-----------------------|-------------------------------------------------|
| Team | One team of the match |
| Opponent | The team played |
| Host | Venue played at |
| Year | Tournament year played |
| Toss | Coin toss to determine who starts batting or bowling (0 = lost, 1 = won) |
| Selection | Team's selection after winning / losing the toss (0 = Batting, 1 = Bowling) |
| Run Scored | Total runs scored by that team |
| Fours | Total scored fours for that team |
| Sixes | Total scored sixes for that team |
| Extras | Amount of extra runs scored by that team |
| Highest Score | Highest individual number of runs scored |
| Result | 0 = lost match, 1 = won match |
| Given Extras | Number of extra runs given up |

Data Download: [cricket_asia_cup.csv](../data/cricket_asia_cup.csv)

## Questions

1. Building Simple Logistic Regression and Multiple Logistic Regression
2. Interpreting Logistic Regression
3. Finding odds and log odds of an event
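
These questions rely on the link between probability, odds, and log odds: for a win probability $p$, the odds are $p/(1-p)$ and the log odds are $\log(p/(1-p))$. The base R sketch below fits a simple logistic regression and moves between the three scales. It is an illustration only: the data frame `sim`, its column names, and the coefficients used to generate it are invented and are not the Asia Cup results.

```r
set.seed(1)

# Simulated stand-in data: `toss` and `result` mimic the 0/1 coding of
# Toss and Result, but every value is randomly generated.
n <- 200
toss <- rbinom(n, size = 1, prob = 0.5)
true_log_odds <- -0.2 + 0.5 * toss  # invented coefficients
result <- rbinom(n, size = 1, prob = plogis(true_log_odds))
sim <- data.frame(toss = toss, result = result)

# Simple logistic regression: log odds of winning as a linear function of toss.
fit <- glm(result ~ toss, family = binomial, data = sim)

# Fitted log odds, odds, and probability of a win for a team that won the toss.
log_odds_win <- unname(coef(fit)["(Intercept)"] + coef(fit)["toss"])
odds_win <- exp(log_odds_win)
prob_win <- odds_win / (1 + odds_win)

# exp() of the slope is the odds ratio comparing toss winners to toss losers.
odds_ratio_toss <- unname(exp(coef(fit)["toss"]))
```

A multiple logistic regression adds further predictors to the right-hand side of the formula. With the real data, the formula would use the column names from the table above; names containing spaces need backticks if they are kept as-is when the file is read.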

## References

[Asia Cup 1984-2022](https://www.kaggle.com/datasets/hasibalmuzdadid/asia-cup-cricket-1984-to-2022/data){target="_blank"}