Merge pull request #6 from iramler/main
adding new data
Showing 11 changed files with 6,339 additions and 780 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
--- | ||
title: "Prep File" | ||
author: "Brendan Karadenes" | ||
format: html | ||
--- | ||
|
||
```{r} | ||
library(tidyverse) | ||
library(here) | ||
# csv downloaded from Kaggle | ||
# https://www.kaggle.com/datasets/datasciencedonut/olympic-swimming-1912-to-2020 | ||
swimming <- read.csv(here("Olympic_Swimming_Results_1912to2020.csv")) | ||
``` | ||
|
||
# Cleaning Data Set | ||
|
||
```{r} | ||
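swimming <- swimming %>% | ||
  # shorten the two awkwardly named columns | ||
  rename(Relay = Relay., dist_m = Distance..in.meters.) %>% | ||
  # results that are not plain numbers become NA (with a warning) and are dropped | ||
  mutate(Results = as.numeric(Results)) %>% | ||
  filter(!is.na(Results)) | ||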
``` | ||
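The coercion step above can be checked in isolation. The following is a minimal base-R sketch, assuming a few made-up strings in place of the real `Results` column (the values and the names `results_raw`, `results_num`, and `results_clean` are hypothetical); it shows which kinds of entries `as.numeric()` turns into `NA` before the filter removes them.

```r
# Hypothetical stand-ins for the Results column (not taken from the data set).
results_raw <- c("59.72", "53.4", "1:02.30", "Disqualified")

# as.numeric() yields NA for any string that is not a plain number.
# The coercion warning is silenced here only to keep the output tidy.
results_num <- suppressWarnings(as.numeric(results_raw))

# Keeping the non-missing values mirrors filter(!is.na(Results)).
results_clean <- results_num[!is.na(results_num)]
print(results_clean)
```

Note that a time written in minute:second form is dropped along with the text entries, so it may be worth counting the `NA` values before filtering.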
|
||
```{r} | ||
swimming$dist_m <- gsub("m", "", swimming$dist_m) | ||
``` | ||
|
||
```{r} | ||
swimming <- swimming %>% | ||
  mutate(dist_m = as.numeric(dist_m)) | ||
``` | ||
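As a rough illustration of the two distance steps above, the sketch below strips the unit and converts to numeric on a small hypothetical vector (the strings in `dist_raw` are assumptions, not values confirmed from the CSV). Any entry that still contains non-numeric characters after the `gsub()` call ends up as `NA`.

```r
# Hypothetical distance strings (assumed formats, not read from the CSV).
dist_raw <- c("100m", "200m", "4x100")

# Remove every "m" from each string, as in the gsub() call above.
dist_txt <- gsub("m", "", dist_raw)

# Convert to numeric; strings that are still not plain numbers become NA.
dist_num <- suppressWarnings(as.numeric(dist_txt))

print(data.frame(dist_raw, dist_txt, dist_num))
```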
|
||
```{r} | ||
swimming <- swimming %>% | ||
  select(-Relay) | ||
``` | ||
|
||
# Filtering for 100m races | ||
|
||
```{r} | ||
swimming <- swimming %>% | ||
  filter(dist_m == 100) | ||
``` | ||
|
||
# Add time period variable | ||
|
||
```{r} | ||
# Years before 1924 match neither condition, so time_period is NA for them | ||
swimming <- swimming %>% | ||
  mutate(time_period = case_when( | ||
    Year >= 1924 & Year < 1976 ~ "early", | ||
    Year >= 1976 & Year <= 2020 ~ "recent" | ||
  )) | ||
``` | ||
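To see how the two conditions above partition the years, here is a small sketch written with nested base-R `ifelse()` calls rather than `case_when()`, so that it runs without dplyr. The vector `years` is illustrative only; the explicit `NA_character_` branch stands in for what `case_when()` returns when no condition matches.

```r
# Illustrative years chosen to sit on and around the two cut-offs.
years <- c(1912, 1920, 1924, 1972, 1976, 2020)

# Same boundaries as the case_when() call; unmatched years are set to NA.
time_period <- ifelse(
  years >= 1924 & years < 1976, "early",
  ifelse(years >= 1976 & years <= 2020, "recent", NA_character_)
)

print(data.frame(years, time_period))
```

If the earliest Games should be labelled as well, the lower bound of the first condition would need to be relaxed; whether that is wanted depends on the analysis.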
|
||
# Writing cleaned data as csv | ||
|
||
```{r} | ||
write.csv(swimming, "olympic_swimming.csv", row.names = FALSE) | ||
``` | ||
|
||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
--- | ||
title: "Exploring Logistic Regression Through Cricket" | ||
author: "Vivian Johnson" | ||
date: "June 10, 2024" | ||
|
||
categories: | ||
- Multiple Logistic Regression | ||
--- | ||
|
||
## Motivation | ||
|
||
Cricket is a game watched and played by billions of people across the world. Second in global popularity only to football (soccer), it is extremely popular in South Asia, Australia, Africa, and Europe. | ||
|
||
### The Rules of Cricket | ||
|
||
[Cricket](https://www.youtube.com/watch?v=yPXAzgwwo0A){target="_blank"} is played on a rectangular pitch inside an oval boundary. At each end of the pitch stands a **wicket**, made up of three vertical wooden "stumps" with two wooden bails (small pieces of wood) resting on top of the stumps. There are two teams of 11 players each; at any given time one is the batting team and the other is the bowling team. | ||
|
||
**The bowling team:** The bowler delivers a small leather ball toward the batter, usually bouncing it once on the pitch. The bowler's goal is to get the batter out, most directly by knocking the bails off the stumps. | ||
|
||
**The batting team:** The goal of the batting team is to score as many runs as possible while keeping the bowler from knocking the bails off the stumps. Two batters are on the field at a time, one at each wicket. After a batter hits the ball, the two batters may attempt to switch sides. Each batter continues until they get out, at which point a teammate replaces them. | ||
|
||
**Scoring Runs:** Each time the two batters switch sides without getting out counts as a run scored. For example, if you hit the ball and are able to run to the other wicket and back without getting out, that would score two runs for your team. If the batter hits the ball on a bounce to the oval boundary, it counts as four runs scored. If the batter hits the ball over the oval boundary on a fly, it counts as six runs. | ||
|
||
**Outs:** Below are a few common ways in which a batter can get out: | ||
|
||
- If the batter hits the ball and it is caught in the air (caught out) | ||
- If a player on the bowling team throws the ball and it hits the bails off the stumps before the batter can cross the line while trying to score a run (run out) | ||
- If the bowler bowls the ball past the batter and it knocks a bail off the wicket (bowled) | ||
- If the batter gets hit in the legs by a ball that would have hit the wicket (leg before wicket, or LBW). Note: if the batter gets hit and the ball wasn't ruled as one that would have hit the wicket, the batter is not out | ||
|
||
The duration of a cricket match can range from hours to days, depending on the format being played, and scores are often high because an individual batter usually scores many runs over multiple deliveries before getting out. | ||
|
||
## Data | ||
|
||
The `asia_cup` data set includes data from each cricket match played in every Asia Cup tournament from 1984 (the first tournament) to 2022. The Asia Cup now takes place every two years, with the host city alternating among countries throughout Asia. | ||
|
||
| Variable | Description | | ||
|-----------------------|-------------------------------------------------| | ||
| Team | One team of the match | | ||
| Opponent | The team played | | ||
| Host | Venue played at | | ||
| Year | Tournament year played | | ||
| Toss | Result of the coin toss that determines who starts batting or bowling (0 = lost, 1 = won) | | ||
| Selection | Team's selection after winning / losing the toss (0 = Batting, 1 = Bowling) | | ||
| Run Scored | Total runs scored by that team | | ||
| Fours | Total fours scored by that team | | ||
| Sixes | Total sixes scored by that team | | ||
| Extras | Number of extra runs scored by that team | | ||
| Highest Score | Highest individual number of runs scored | | ||
| Result | 0 = lost match, 1 = won match | | ||
| Given Extras | Number of extra runs given up | | ||
|
||
Data Download: [cricket_asia_cup.csv](../data/cricket_asia_cup.csv) | ||
|
||
## Questions | ||
|
||
1. Building Simple Logistic Regression and Multiple Logistic Regression | ||
2. Interpreting Logistic Regression | ||
3. Finding odds and log odds of an event | ||
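
As a starting point for these questions, the sketch below fits a simple logistic regression with base R's `glm()` and converts a prediction from log odds to odds and to a probability. It is only an illustration under stated assumptions: the data frame `toy` and its columns `runs` and `result` are made up for the example and are not taken from `cricket_asia_cup.csv`.

```r
# Hypothetical toy data (NOT the Asia Cup data): runs scored and match result.
toy <- data.frame(
  runs   = c(150, 170, 180, 200, 220, 240, 250, 260, 280, 300),
  result = c(0,   0,   0,   1,   0,   1,   1,   0,   1,   1)
)

# Simple logistic regression: log odds of winning modelled as linear in runs.
fit <- glm(result ~ runs, data = toy, family = binomial)

# Predicted log odds for a team scoring 250 runs (the link scale is the default).
log_odds <- unname(predict(fit, newdata = data.frame(runs = 250)))

# Convert log odds to odds, then odds to a probability.
odds <- exp(log_odds)
prob <- odds / (1 + odds)

# Multiplicative change in the odds of winning for each additional run.
odds_ratio <- unname(exp(coef(fit)["runs"]))

print(c(log_odds = log_odds, odds = odds, prob = prob, odds_ratio = odds_ratio))
```

A multiple logistic regression follows the same pattern with more terms on the right-hand side of the formula, for example a second predictor for the toss result; the exact column names depend on how the CSV is read in.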
|
||
## References | ||
|
||
[Asia Cup 1984-2022](https://www.kaggle.com/datasets/hasibalmuzdadid/asia-cup-cricket-1984-to-2022/data){target="_blank"} |