Named Entity and Relation Extraction models for NFL play-by-play snippets
- Scrape Data
- Centralize Data
- combine multiple files into a single one
- Build Dataset / Model
- Split
- splits off a random subset for manageable inspection - 1% at random
- ITERATE
- Annotate Data
- builds a redacted file for quick visual inspection
- Inspect Data
- if issues, fix and annotate again
- may require a complete reset of the "gold standard" dataset
- Save
- add data to be used in model building - "gold standard"
- Build Model
- Annotate Data
- Split
scrape game IDs and play-by-play text from ESPN for the 2022 NFL regular season.
from the project root
cd tasks\scrap
make scrap-schedules
output files found in "tasks/data/1/"
make scrap-pbp
output files found in "tasks/data/2/"
create a main source file and split into dev / holdout datasets
from the project root
cd tasks\scrap
make centralize-data
output files found in "tasks/data/3/"
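For illustration only, the centralize step (combine multiple files into one, then split into dev / holdout) could be sketched roughly as below. This is not the code behind `make centralize-data`: the function names, the plain-text one-snippet-per-line file format, the `*.txt` pattern, and the 20% holdout fraction are all assumptions made for the example.

```python
import random
from pathlib import Path


def centralize(source_dir, output_file):
    """Combine every .txt file in source_dir into a single file.

    Assumption: each input file holds one play-by-play snippet per line.
    """
    lines = []
    for path in sorted(Path(source_dir).glob("*.txt")):
        with path.open(encoding="utf-8") as handle:
            # Skip blank lines so the combined file has one snippet per line.
            lines.extend(line.strip() for line in handle if line.strip())

    output_path = Path(output_file)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(
        "".join(f"{line}\n" for line in lines), encoding="utf-8"
    )
    return lines


def dev_holdout_split(lines, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of lines and cut it into (dev, holdout).

    Assumption: the 20% holdout default is a placeholder, not the project's value.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed keeps the dev / holdout assignment stable across runs, so the holdout set stays unseen while the dev set is annotated.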
from the project root
cd workspace
extr-ds --split
output files found in "workspace/2/"
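As a rough sketch of what the split step does conceptually (pull about 1% of the rows at random for manageable inspection), something like the following would work. It is not the `extr-ds` implementation; `sample_for_inspection` and its parameters are hypothetical names used only for this example.

```python
import random


def sample_for_inspection(rows, fraction=0.01, seed=None):
    """Return a random subset of rows, about `fraction` of them (at least one)."""
    if not rows:
        return []
    size = max(1, round(len(rows) * fraction))
    # Sampling without replacement: no row appears twice in the subset.
    return random.Random(seed).sample(rows, size)


# Usage: 1,000 snippets yield a 10-row subset to annotate and inspect.
snippets = [f"play {i}" for i in range(1000)]
subset = sample_for_inspection(snippets, seed=42)
```

Passing a `seed` makes the subset reproducible, which helps when an iteration has to be repeated after fixing annotation issues.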
extr-ds --annotate
output files found in "workspace/3/"
extr-ds --relate
output files found in "workspace/3/"
extr-ds --save
output files found in "tasks/data/4/"
make crf