This repository includes a variety of datasets to be used for educational purposes.
1- State_Drug_Utilization_Data_2010
- This folder contains a dataset with the same name as the folder, taken from the data.gov catalog. The data are significantly reduced from their original size. The folder also includes a Jupyter Notebook, "read_&_sample_data.ipynb", which shows how the data were sampled. To work with the full dataset, visit data.gov and download the complete CSV version directly.
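
If you download the full CSV and want to draw your own sample, the sketch below shows one possible approach. It is not necessarily the method used in the notebook: it uses reservoir sampling from the standard library so that the whole file never has to fit in memory. The function name `sample_csv_rows` and the path in the usage note are hypothetical.

```python
import csv
import random


def sample_csv_rows(path, k, seed=0):
    """Return the header and a uniform random sample of up to k data rows.

    Uses reservoir sampling (Algorithm R), so the file is read once
    and only k rows are held in memory at a time.
    """
    rng = random.Random(seed)
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # first row is assumed to be the header
        sample = []
        for i, row in enumerate(reader):
            if i < k:
                # Fill the reservoir with the first k rows.
                sample.append(row)
            else:
                # Replace an existing entry with probability k / (i + 1).
                j = rng.randint(0, i)
                if j < k:
                    sample[j] = row
    return header, sample
```

For example, `header, rows = sample_csv_rows("path/to/full_dataset.csv", k=1000)` would return 1000 rows chosen uniformly at random, or every row if the file has fewer than 1000.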
2- baseballdatabank-master
- These data come from Sean Lahman and include the latest CSV version. The 'baseballdatabank-master' folder contains two folders, 'core' and 'upstream'. The 'core' folder holds most of the database tables, formatted as CSV files, plus a 'readme' text file explaining the tables. The 'upstream' folder contains the 'Teams.csv' table. You can download the latest version of the data to your local machine directly from Sean's site.
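
As a minimal sketch of how one of these tables might be loaded, the helper below reads a CSV file from a subfolder into a list of dictionaries keyed by column name. The function name `load_table` is hypothetical, and the sketch assumes each CSV has a header row; check the 'readme' file for the actual table and column names.

```python
import csv
from pathlib import Path


def load_table(root, subfolder, filename):
    """Read one CSV table into a list of dicts, one dict per row."""
    path = Path(root) / subfolder / filename
    with open(path, newline="", encoding="utf-8") as f:
        # DictReader uses the first row as the column names.
        return list(csv.DictReader(f))
```

For example, `load_table("baseballdatabank-master", "upstream", "Teams.csv")` would return the rows of the 'Teams.csv' table, assuming the repository has been cloned into the current working directory.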
3- Stock data
- This dataset is provided by the author of "Easy And Fun With BeautifulSoup". There are two identical copies of the stock data, one in CSV format and the other as an Excel file. The data were extracted from the Yahoo Finance website and include 90 days of data from November 21, 2019 until April 28, 2020. The data are raw, current, and good for learning purposes.
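
A common first exercise with stock data is selecting a date window. The sketch below filters the CSV copy by date using only the standard library. It assumes the file has a column named `Date` holding ISO dates such as `2019-11-21`; that column name and format are assumptions, so adjust them to match the actual file. The function name `rows_in_range` is hypothetical.

```python
import csv
from datetime import date


def rows_in_range(path, start, end, date_column="Date"):
    """Return the rows whose date falls within [start, end], inclusive."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            row
            for row in csv.DictReader(f)
            # Assumes ISO-formatted dates (YYYY-MM-DD) in date_column.
            if start <= date.fromisoformat(row[date_column]) <= end
        ]
```

For example, `rows_in_range("path/to/stock_data.csv", date(2020, 1, 1), date(2020, 1, 31))` would keep only the January 2020 rows.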
4- Yelp datasets
- The Yelp datasets are provided as links below rather than as files. The tables are formatted as JSON, and the data are very rich and interesting to work with. Be aware that the data are large, so you will need to download them directly from the source using the links below.
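
Because the files are large, it can help to read them one record at a time instead of loading everything at once. The sketch below assumes each file stores one JSON object per line; if a file is instead a single JSON array, this approach will not work as written. The function name `iter_json_lines` and the path in the usage note are hypothetical.

```python
import json


def iter_json_lines(path, limit=None):
    """Yield one parsed JSON object per non-empty line, up to `limit` objects."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if limit is not None and count >= limit:
                break
            yield json.loads(line)
            count += 1
```

For example, `first_rows = list(iter_json_lines("path/to/yelp_file.json", limit=5))` would parse only the first five records, which is a quick way to inspect the fields before committing to a full load.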