The first thing we need for our experiment is data: one set to train and test our model, and another to generate predictions with the trained model. Azure ML Studio can import data in several ways (directly from SQL databases, DocumentDB, Azure Storage and other storage options, using the "Import Data" module), but let's keep things simple and upload three CSV files to our Azure ML account:
- leagues_NBA_totals_master: file containing the total stats (points, assists, rebounds, steals, games played, minutes, etc.) for every NBA player from 1981 to 2016, which will be used to train and evaluate our model -> C:/Lab/leagues_NBA_totals_master.csv or download it here
- leagues_NBA_advanced_master: file containing the advanced stats (PER, true shooting %, offensive win share, defensive win share, total win share, etc.) for every NBA player from 1981 to 2016, which will be used to train and evaluate our model -> C:/Lab/leagues_NBA_advanced_master.csv or download it here
- nba_2017_players_input: file containing the info for every player who played in the 2016-2017 NBA season (rookies included), which will be used to create the predictions -> C:/Lab/nba_2017_players_input.csv or download it here
Note that all three CSV files share five common fields (Player, Pos, Age, Tm and Season), which will be the features we'll use for our experiment.
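If you want to confirm the shared fields before uploading, a quick local check with pandas can help. This is only a sketch that runs outside Azure ML Studio and is not part of the lab itself: the helper name `missing_key_columns`, the `PTS` stat column and the sample row are made up for illustration, and the real files would be loaded with `pd.read_csv` from wherever you saved them.

```python
import io

import pandas as pd

# The five fields the tutorial says every file shares.
KEY_COLUMNS = ["Player", "Pos", "Age", "Tm", "Season"]


def missing_key_columns(df, keys=KEY_COLUMNS):
    """Return the key columns absent from df (an empty list if all are present)."""
    return [col for col in keys if col not in df.columns]


# Made-up one-row sample standing in for one of the lab CSV files.
# "PTS" is a hypothetical stat column, not taken from the real data.
sample_csv = io.StringIO(
    "Player,Pos,Age,Tm,Season,PTS\n"
    "Player A,PG,25,AAA,2016,1200\n"
)
sample = pd.read_csv(sample_csv)

print(missing_key_columns(sample))  # prints []
```

To check the real files, replace the in-memory sample with `pd.read_csv(path)` for each of the three downloaded CSVs; any non-empty result means that file lacks one of the fields used later for joining.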
Once downloaded, let's upload our three files to our Azure ML workspace:
- We start by switching the view to the "Datasets" tab, and clicking on the "New" button in the bottom-left corner:
- On the "New" blade, click on "From local file" and select the file to upload (repeat this step for each file):
- Once uploaded, we can find our three files on the "Datasets" list:
As you can see, there is an empty "Project" column. Projects let us organize our datasets, experiments and other components by keeping them together in the same place. Let's create a new project to hold our recently uploaded datasets and the experiments we are about to create.
- Select the 3 csv files and click on the "Add to project" button on the bottom bar:
- Click on "New project" and enter the desired name for your project:
- You can check that the project with our 3 datasets has been created by switching to the "Projects" view:
Now, let's create our first experiment, in which we will build the data flow that cleans and prepares our data, and train the model that generates the predictions for the 2016-2017 season:
- Switch to the "Experiments" view, and click on the "New" button in the bottom-left corner:
- Once created, change the name of the experiment to something less generic:
As we cannot save an experiment without any modules, let's start by adding our training datasets and joining them on the fields they have in common:
- The left column of the tool lists the modules that we can use in our experiments. Add the totals and advanced datasets by browsing "Saved Datasets > My Datasets" and dragging and dropping both files onto the canvas:
- In the same left column, search for the "Join Data" module and drag it to the canvas. Then connect both datasets to the "Join Data" module by dragging from the bottom dot of each dataset to one of the top dots of the "Join Data" module:
- With the "Join Data" module selected, let's choose the common columns used to join both datasets. Click on "Launch column selector" for the first dataset, select the common columns (Player, Pos, Age, Tm, Season), click the right arrow between the two lists and click the OK button in the bottom-right corner to confirm. Then repeat this for the second dataset:
- To complete this step, select the "Inner Join" type in the dropdown, and uncheck "Keep right key columns in joined table" to avoid duplicate columns in our joined dataset:
- Run the experiment. When it completes, a green check mark appears on the "Join Data" module. To take a look at the resulting dataset, right-click on the bottom dot of the "Join Data" module and select "Visualize":
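For readers who think in code, the join configured above is roughly analogous to an inner merge in pandas. The sketch below is an illustration only, not what Azure ML Studio runs internally: the function name `join_stats`, the players, and the `PTS` and `PER` values are invented, while the five key columns come from the tutorial. Passing the keys through `on=` keeps a single copy of each key column, which mirrors the effect of unchecking "Keep right key columns in joined table".

```python
import pandas as pd

# The five common columns selected in the "Join Data" module.
KEY_COLUMNS = ["Player", "Pos", "Age", "Tm", "Season"]


def join_stats(totals, advanced, keys=KEY_COLUMNS):
    """Inner-join two stat tables on the shared key columns."""
    return totals.merge(advanced, on=keys, how="inner")


# Made-up rows standing in for the totals and advanced datasets.
totals = pd.DataFrame({
    "Player": ["Player A", "Player B"],
    "Pos": ["PG", "C"],
    "Age": [25, 30],
    "Tm": ["AAA", "BBB"],
    "Season": [2016, 2016],
    "PTS": [1200, 800],
})
advanced = pd.DataFrame({
    "Player": ["Player A", "Player C"],
    "Pos": ["PG", "SF"],
    "Age": [25, 22],
    "Tm": ["AAA", "CCC"],
    "Season": [2016, 2016],
    "PER": [18.5, 12.0],
})

joined = join_stats(totals, advanced)

# Only "Player A" appears in both tables, so the inner join keeps one row,
# and each key column appears once alongside the stat columns.
print(joined.columns.tolist())
print(len(joined))
```

An inner join drops any player-season row that is missing from either table, so comparing row counts before and after the join is a simple way to spot mismatched keys.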
Finally, let's also add our experiment to the project we created before, so we have all of our assets together:
- Save the experiment (bottom bar button), switch to the "Projects" view, click on the project you created before and click on the "Edit" button on the bottom bar:
- On the edit view, open the "Experiments" dropdown, mark the experiment you just created and click on the right arrow between both lists:
Now you are ready to go! You have your datasets, the training experiment with the first step of the data flow, and the project containing all of your assets, which should look like this: