Machine Learning Specialization

The course can be found on Coursera

Partial notes can be found on my blog SSQ

1. Machine Learning Foundations: A Case Study Approach

The course can be found on Coursera

Programming Assignments

Slides and more details about this course can be found on my GitHub SSQ

  • Week 1 Introduction

    • Regression. Case study: Predicting house prices
    • Classification. Case study: Analyzing sentiment
    • Clustering & Retrieval. Case study: Finding documents
    • Matrix Factorization & Dimensionality Reduction. Case study: Recommending Products
    • Capstone. An intelligent application using deep learning
    • Getting familiar with IPython Notebook and SFrame
  • Week 2 Regression Predicting House Prices

  • Week 3 Classification Analyzing Sentiment

2. Machine Learning: Regression

The course can be found on Coursera

Description Programming Assignments
Models
  • Linear regression
  • Regularization: Ridge (L2), Lasso (L1)
  • Nearest neighbor and kernel regression
Algorithms
  • Gradient descent
  • Coordinate descent
Concepts
  • Loss functions, bias-variance tradeoff
  • cross-validation, sparsity, overfitting
  • model selection, feature selection

Slides and more details about this course can be found on my GitHub SSQ

  • Week 1: Simple Linear Regression:

    • Describe the input (features) and output (real-valued predictions) of a regression model
    • Calculate a goodness-of-fit metric (e.g., RSS)
    • Estimate model parameters to minimize RSS using gradient descent
    • Interpret estimated model parameters
    • Exploit the estimated model to form predictions
    • Discuss the possible influence of high leverage points
    • Describe intuitively how fitted line might change when assuming different goodness-of-fit metrics
    • Fitting a simple linear regression model on housing data
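
The Week 1 objectives above (compute RSS, estimate parameters, form predictions) can be illustrated with a minimal sketch. It uses the closed-form least-squares solution rather than gradient descent, and the function names and toy data are hypothetical, not taken from the course assignment.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) minimizing RSS for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form solution: slope = covariance(x, y) / variance(x).
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = sum((xi - mean_x) ** 2 for xi in x)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residual_sum_of_squares(x, y, intercept, slope):
    """RSS: sum of squared differences between observed and predicted outputs."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data (not the course's housing data): price = 1 + 2 * sqft.
sqft = [1.0, 2.0, 3.0, 4.0]
price = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear_regression(sqft, price)
rss = residual_sum_of_squares(sqft, price, intercept, slope)
print(intercept, slope, rss)
```

Because the toy data lie exactly on a line, the fit recovers an intercept of 1 and a slope of 2 with an RSS of 0 (up to floating-point error).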
  • Week 2: Multiple Regression: Linear regression with multiple features

    • Describe polynomial regression
    • Detrend a time series using trend and seasonal components
    • Write a regression model using multiple inputs or features thereof
    • Cast both polynomial regression and regression with multiple inputs as regression with multiple features
    • Calculate a goodness-of-fit metric (e.g., RSS)
    • Estimate model parameters of a general multiple regression model to minimize RSS:
      • In closed form
      • Using an iterative gradient descent algorithm
    • Interpret the coefficients of a non-featurized multiple regression fit
    • Exploit the estimated model to form predictions
    • Explain applications of multiple regression beyond house price modeling
    • Exploring different multiple regression models for house price prediction
    • Implementing gradient descent for multiple regression
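
As a rough illustration of the iterative approach listed above, here is a sketch of gradient descent on RSS for multiple regression in plain Python. The function names, step size, tolerance, and toy data are assumptions chosen for this example; the assignment itself may use different tools and values.

```python
def predict_output(features, weights):
    """Linear prediction: dot product of one feature row with the weights."""
    return sum(f * w for f, w in zip(features, weights))


def regression_gradient_descent(X, y, initial_weights, step_size, tolerance,
                                max_iterations=10000):
    """Minimize RSS by gradient descent; stop when the gradient norm is small."""
    weights = list(initial_weights)
    for _ in range(max_iterations):
        errors = [predict_output(row, weights) - yi for row, yi in zip(X, y)]
        # Partial derivative of RSS w.r.t. w_j is 2 * sum_i error_i * x_ij.
        gradient = [2.0 * sum(e * row[j] for e, row in zip(errors, X))
                    for j in range(len(weights))]
        gradient_norm = sum(g * g for g in gradient) ** 0.5
        if gradient_norm < tolerance:
            break
        weights = [w - step_size * g for w, g in zip(weights, gradient)]
    return weights


# Hypothetical noiseless data: y = 1 + 2 * x1 + 3 * x2 (first column is the constant).
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0],
     [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0, 8.0]
weights = regression_gradient_descent(X, y, [0.0, 0.0, 0.0],
                                      step_size=0.02, tolerance=1e-8)
print(weights)
```

The step size has to be small enough for the iteration to be stable; on real housing features, which have much larger magnitudes, a far smaller step size (or feature scaling) would be needed.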
  • Week 3: Assessing Performance

    • Describe what a loss function is and give examples
    • Contrast training, generalization, and test error
    • Compute training and test error given a loss function
    • Discuss issue of assessing performance on training set
    • Describe tradeoffs in forming training/test splits
    • List and interpret the 3 sources of average prediction error
      • Irreducible error, bias, and variance
    • Discuss issue of selecting model complexity on test data and then using test error to assess generalization error
    • Motivate use of a validation set for selecting tuning parameters (e.g., model complexity)
    • Describe overall regression workflow
    • Exploring the bias-variance tradeoff
  • Week 4: Ridge Regression

    • Describe what happens to magnitude of estimated coefficients when model is overfit
    • Motivate form of ridge regression cost function
    • Describe what happens to estimated coefficients of ridge regression as tuning parameter λ is varied
    • Interpret coefficient path plot
    • Estimate ridge regression parameters:
      • In closed form
      • Using an iterative gradient descent algorithm
    • Implement K-fold cross validation to select the ridge regression tuning parameter λ
    • Observing effects of L2 penalty in polynomial regression
    • Implementing ridge regression via gradient descent
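
To make the effect of λ and the K-fold selection procedure concrete, the sketch below uses a deliberately simplified setting: a single feature and no intercept, where the ridge solution has a one-line closed form. The function names and data are hypothetical, and the contiguous fold split is just one reasonable choice.

```python
def ridge_fit_1d(x, y, l2_penalty):
    """Minimize sum((y - w * x)^2) + l2_penalty * w^2 for a single weight w."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + l2_penalty)


def k_fold_validation_error(x, y, l2_penalty, k):
    """Average validation RSS over k contiguous folds."""
    n = len(x)
    total = 0.0
    for fold in range(k):
        start = (n * fold) // k
        end = (n * (fold + 1)) // k
        # Hold out the fold [start, end) and train on everything else.
        x_train, y_train = x[:start] + x[end:], y[:start] + y[end:]
        w = ridge_fit_1d(x_train, y_train, l2_penalty)
        total += sum((yi - w * xi) ** 2 for xi, yi in zip(x[start:end], y[start:end]))
    return total / k


# Hypothetical noiseless data: y = 2 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
for l2_penalty in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_fit_1d(x, y, l2_penalty)
    cv_error = k_fold_validation_error(x, y, l2_penalty, k=3)
    print(l2_penalty, w, cv_error)
```

As λ grows, the estimated weight shrinks toward zero. On this noiseless data the smallest λ gives the lowest validation error; with noisy data and many features, an intermediate λ is often selected instead.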
  • Week 5: Lasso Regression: Regularization for feature selection

    • Perform feature selection using “all subsets” and “forward stepwise” algorithms
    • Analyze computational costs of these algorithms
    • Contrast greedy and optimal algorithms
    • Formulate lasso objective
    • Describe what happens to estimated lasso coefficients as tuning parameter λ is varied
    • Interpret lasso coefficient path plot
    • Contrast ridge and lasso regression
    • Describe geometrically why L1 penalty leads to sparsity
    • Estimate lasso regression parameters using an iterative coordinate descent algorithm
    • Implement K-fold cross validation to select lasso tuning parameter λ
    • Using LASSO to select features
    • Implementing LASSO using coordinate descent
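
The coordinate descent update for lasso can be sketched as follows, assuming the objective RSS + λ‖w‖₁ with every weight penalized (no separate intercept) and no all-zero feature column. Names and data are hypothetical; the course assignment works with normalized features, which is a special case of the division by the column's squared norm used here.

```python
def soft_threshold(rho, threshold):
    """Shrink rho toward zero by threshold; values inside the band become exactly 0."""
    if rho > threshold:
        return rho - threshold
    if rho < -threshold:
        return rho + threshold
    return 0.0


def lasso_coordinate_descent(X, y, l1_penalty, num_passes=100):
    """Cyclic coordinate descent for RSS + l1_penalty * sum(|w_j|)."""
    num_features = len(X[0])
    weights = [0.0] * num_features
    # z_j = squared norm of feature column j (assumed non-zero).
    z = [sum(row[j] ** 2 for row in X) for j in range(num_features)]
    for _ in range(num_passes):
        for j in range(num_features):
            rho = 0.0
            for row, yi in zip(X, y):
                # Residual when feature j is left out of the prediction.
                partial = sum(row[k] * weights[k] for k in range(num_features) if k != j)
                rho += row[j] * (yi - partial)
            weights[j] = soft_threshold(rho, l1_penalty / 2.0) / z[j]
    return weights


# Hypothetical data with two orthogonal features; the second has a weak effect.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.1, 3.0, 0.1]
print(lasso_coordinate_descent(X, y, l1_penalty=0.0))
print(lasso_coordinate_descent(X, y, l1_penalty=1.0))
```

With λ = 0 this reduces to least squares. With λ = 1 the weak feature's weight is set exactly to zero while the strong one is only shrunk, which is the sparsity behavior the week's objectives describe.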
  • Week 6: Going nonparametric: Nearest neighbor and kernel regression

    • Motivate the use of nearest neighbor (NN) regression
    • Define distance metrics in 1D and multiple dimensions
    • Perform NN and k-NN regression
    • Analyze computational costs of these algorithms
    • Discuss sensitivity of NN to lack of data, dimensionality, and noise
    • Perform weighted k-NN and define weights using a kernel
    • Define and implement kernel regression
    • Describe the effect of varying the kernel bandwidth λ or # of nearest neighbors k
    • Select λ or k using cross validation
    • Compare and contrast kernel regression with a global average fit
    • Define what makes an approach nonparametric and why NN and kernel regression are considered nonparametric methods
    • Analyze the limiting behavior of NN regression
    • Use NN for classification
    • Predicting house prices using k-nearest neighbors regression
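
A minimal sketch of k-NN regression and kernel regression, assuming Euclidean distance and a Gaussian kernel of the form exp(-d²/λ). The function names and data are hypothetical, and other kernel parameterizations are equally valid.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_regression(train_X, train_y, query, k):
    """Predict the average output of the k training points nearest to query."""
    order = sorted(range(len(train_X)),
                   key=lambda i: euclidean_distance(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def gaussian_kernel_regression(train_X, train_y, query, bandwidth):
    """Weighted average of all outputs, with weights decaying with distance."""
    weights = [math.exp(-euclidean_distance(x, query) ** 2 / bandwidth)
               for x in train_X]
    # Note: for a very small bandwidth all weights can underflow to zero.
    return sum(w * yi for w, yi in zip(weights, train_y)) / sum(weights)


# Hypothetical 1-D data.
train_X = [[0.0], [1.0], [2.0], [10.0]]
train_y = [0.0, 1.0, 2.0, 10.0]
print(knn_regression(train_X, train_y, [1.1], k=1))
print(knn_regression(train_X, train_y, [1.1], k=3))
print(gaussian_kernel_regression(train_X, train_y, [1.1], bandwidth=1.0))
```

Neither function stores fitted parameters: every prediction is computed directly from the training set, which is what makes these methods nonparametric.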

3. Machine Learning: Classification

The course can be found on Coursera

Description Programming Assignments
Models
  • Linear classifiers
  • Logistic regression
  • Decision trees
  • Ensembles
Algorithms
  • Stochastic gradient descent
  • Recursive greedy
  • Boosting
Concepts
  • Decision boundaries, MLE
  • ensemble methods, online learning
Core ML
  • Alleviating overfitting
  • Handling missing data
  • Precision-recall
  • Online learning

Slides and more details about this course can be found on my GitHub

  • Week 1:
    • Linear Classifiers & Logistic Regression
      • decision boundaries
      • linear classifiers
      • class probability
      • logistic regression
      • impact of coefficient values on logistic regression output
      • 1-hot encoding
      • multiclass classification using 1-versus-all
      • Predicting sentiment from product reviews
  • Week 2:
    • Learning Linear Classifiers
      • Maximum likelihood estimation
      • Gradient ascent algorithm for learning logistic regression classifier
      • Choosing step size for gradient ascent/descent
      • (VERY OPTIONAL LESSON) Deriving gradient of logistic regression
      • Implementing logistic regression from scratch
    • Overfitting & Regularization in Logistic Regression
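
The gradient ascent procedure for logistic regression, with an optional L2 penalty, can be sketched as below. It assumes labels in {+1, -1}, a constant feature in column 0 that is left unpenalized, and a fixed step size; all names and data are hypothetical.

```python
import math


def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))


def logistic_regression(X, y, step_size, num_iterations, l2_penalty=0.0):
    """Maximize the (L2-penalized) log likelihood by gradient ascent."""
    weights = [0.0] * len(X[0])
    for _ in range(num_iterations):
        # P(y = +1 | x, w) for every data point under the current weights.
        probabilities = [sigmoid(sum(w * f for w, f in zip(weights, row))) for row in X]
        for j in range(len(weights)):
            derivative = sum(row[j] * ((1.0 if yi == 1 else 0.0) - p)
                             for row, yi, p in zip(X, y, probabilities))
            if j > 0:  # the intercept (column 0) is not regularized
                derivative -= 2.0 * l2_penalty * weights[j]
            weights[j] += step_size * derivative
    return weights


def predict_class(weights, row):
    score = sum(w * f for w, f in zip(weights, row))
    return 1 if score > 0 else -1


# Hypothetical data: column 0 is the constant feature, column 1 a sentiment score.
X = [[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]]
y = [1, 1, -1, -1]
unregularized = logistic_regression(X, y, step_size=0.1, num_iterations=200)
regularized = logistic_regression(X, y, step_size=0.1, num_iterations=200, l2_penalty=1.0)
print(unregularized, regularized)
```

On linearly separable data like this, the unregularized weight keeps growing as iterations continue, while the L2 penalty keeps it bounded. That is the overfitting symptom regularization is meant to address.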
  • Week 3:
    • Decision Trees
      • Predicting loan defaults with decision trees
      • Learning decision trees
        • Recursive greedy algorithm
        • Learning a decision stump
        • Selecting best feature to split on
        • When to stop recursing
      • Using the learned decision tree
        • Traverse a decision tree to make predictions: Majority class predictions; Probability predictions; Multiclass classification
      • Learning decision trees with continuous inputs
        • Threshold splits for continuous inputs
        • (OPTIONAL) Picking the best threshold to split on
      • Identifying safe loans with decision trees
      • Implementing binary decision trees from scratch
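
Selecting the best feature to split on can be sketched as follows, assuming binary (0/1) features, labels in {+1, -1}, and classification error as the quality metric. The feature names and loan records are made up for illustration.

```python
def node_mistakes(labels):
    """Mistakes made by predicting the majority class at a node."""
    positives = sum(1 for label in labels if label == 1)
    return min(positives, len(labels) - positives)


def best_splitting_feature(data, features, target):
    """Return (feature, error) for the split with the lowest classification error."""
    best_feature, best_error = None, float("inf")
    for feature in features:
        left = [row[target] for row in data if row[feature] == 0]
        right = [row[target] for row in data if row[feature] == 1]
        error = (node_mistakes(left) + node_mistakes(right)) / len(data)
        if error < best_error:
            best_feature, best_error = feature, error
    return best_feature, best_error


# Hypothetical loan records; "safe" is the target label.
loans = [
    {"grade_A": 1, "home_own": 0, "safe": 1},
    {"grade_A": 1, "home_own": 1, "safe": 1},
    {"grade_A": 0, "home_own": 1, "safe": -1},
    {"grade_A": 0, "home_own": 0, "safe": -1},
    {"grade_A": 0, "home_own": 1, "safe": 1},
]
feature, error = best_splitting_feature(loans, ["grade_A", "home_own"], "safe")
print(feature, error)
```

A full tree learner would call this function recursively on the left and right subsets until a stopping condition is met.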
  • Week 4:
    • Overfitting in decision trees
      • Identify when overfitting occurs in decision trees
      • Prevent overfitting with early stopping
        • Limit tree depth
        • Do not consider splits that do not reduce classification error
        • Do not split intermediate nodes with only a few points
      • Prevent overfitting by pruning complex trees
        • Use a total cost formula that balances classification error and tree complexity
        • Use total cost to merge potentially complex trees into simpler ones
      • Decision Trees in Practice for preventing overfitting
    • Handling missing data
      • Describe common ways to handle missing data:
        1. Skip all rows with any missing values
        2. Skip features with many missing values
        3. Impute missing values using other data points
      • Modify learning algorithm (decision trees) to handle missing data:
        1. Missing values get added to one branch of split
        2. Use classification error to determine where missing values go
  • Week 5:
    • Boosting
      • Identify the notion of ensemble classifiers
      • Formalize ensembles as the weighted combination of simpler classifiers
      • Outline the boosting framework – sequentially learn classifiers on weighted data
      • Describe the AdaBoost algorithm
        • Learn each classifier on weighted data
        • Compute coefficient of classifier
        • Recompute data weights
        • Normalize weights
      • Implement AdaBoost to create an ensemble of decision stumps
      • Discuss convergence properties of AdaBoost & how to pick the maximum number of iterations T
      • Exploring Ensemble Methods with pre-implemented gradient boosted trees
      • Implement your own boosting module
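
The four AdaBoost steps listed above (learn on weighted data, compute the coefficient, recompute weights, normalize) can be sketched with one-dimensional threshold stumps. The stump representation, function names, and data are assumptions for this example, not the assignment's code.

```python
import math


def best_stump(x, y, weights):
    """Return (weighted_error, threshold, polarity) of the best threshold stump.

    The stump predicts `polarity` when x <= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold in sorted(set(x)):
        for polarity in (1, -1):
            predictions = [polarity if xi <= threshold else -polarity for xi in x]
            error = sum(w for w, p, yi in zip(weights, predictions, y) if p != yi)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(x, y, num_rounds):
    n = len(x)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(num_rounds):
        # 1. Learn a classifier on the weighted data.
        error, threshold, polarity = best_stump(x, y, weights)
        error = min(max(error, 1e-10), 1.0 - 1e-10)  # guard against log(0)
        # 2. Compute the classifier's coefficient.
        coefficient = 0.5 * math.log((1.0 - error) / error)
        ensemble.append((coefficient, threshold, polarity))
        # 3. Recompute data weights: raise mistakes, lower correct points.
        new_weights = []
        for xi, yi, w in zip(x, y, weights):
            prediction = polarity if xi <= threshold else -polarity
            new_weights.append(w * math.exp(-coefficient if prediction == yi else coefficient))
        # 4. Normalize the weights so they sum to 1.
        total = sum(new_weights)
        weights = [w / total for w in new_weights]
    return ensemble


def ensemble_predict(ensemble, xi):
    score = sum(c * (p if xi <= t else -p) for c, t, p in ensemble)
    return 1 if score > 0 else -1


# Hypothetical 1-D data that no single stump can classify perfectly.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(x, y, num_rounds=3)
print([ensemble_predict(ensemble, xi) for xi in x])
```

On this toy data a single stump misclassifies at least two points, while the weighted vote of three stumps classifies all six training points correctly.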
  • Week 6:
    • Evaluating classifiers: Precision & Recall
      • Classification accuracy/error are not always the right metrics
      • Precision captures fraction of positive predictions that are correct
      • Recall captures fraction of positive data correctly identified by the model
      • Trade off precision & recall by setting probability thresholds
      • Plot precision-recall curves
      • Compare models by computing precision at k
      • Exploring precision and recall
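
The definitions above translate directly into code. This sketch assumes labels in {+1, -1} and predicted probabilities for the positive class; returning 0.0 when a denominator is zero is a convention chosen here, not something the course prescribes.

```python
def precision_recall(true_labels, probabilities, threshold=0.5):
    """Return (precision, recall) when predicting +1 for probability >= threshold."""
    predictions = [1 if p >= threshold else -1 for p in probabilities]
    true_positives = sum(1 for t, p in zip(true_labels, predictions) if t == 1 and p == 1)
    false_positives = sum(1 for t, p in zip(true_labels, predictions) if t == -1 and p == 1)
    false_negatives = sum(1 for t, p in zip(true_labels, predictions) if t == 1 and p == -1)
    predicted_positives = true_positives + false_positives
    actual_positives = true_positives + false_negatives
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0
    return precision, recall


# Hypothetical labels and model probabilities.
labels = [1, 1, -1, -1, 1]
probabilities = [0.9, 0.4, 0.6, 0.2, 0.8]
for threshold in [0.3, 0.5, 0.7]:
    print(threshold, precision_recall(labels, probabilities, threshold))
```

Raising the threshold from 0.3 to 0.7 moves precision from 0.75 up to 1.0 while recall drops from 1.0 to 2/3, which is the trade-off a precision-recall curve traces out.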
  • Week 7:
    • Scaling to Huge Datasets & Online Learning
      • Significantly speed up the learning algorithm using stochastic gradient
      • Describe intuition behind why stochastic gradient works
      • Apply stochastic gradient in practice
      • Describe online learning problems
      • Relate stochastic gradient to online learning
      • Training Logistic Regression via Stochastic Gradient Ascent

4. Machine Learning: Clustering & Retrieval

The course can be found on Coursera

Description Programming Assignments
Models
  • Nearest neighbors
  • Clustering, mixtures of Gaussians
  • Latent Dirichlet allocation (LDA)
Algorithms
  • K-means, MapReduce
  • K-NN, KD-trees, locality-sensitive hashing (LSH)
  • Expectation-maximization (EM)
  • Gibbs sampling
Concepts
  • Distance metrics, approximation algorithms,
  • hashing, sampling algorithms, scaling up with map-reduce
Core ML
  • Unsupervised learning
  • Probabilistic modeling
  • Data parallel problems
  • Bayesian inference

Slides and more details about this course can be found on my GitHub SSQ

  • Week 1 Intro

  • Week 2 Nearest Neighbor Search: Retrieving Documents

    • Implement nearest neighbor search for retrieval tasks
    • Contrast document representations (e.g., raw word counts, tf-idf,…)
      • Emphasize important words using tf-idf
    • Contrast methods for measuring similarity between two documents
      • Euclidean vs. weighted Euclidean
      • Cosine similarity vs. similarity via unnormalized inner product
    • Describe complexity of brute force search
    • Implement KD-trees for nearest neighbor search
    • Implement LSH for approximate nearest neighbor search
    • Compare pros and cons of KD-trees and LSH, and decide which is more appropriate for given dataset
    • Choosing features and metrics for nearest neighbor search
    • Implementing Locality Sensitive Hashing from scratch
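
To show why tf-idf changes which documents look similar, the sketch below compares cosine similarity on raw word counts and on tf-idf vectors. It assumes whitespace tokenization and the idf variant log(N / df); other variants (for example with smoothing) are common, and the documents are made up.

```python
import math
from collections import Counter


def word_counts(text):
    """Bag-of-words representation using naive whitespace tokenization."""
    return Counter(text.lower().split())


def tf_idf(corpus_counts):
    """Weight each count by log(number of documents / documents containing the word)."""
    num_docs = len(corpus_counts)
    doc_frequency = Counter()
    for counts in corpus_counts:
        doc_frequency.update(counts.keys())
    return [
        {word: count * math.log(num_docs / doc_frequency[word])
         for word, count in counts.items()}
        for counts in corpus_counts
    ]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


documents = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the stock market fell",
]
raw = [word_counts(doc) for doc in documents]
weighted = tf_idf(raw)
raw_similarity = cosine_similarity(raw[0], raw[2])
tfidf_similarity = cosine_similarity(weighted[0], weighted[2])
print(raw_similarity, tfidf_similarity)
```

With raw counts, the first and third documents appear somewhat similar only because both contain "the". Since that word occurs in every document, its tf-idf weight is zero and the similarity drops to zero.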
  • Week 3 Clustering with k-means

    • Describe potential applications of clustering
    • Describe the input (unlabeled observations) and output (labels) of a clustering algorithm
    • Determine whether a task is supervised or unsupervised
    • Cluster documents using k-means
    • Interpret k-means as a coordinate descent algorithm
    • Define data parallel problems
    • Explain Map and Reduce steps of MapReduce framework
    • Use existing MapReduce implementations to parallelize k-means, understanding what’s being done under the hood
    • Clustering text data with k-means
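
The alternating structure of k-means (assign points, then update centroids) can be sketched as below. Initial centroids are passed in explicitly to keep the example deterministic; in practice they would be chosen randomly or with a smarter scheme. Names and data are hypothetical.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_clusters(points, centroids):
    """Step 1: assign each point to its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda k: squared_distance(point, centroids[k]))
            for point in points]


def update_centroids(points, assignments, centroids):
    """Step 2: move each centroid to the mean of its assigned points."""
    updated = []
    for k, old_centroid in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == k]
        if not members:
            updated.append(list(old_centroid))  # keep an empty cluster's centroid
        else:
            updated.append([sum(coord) / len(members) for coord in zip(*members)])
    return updated


def kmeans(points, initial_centroids, max_iterations=100):
    centroids = [list(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        new_assignments = assign_clusters(points, centroids)
        if new_assignments == assignments:
            break  # converged: assignments no longer change
        assignments = new_assignments
        centroids = update_centroids(points, assignments, centroids)
    return centroids, assignments


# Hypothetical 2-D points forming two well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids, assignments = kmeans(points, initial_centroids=[[0.0, 0.0], [10.0, 10.0]])
print(centroids, assignments)
```

Each step minimizes the same objective over one block of variables while the other is held fixed, which is why k-means can be read as a coordinate descent algorithm.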
  • Week 4 Mixture Models: Model-Based Clustering

    • Interpret a probabilistic model-based approach to clustering using mixture models
    • Describe model parameters
    • Motivate the utility of soft assignments and describe what they represent
    • Discuss issues related to how the number of parameters grows with the number of dimensions
      • Interpret diagonal covariance versions of mixtures of Gaussians
    • Compare and contrast mixtures of Gaussians and k-means
    • Implement an EM algorithm for inferring soft assignments and cluster parameters
      • Determine an initialization strategy
      • Implement a variant that helps avoid overfitting issues
    • Implementing EM for Gaussian mixtures
    • Clustering text data with Gaussian mixtures
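
A compact sketch of EM for a mixture of Gaussians, restricted to one dimension to stay short. The variance floor stands in for the kind of safeguard against collapsing clusters mentioned above; its value, the function names, the initialization, and the data are all assumptions.

```python
import math


def gaussian_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def em_gmm_1d(data, means, variances, weights, num_iterations=50, min_variance=1e-4):
    """EM for a 1-D Gaussian mixture; returns parameters and soft assignments."""
    means, variances, weights = list(means), list(variances), list(weights)
    num_clusters = len(means)
    responsibilities = []
    for _ in range(num_iterations):
        # E-step: soft assignments (responsibilities) given current parameters.
        responsibilities = []
        for x in data:
            likelihoods = [weights[k] * gaussian_pdf(x, means[k], variances[k])
                           for k in range(num_clusters)]
            total = sum(likelihoods)
            responsibilities.append([l / total for l in likelihoods])
        # M-step: re-estimate parameters from the soft assignments.
        for k in range(num_clusters):
            soft_count = sum(r[k] for r in responsibilities)
            weights[k] = soft_count / len(data)
            means[k] = sum(r[k] * x for r, x in zip(responsibilities, data)) / soft_count
            variance = sum(r[k] * (x - means[k]) ** 2
                           for r, x in zip(responsibilities, data)) / soft_count
            variances[k] = max(variance, min_variance)  # floor avoids collapse
    return means, variances, weights, responsibilities


# Hypothetical data: two groups centered near 0 and 5.
data = [0.0, 0.2, -0.2, 0.1, -0.1, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights, responsibilities = em_gmm_1d(
    data, means=[1.0, 4.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(means, variances, weights)
```

Unlike k-means, each point keeps a responsibility for every cluster. On well-separated data like this the responsibilities become nearly hard assignments, and the result matches what k-means would find.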
  • Week 5 Latent Dirichlet Allocation: Mixed Membership Modeling

    • Compare and contrast clustering and mixed membership models
    • Describe a document clustering model for the bag-of-words doc representation
    • Interpret the components of the LDA mixed membership model
    • Analyze a learned LDA model
      • Topics in the corpus
      • Topics per document
    • Describe Gibbs sampling steps at a high level
    • Utilize Gibbs sampling output to form predictions or estimate model parameters
    • Implement collapsed Gibbs sampling for LDA
    • Modeling text topics with Latent Dirichlet Allocation
  • Week 6 Hierarchical Clustering & Closing Remarks

    • Bonus content: Hierarchical clustering
      • Divisive clustering
      • Agglomerative clustering
        • The dendrogram for agglomerative clustering
        • Agglomerative clustering details
    • Hidden Markov models (HMMs): Another notion of “clustering”
    • Modeling text data with a hierarchy of clusters