
Commit ca19a0f

1 parent 49006ae commit ca19a0f

4 files changed (+168 -3 lines changed)
@@ -0,0 +1,83 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Ensemble Method\n",
+    "* The main causes of error in learning are noise, bias, and variance \n",
+    "* Ensemble methods aim to minimize these sources of error \n",
+    "* A group of weak learners is combined to form a strong learner"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Models can differ based on four criteria:\n",
+    "1. Difference in population\n",
+    "2. Difference in hypothesis\n",
+    "3. Difference in modeling technique\n",
+    "4. Difference in initial seed"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Bagging**\n",
+    "The goal is to minimize variance. Bagging creates several subsets of the data by sampling randomly with replacement, trains a separate model on each subset, and averages (or majority-votes) the predictions of all the models. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Boosting** \n",
+    "Aims at fitting weak learners sequentially in an adaptive way: each model learns from the mistakes of the previous model and reduces its error. \n",
+    "\n",
+    "High-bias models (weak learners) are computationally less expensive to fit. Once the weak learners are chosen, there are two main ways they can be fitted sequentially. \n",
+    "Two important algorithms: AdaBoost and Gradient Boosting \n",
+    "These two algorithms differ in how they create and aggregate the weak learners during the sequential process. \n",
+    "AdaBoost: updates the weights attached to each of the training-set observations \n",
+    "Gradient Boosting: updates the values of these observations, fitting each new learner to the residuals of the current ensemble "
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img src=\"Image/ensemble.JPG\" width=\"800\" />"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
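To make the bagging cell concrete, here is a minimal runnable sketch using scikit-learn's BaggingClassifier; the synthetic dataset, estimator count, and random seeds are illustrative assumptions, not part of the commit:

```python
# Bagging sketch: bootstrap-sample the training set, fit one tree per
# sample, and aggregate the trees' predictions by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,   # number of bootstrap samples / trees (assumed)
    max_samples=1.0,    # each bootstrap sample is as large as the training set
    bootstrap=True,     # sample with replacement, as the cell describes
    random_state=42,
)
bagging.fit(X_train, y_train)
print("bagging accuracy:", bagging.score(X_test, y_test))
```

`bootstrap=True` is what makes this bagging rather than plain model averaging: each of the 100 trees sees a different resample of the training data, and averaging their votes reduces variance.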
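The boosting cell's AdaBoost vs. Gradient Boosting contrast can be sketched the same way; again, the data and hyperparameters below are assumed for illustration:

```python
# Boosting sketch: both models fit weak learners sequentially, but they
# adapt differently between rounds.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost: re-weights the training observations after each round so the
# next weak learner focuses on previously misclassified examples.
ada = AdaBoostClassifier(n_estimators=100, random_state=0)

# Gradient Boosting: fits each new tree to the residual errors (negative
# gradient of the loss) of the ensemble built so far.
gbm = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)

for name, model in [("AdaBoost", ada), ("Gradient Boosting", gbm)]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```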
(Second file: identical notebook content, +83 lines — the same diff as shown above.)

Future Topics.txt (+2 -3)
@@ -1,8 +1,7 @@
-Bagging n boosting
 Sampling techniques
-Random Forest
 Feature Engineering /Exploratory data analysis
 Evaluation Metrics /Sampling Technique
 Stat Analysis Basics
 gradient boosting (decision tree)
-predictive modeling, time series analysis and segmentation techniques
+predictive modeling, time series analysis and segmentation techniques
+ensemble: stacking

Image/ensemble.JPG (26 KB, binary file added)
