
Commit 9d7fdc7

update lecture 16

1 parent acce287

File tree

1 file changed: +30 −36 lines changed

slides/lecture16-boosting.ipynb

+30 −36
@@ -19,6 +19,22 @@
  "__Volodymyr Kuleshov__<br>Cornell Tech"
  ]
  },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "# Announcement\n",
+ "\n",
+ "* Project proposals have been graded over the weekend.\n",
+ "* Milestone due on 11/7.\n",
+ "* HW3 will be graded next week. The prelim will be graded after that.\n",
+ "* We will hold a review session for the prelim."
+ ]
+ },
  {
  "cell_type": "markdown",
  "metadata": {
@@ -70,7 +86,7 @@
  " ensemble.append(model)\n",
  "\n",
  "# output average prediction at test time:\n",
- "y_test = ensemble.average_prediction(y_test)\n",
+ "y_test = ensemble.average_prediction(x_test)\n",
  "```\n",
  "<!-- Data samples taken with replacement are known as bootstrap samples. -->"
  ]
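
The fix above swaps labels for inputs at test time: the ensemble averages predictions on `x_test`. For reference, a minimal runnable sketch of the same bagging idea, assuming scikit-learn-style regressors (the `DecisionTreeRegressor` choice and the helper names are illustrative, not taken from the notebook):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_ensemble(x_train, y_train, n_models=10, seed=0):
    """Fit weak learners on bootstrap samples (sampled with replacement)."""
    rng = np.random.default_rng(seed)
    ensemble, n = [], len(x_train)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)          # bootstrap sample of the data
        model = DecisionTreeRegressor(max_depth=3)
        model.fit(x_train[idx], y_train[idx])
        ensemble.append(model)
    return ensemble

def average_prediction(ensemble, x_test):
    """Output the average prediction of the ensemble on the test inputs."""
    return np.mean([model.predict(x_test) for model in ensemble], axis=0)
```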
@@ -162,7 +178,7 @@
  "source": [
  "# Structure of a Boosting Algorithm\n",
  "\n",
- "Boosting reduces *underfitting* by combining models that correct each others' errors."
+ "Boosting reduces *underfitting* via models that correct each other's errors."
  ]
  },
  {
@@ -173,7 +189,7 @@
  }
  },
  "source": [
- "1. Fit a weak learner $g_0$ on dataset $\\mathcal{D} = \\{(x^{(i)}, y^{(i)})\\}$. Let $f=g_0$."
+ "1. Compute weights $w^{(i)}$ for each $i$ based on the $t$-th model's predictions $f_t(x^{(i)})$ and targets $y^{(i)}$. Give more weight to points with errors."
  ]
  },
  {
@@ -184,7 +200,7 @@
  }
  },
  "source": [
- "2. Compute weights $w^{(i)}$ for each $i$ based on model predictions $f(x^{(i)})$ and targets $y^{(i)}$. Give more weight to points with errors."
+ "2. Fit a new weak learner $g_t$ on $\\mathcal{D} = \\{(x^{(i)}, y^{(i)})\\}$ with weights $w^{(i)}$."
  ]
  },
  {
@@ -195,18 +211,7 @@
  }
  },
  "source": [
- "3. Fit new weak learner $g_1$ on $\\mathcal{D} = \\{(x^{(i)}, y^{(i)})\\}$ with weights $w^{(i)}$."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "slideshow": {
- "slide_type": "fragment"
- }
- },
- "source": [
- "4. Set $f_1 = g_0 + \\alpha_1 g$ for some weight $\\alpha_1$. Go to Step 2 and repeat."
+ "3. Set $f_{t+1} = f_t + \\alpha_t g_t$ for some weight $\\alpha_t$. Go to Step 1 and repeat."
  ]
  },
  {
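
Read as one loop over $t$, the three revised steps amount to: weight the data by the current errors, fit a weak learner on the weighted data, and add it to the ensemble. A minimal sketch under stated assumptions (the absolute-residual weighting rule, the fixed `alpha`, and the `fit_weighted` helper are all illustrative; the slides leave these choices abstract):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_weighted(x, y, w):
    """Illustrative weak learner: a shallow tree fit with per-example weights."""
    return DecisionTreeRegressor(max_depth=2).fit(x, y, sample_weight=w)

def boost(x_train, y_train, n_rounds=10, alpha=0.5):
    """Generic boosting loop following the revised steps 1-3."""
    ensemble = []
    f_pred = np.zeros(len(y_train))              # predictions of the current model f_t
    for t in range(n_rounds):
        # Step 1: weight points by the current model's errors (one simple choice of rule).
        err = np.abs(y_train - f_pred)
        w = err / err.sum() if err.sum() > 0 else np.full(len(y_train), 1.0 / len(y_train))
        # Step 2: fit a new weak learner g_t on the weighted dataset.
        g_t = fit_weighted(x_train, y_train, w)
        # Step 3: f_{t+1} = f_t + alpha_t * g_t, then repeat from Step 1.
        f_pred = f_pred + alpha * g_t.predict(x_train)
        ensemble.append((alpha, g_t))
    return ensemble
```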
@@ -735,7 +740,7 @@
  }
  },
  "source": [
- "* $f(x)$ consists of $T$ smaller models $g$ with weights $\\alpha_t$ and params $\\phi_t$."
+ "* $f(x)$ consists of $T$ smaller models $g$ with weights $\\alpha_t$ & params $\\phi_t$."
  ]
  },
  {
@@ -781,18 +786,7 @@
  }
  },
  "source": [
- "1. Fit a weak learner $g_0$ on dataset $\\mathcal{D} = \\{(x^{(i)}, y^{(i)})\\}$. Let $f=g_0$."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "slideshow": {
- "slide_type": "fragment"
- }
- },
- "source": [
- "2. Compute weights $w^{(i)}$ for each $i$ based on model predictions $f(x^{(i)})$ and targets $y^{(i)}$. Give more weight to points with errors."
+ "1. Compute weights $w^{(i)}$ for each $i$ based on the $t$-th model's predictions $f_t(x^{(i)})$ and targets $y^{(i)}$. Give more weight to points with errors."
  ]
  },
  {
@@ -803,7 +797,7 @@
  }
  },
  "source": [
- "3. Fit new weak learner $g_1$ on $\\mathcal{D} = \\{(x^{(i)}, y^{(i)})\\}$ with weights $w^{(i)}$."
+ "2. Fit a new weak learner $g_t$ on $\\mathcal{D} = \\{(x^{(i)}, y^{(i)})\\}$ with weights $w^{(i)}$."
  ]
  },
  {
@@ -814,7 +808,7 @@
  }
  },
  "source": [
- "4. Set $f_1 = g_0 + \\alpha_1 g$ for some weight $\\alpha_1$. Go to Step 2 and repeat."
+ "3. Set $f_{t+1} = f_t + \\alpha_t g_t$ for some weight $\\alpha_t$. Go to Step 1 and repeat."
  ]
  },
  {
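
These same reweight-fit-add steps are what off-the-shelf boosting implementations run internally. As a usage-level illustration (not referenced by the commit), scikit-learn's `AdaBoostRegressor` exposes the number of rounds and the step size directly:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

# Toy data just to exercise the API.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# n_estimators plays the role of T; learning_rate plays the role of the alphas.
booster = AdaBoostRegressor(n_estimators=50, learning_rate=0.5, random_state=0)
booster.fit(X, y)
print(booster.predict(X[:5]))
```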
@@ -887,8 +881,8 @@
  },
  "source": [
  "The resulting algorithm is often called L2Boost. At step $t$ we minimize\n",
- "$$\\sum_{i=1}^n (r^{(i)}_t - g(x^{(i)}; \\phi))^2, $$\n",
- "where $r^{(i)}_t = y^{(i)} - f(x^{(i)})_{t-1}$ is the residual from the model at time $t-1$."
+ "$$\\sum_{i=1}^n (r^{(i)}_t - \\alpha g(x^{(i)}; \\phi))^2, $$\n",
+ "where $r^{(i)}_t = y^{(i)} - f_{t-1}(x^{(i)})$ is the residual from the model $f_{t-1}$."
  ]
  },
  {
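
As a concrete reading of the updated objective, a short L2Boost-style sketch: each round fits $g(\cdot\,; \phi_t)$ by least squares to the current residuals and adds $\alpha\, g_t$ to the model (the shallow-tree weak learner and the fixed $\alpha$ are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def l2boost(x_train, y_train, n_rounds=50, alpha=0.1):
    """L2Boost-style loop: fit each weak learner to the residuals of f_{t-1}."""
    models = []
    f_pred = np.zeros(len(y_train))
    for t in range(n_rounds):
        residuals = y_train - f_pred                      # r_t^(i) = y^(i) - f_{t-1}(x^(i))
        g_t = DecisionTreeRegressor(max_depth=2).fit(x_train, residuals)
        f_pred = f_pred + alpha * g_t.predict(x_train)    # f_t = f_{t-1} + alpha * g_t
        models.append(g_t)
    return models, alpha

def predict(models, alpha, x):
    """Sum the scaled weak learners to evaluate the boosted model."""
    return alpha * sum(m.predict(x) for m in models)
```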
@@ -1416,7 +1410,7 @@
  },
  "source": [
  "At step $t$ we minimize\n",
- "$$\\sum_{i=1}^n (r^{(i)}_t - g(x^{(i)}; \\phi))^2, $$\n",
+ "$$\\sum_{i=1}^n (r^{(i)}_t - \\alpha g(x^{(i)}; \\phi))^2, $$\n",
  "where $r^{(i)}_t = y^{(i)} - f_{t-1}(x^{(i)})$ is the residual error of the model $f_{t-1}$."
  ]
  },
@@ -1936,7 +1930,7 @@
  "source": [
  "We will then perform approximate functional gradient descent using\n",
  "\n",
- "$$f_t \\gets f_{t-1} - \\alpha_t \\nabla g_t$$\n",
+ "$$f_t \\gets f_{t-1} - \\alpha_t g_t$$\n",
  "\n",
  "which is approximately $f_t \\gets f_{t-1} - \\alpha_t \\nabla J(f_{t-1}).$"
  ]
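
A small sketch of that update for a differentiable loss $J$: each $g_t$ is fit to the pointwise gradient of $J$ at $f_{t-1}$, and the model takes a step of size $\alpha_t$ against it. The squared-error loss and shallow trees below are illustrative assumptions, not choices stated in the diff:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def functional_gradient_boost(x_train, y_train, n_rounds=50, alpha=0.1):
    """Approximate functional gradient descent: f_t <- f_{t-1} - alpha_t * g_t."""
    models = []
    f_pred = np.zeros(len(y_train))
    for t in range(n_rounds):
        # Gradient of J(f) = 0.5 * sum_i (y^(i) - f(x^(i)))^2 with respect to f(x^(i)).
        grad = f_pred - y_train
        # g_t is fit to approximate the functional gradient of J at f_{t-1}.
        g_t = DecisionTreeRegressor(max_depth=2).fit(x_train, grad)
        f_pred = f_pred - alpha * g_t.predict(x_train)    # descend: f_t = f_{t-1} - alpha_t * g_t
        models.append(g_t)
    return models
```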
@@ -2158,7 +2152,7 @@
  },
  "source": [
  "At step $t$ we minimize\n",
- "$$\\sum_{i=1}^n (r^{(i)}_t - g(x^{(i)}; \\phi))^2, $$\n",
+ "$$\\sum_{i=1}^n (r^{(i)}_t - \\alpha g(x^{(i)}; \\phi))^2, $$\n",
  "where $r^{(i)}_t = y^{(i)} - f_{t-1}(x^{(i)})$ is the residual from $f_{t-1}$."
  ]
  },
