|
19 | 19 | "__Volodymyr Kuleshov__<br>Cornell Tech"
|
20 | 20 | ]
|
21 | 21 | },
|
| 22 | + { |
| 23 | + "cell_type": "markdown", |
| 24 | + "metadata": { |
| 25 | + "slideshow": { |
| 26 | + "slide_type": "slide" |
| 27 | + } |
| 28 | + }, |
| 29 | + "source": [ |
| 30 | + "# Announcements\n", |
| 31 | + "\n", |
| 32 | + "* Project proposals were graded over the weekend.\n", |
| 33 | + "* Milestone due on 11/7.\n", |
| 34 | + "* HW3 will be graded next week. Prelim will be graded after that.\n", |
| 35 | + "* We will hold a review session for the prelim." |
| 36 | + ] |
| 37 | + }, |
22 | 38 | {
|
23 | 39 | "cell_type": "markdown",
|
24 | 40 | "metadata": {
|
|
70 | 86 | " ensemble.append(model)\n",
|
71 | 87 | "\n",
|
72 | 88 | "# output average prediction at test time:\n",
|
73 |
| - "y_test = ensemble.average_prediction(y_test)\n", |
| 89 | + "y_test = ensemble.average_prediction(x_test)\n", |
74 | 90 | "```\n",
|
75 | 91 | "<!-- Data samples taken with replacement are known as bootstrap samples. -->"
|
76 | 92 | ]
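The bagging pseudocode above can be made runnable. This is a minimal NumPy-only sketch; the cubic-polynomial weak model and all variable names are illustrative stand-ins, since the slide leaves the base model unspecified:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = np.sin(x) + 0.3 * rng.standard_normal(x.size)

ensemble = []
for _ in range(25):
    # bootstrap sample: n points drawn with replacement
    idx = rng.integers(0, x.size, x.size)
    # fit a weak model on the sampled data; a cubic polynomial
    # stands in for the unspecified model in the pseudocode
    ensemble.append(np.polyfit(x[idx], y[idx], deg=3))

# output average prediction at test time
x_test = np.linspace(-3, 3, 50)
y_test = np.mean([np.polyval(c, x_test) for c in ensemble], axis=0)
```

Averaging the 25 bootstrapped fits smooths out the variance of any single fit, which is the point of bagging.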
|
|
162 | 178 | "source": [
|
163 | 179 | "# Structure of a Boosting Algorithm\n",
|
164 | 180 | "\n",
|
165 |
| - "Boosting reduces *underfitting* by combining models that correct each others' errors." |
| 181 | + "Boosting reduces *underfitting* via models that correct each other's errors." |
166 | 182 | ]
|
167 | 183 | },
|
168 | 184 | {
|
|
173 | 189 | }
|
174 | 190 | },
|
175 | 191 | "source": [
|
176 |
| - "1. Fit a weak learner $g_0$ on dataset $\\mathcal{D} = \\{(x^{(i)}, y^{(i)})\\}$. Let $f=g_0$." |
| 192 | + "1. Compute weights $w^{(i)}$ for each $i$ based on $t$-th model predictions $f_t(x^{(i)})$ and targets $y^{(i)}$. Give more weight to points with errors." |
177 | 193 | ]
|
178 | 194 | },
|
179 | 195 | {
|
|
184 | 200 | }
|
185 | 201 | },
|
186 | 202 | "source": [
|
187 |
| - "2. Compute weights $w^{(i)}$ for each $i$ based on model predictions $f(x^{(i)})$ and targets $y^{(i)}$. Give more weight to points with errors." |
| 203 | + "2. Fit new weak learner $g_t$ on $\\mathcal{D} = \\{(x^{(i)}, y^{(i)})\\}$ with weights $w^{(i)}$." |
188 | 204 | ]
|
189 | 205 | },
|
190 | 206 | {
|
|
195 | 211 | }
|
196 | 212 | },
|
197 | 213 | "source": [
|
198 |
| - "3. Fit new weak learner $g_1$ on $\\mathcal{D} = \\{(x^{(i)}, y^{(i)})\\}$ with weights $w^{(i)}$." |
199 |
| - ] |
200 |
| - }, |
201 |
| - { |
202 |
| - "cell_type": "markdown", |
203 |
| - "metadata": { |
204 |
| - "slideshow": { |
205 |
| - "slide_type": "fragment" |
206 |
| - } |
207 |
| - }, |
208 |
| - "source": [ |
209 |
| - "4. Set $f_1 = g_0 + \\alpha_1 g$ for some weight $\\alpha_1$. Go to Step 2 and repeat." |
| 214 | + "3. Set $f_{t+1} = f_t + \\alpha_t g_t$ for some weight $\\alpha_t$. Go to Step 1 and repeat." |
210 | 215 | ]
|
211 | 216 | },
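The three steps above can be sketched in code. AdaBoost with decision stumps is one concrete instantiation of this template (a NumPy-only sketch; the toy dataset, the stump learner, and all names are illustrative assumptions, not from the notebook):

```python
import numpy as np

def fit_stump(x, y, w):
    """Weak learner g: pick threshold/sign minimizing weighted 0-1 error."""
    best = (np.inf, 0.0, 1)
    for thr in np.unique(x):
        for sign in (1, -1):
            pred = sign * np.where(x < thr, 1, -1)
            err = w @ (pred != y)
            if err < best[0]:
                best = (err, thr, sign)
    return best

# toy 1-d classification data with labels in {-1, +1}
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = np.where(np.abs(x) < 0.5, 1, -1)

w = np.full(x.size, 1 / x.size)  # uniform weights to start
stumps = []
for t in range(20):
    # Step 2: fit weak learner g_t under the current weights w
    err, thr, sign = fit_stump(x, y, w)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    stumps.append((alpha, thr, sign))
    # Step 1 (next round): give more weight to misclassified points
    pred = sign * np.where(x < thr, 1, -1)
    w *= np.exp(-alpha * y * pred)
    w /= w.sum()

# Step 3: f is the weighted sum of the weak learners; predict with its sign
f = sum(a * s * np.where(x < thr, 1, -1) for a, thr, s in stumps)
accuracy = np.mean(np.sign(f) == y)
```

The particular weight update `exp(-alpha * y * pred)` is AdaBoost's choice; other boosting algorithms fill in Steps 1 and 3 differently.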
|
212 | 217 | {
|
|
735 | 740 | }
|
736 | 741 | },
|
737 | 742 | "source": [
|
738 |
| - "* $f(x)$ consists of $T$ smaller models $g$ with weights $\\alpha_t$ and params $\\phi_t$." |
| 743 | + "* $f(x)$ consists of $T$ smaller models $g$ with weights $\\alpha_t$ and params $\\phi_t$." |
739 | 744 | ]
|
740 | 745 | },
|
741 | 746 | {
|
|
781 | 786 | }
|
782 | 787 | },
|
783 | 788 | "source": [
|
784 |
| - "1. Fit a weak learner $g_0$ on dataset $\\mathcal{D} = \\{(x^{(i)}, y^{(i)})\\}$. Let $f=g_0$." |
785 |
| - ] |
786 |
| - }, |
787 |
| - { |
788 |
| - "cell_type": "markdown", |
789 |
| - "metadata": { |
790 |
| - "slideshow": { |
791 |
| - "slide_type": "fragment" |
792 |
| - } |
793 |
| - }, |
794 |
| - "source": [ |
795 |
| - "2. Compute weights $w^{(i)}$ for each $i$ based on model predictions $f(x^{(i)})$ and targets $y^{(i)}$. Give more weight to points with errors." |
| 789 | + "1. Compute weights $w^{(i)}$ for each $i$ based on $t$-th model predictions $f_t(x^{(i)})$ and targets $y^{(i)}$. Give more weight to points with errors." |
796 | 790 | ]
|
797 | 791 | },
|
798 | 792 | {
|
|
803 | 797 | }
|
804 | 798 | },
|
805 | 799 | "source": [
|
806 |
| - "3. Fit new weak learner $g_1$ on $\\mathcal{D} = \\{(x^{(i)}, y^{(i)})\\}$ with weights $w^{(i)}$." |
| 800 | + "2. Fit new weak learner $g_t$ on $\\mathcal{D} = \\{(x^{(i)}, y^{(i)})\\}$ with weights $w^{(i)}$." |
807 | 801 | ]
|
808 | 802 | },
|
809 | 803 | {
|
|
814 | 808 | }
|
815 | 809 | },
|
816 | 810 | "source": [
|
817 |
| - "4. Set $f_1 = g_0 + \\alpha_1 g$ for some weight $\\alpha_1$. Go to Step 2 and repeat." |
| 811 | + "3. Set $f_{t+1} = f_t + \\alpha_t g_t$ for some weight $\\alpha_t$. Go to Step 1 and repeat." |
818 | 812 | ]
|
819 | 813 | },
|
820 | 814 | {
|
|
887 | 881 | },
|
888 | 882 | "source": [
|
889 | 883 | "The resulting algorithm is often called L2Boost. At step $t$ we minimize\n",
|
890 |
| - "$$\\sum_{i=1}^n (r^{(i)}_t - g(x^{(i)}; \\phi))^2, $$\n", |
891 |
| - "where $r^{(i)}_t = y^{(i)} - f(x^{(i)})_{t-1}$ is the residual from the model at time $t-1$." |
| 884 | + "$$\\sum_{i=1}^n (r^{(i)}_t - \\alpha g(x^{(i)}; \\phi))^2, $$\n", |
| 885 | + "where $r^{(i)}_t = y^{(i)} - f_{t-1}(x^{(i)})$ is the residual from the model $f_{t-1}$." |
892 | 886 | ]
|
893 | 887 | },
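A minimal sketch of the L2Boost objective above, using regression stumps as the weak learners $g(\cdot; \phi)$ with $\phi$ = (threshold, left mean, right mean); the dataset, the shrinkage value, and the candidate-threshold grid are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x)

alpha, T = 0.5, 50
f = np.zeros_like(y)  # running ensemble prediction f_{t-1}(x)
for t in range(T):
    r = y - f  # residuals r_t of the current model
    # weak learner: regression stump; leaf values are residual means,
    # threshold chosen to minimize the L2Boost objective above
    best = (np.inf, None)
    for thr in x[1:-1:5]:
        left = x < thr
        pred = np.where(left, r[left].mean(), r[~left].mean())
        sse = np.sum((r - alpha * pred) ** 2)
        if sse < best[0]:
            best = (sse, pred)
    f = f + alpha * best[1]  # f_t = f_{t-1} + alpha * g_t

train_mse = np.mean((y - f) ** 2)
```

Each round fits the new stump to what the current ensemble still gets wrong, so the training residuals shrink round over round.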
|
894 | 888 | {
|
|
1416 | 1410 | },
|
1417 | 1411 | "source": [
|
1418 | 1412 | "At step $t$ we minimize\n",
|
1419 |
| - "$$\\sum_{i=1}^n (r^{(i)}_t - g(x^{(i)}; \\phi))^2, $$\n", |
| 1413 | + "$$\\sum_{i=1}^n (r^{(i)}_t - \\alpha g(x^{(i)}; \\phi))^2, $$\n", |
1420 | 1414 | "where $r^{(i)}_t = y^{(i)} - f_{t-1}(x^{(i)})$ is the residual error of the model $f_{t-1}$."
|
1421 | 1415 | ]
|
1422 | 1416 | },
|
|
1936 | 1930 | "source": [
|
1937 | 1931 | "We will then perform approximate functional gradient descent using\n",
|
1938 | 1932 | "\n",
|
1939 |
| - "$$f_t \\gets f_{t-1} - \\alpha_t \\nabla g_t$$\n", |
| 1933 | + "$$f_t \\gets f_{t-1} - \\alpha_t g_t$$\n", |
1940 | 1934 | "\n",
|
1941 | 1935 | "which is approximately $f_t \\gets f_{t-1} - \\alpha_t \\nabla J(f_{t-1}).$"
|
1942 | 1936 | ]
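As a quick check of why this approximation is reasonable for the squared loss used earlier: the functional gradient of $J$ with respect to the value $f(x^{(i)})$ is just the negative residual,

```latex
J(f) = \frac{1}{2}\sum_{i=1}^n \big(y^{(i)} - f(x^{(i)})\big)^2
\quad\Rightarrow\quad
\frac{\partial J}{\partial f(x^{(i)})} = -\big(y^{(i)} - f(x^{(i)})\big) = -r^{(i)}.
```

So a weak learner $g_t$ fit to approximate $\nabla J(f_{t-1})$ is fit to the negative residuals, and the update $f_t \gets f_{t-1} - \alpha_t g_t$ recovers the residual-fitting L2Boost update seen above.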
|
|
2158 | 2152 | },
|
2159 | 2153 | "source": [
|
2160 | 2154 | "At step $t$ we minimize\n",
|
2161 |
| - "$$\\sum_{i=1}^n (r^{(i)}_t - g(x^{(i)}; \\phi))^2, $$\n", |
| 2155 | + "$$\\sum_{i=1}^n (r^{(i)}_t - \\alpha g(x^{(i)}; \\phi))^2, $$\n", |
2162 | 2156 | "where $r^{(i)}_t = y^{(i)} - f_{t-1}(x^{(i)})$ is the residual from $f_{t-1}$."
|
2163 | 2157 | ]
|
2164 | 2158 | },
|
|