Merge pull request #15 from jmgiron98/patch-1

RuiShu · web-flow · commit edc1d42d1470 · 2019-11-17T18:23:19.000-08:00
Update index.md
diff --git a/autoregressive/index.md b/autoregressive/index.md
@@ -32,7 +32,7 @@ The term *autoregressive* originates from the literature on time-series models w
 
 If we allow for every conditional $$p(x_i \vert \mathbf{x}_{< i})$$ to be specified in a tabular form, then such a representation is fully general and can represent any possible distribution over $$n$$ random variables. However, the space complexity for such a representation grows exponentially with $$n$$.
 
-To see why, let us consider the conditional for the last dimension, given by $$p(x_n \vert \mathbf{x}_{< n})$$. In order to fully specify this conditional, we need to specify a probability for $$2^{n-1}$$ configurations of the variables $$x_1, x_2, \ldots, x_{n-1}$$. Since the probabilities should sum to 1, the total number of parameters for specifying this conditional is given by $$2^{n-1} -1$$. Hence, a tabular representation for the conditionals is impractical for learning the joint distribution factorized via chain rule.
+To see why, let us consider the conditional for the last dimension, given by $$p(x_n \vert \mathbf{x}_{< n})$$. In order to fully specify this conditional, we need to specify a probability distribution over $$x_n$$ for each of the $$2^{n-1}$$ configurations of the variables $$x_1, x_2, \ldots, x_{n-1}$$. For any one of these configurations, the probabilities $$p(x_n = 0 \vert \mathbf{x}_{< n})$$ and $$p(x_n = 1 \vert \mathbf{x}_{< n})$$ must sum to one, so a single parameter suffices per configuration, and the total number of parameters for specifying this conditional is $$2^{n-1}$$. Hence, a tabular representation for the conditionals is impractical for learning the joint distribution factorized via the chain rule.
 
 In an *autoregressive generative model*, the conditionals are specified as parameterized functions with a fixed number of parameters. That is, we assume the conditional distributions $$p(x_i \vert \mathbf{x}_{< i})$$ to correspond to a Bernoulli random variable and learn a function that maps the preceding random variables $$x_1, x_2, \ldots, x_{i-1}$$ to the
 mean of this distribution. Hence, we have
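The parameter count in the updated paragraph can be checked by brute force. The sketch below assumes binary variables $$x_i \in \{0, 1\}$$, as the $$2^{n-1}$$ count implies. The function names `tabular_conditional_params` and `tabular_chain_rule_params` are hypothetical helpers introduced only for illustration. The sketch enumerates every configuration of $$x_1, \ldots, x_{i-1}$$ and counts one free parameter, $$p(x_i = 1 \vert \mathbf{x}_{< i})$$, per configuration.

```python
from itertools import product


def tabular_conditional_params(i: int) -> int:
    """Free parameters in a table for p(x_i | x_1, ..., x_{i-1}), binary x's assumed.

    Each configuration of the i - 1 preceding variables needs one number,
    p(x_i = 1 | x_<i); p(x_i = 0 | x_<i) is then fixed as 1 minus that number.
    """
    # Enumerate the configurations lazily and count them (one parameter each).
    return sum(1 for _ in product((0, 1), repeat=i - 1))


def tabular_chain_rule_params(n: int) -> int:
    """Total free parameters across all n tabular conditionals of the chain rule."""
    return sum(tabular_conditional_params(i) for i in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 16):
        last = tabular_conditional_params(n)
        total = tabular_chain_rule_params(n)
        print(f"n={n:2d}  last conditional: {last:>6d}  all conditionals: {total:>6d}")
```

Enumeration is used here only to make the count explicit; the closed form `2 ** (i - 1)` gives the same number without iterating. Summing over all $$n$$ conditionals gives $$\sum_{i=1}^{n} 2^{i-1} = 2^n - 1$$. This is the number of free parameters of an unrestricted joint distribution over $$n$$ binary variables, which is consistent with the tabular chain-rule representation being fully general.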