Commit edc1d42

Merge pull request #15 from jmgiron98/patch-1
Update index.md
2 parents f3b6f95 + 4f2ee59 commit edc1d42

1 file changed: +1 -1 lines changed

autoregressive/index.md

@@ -32,7 +32,7 @@ The term *autoregressive* originates from the literature on time-series models w
 
 If we allow for every conditional $$p(x_i \vert \mathbf{x}_{< i})$$ to be specified in a tabular form, then such a representation is fully general and can represent any possible distribution over $$n$$ random variables. However, the space complexity for such a representation grows exponentially with $$n$$.
 
-To see why, let us consider the conditional for the last dimension, given by $$p(x_n \vert \mathbf{x}_{< n})$$. In order to fully specify this conditional, we need to specify a probability for $$2^{n-1}$$ configurations of the variables $$x_1, x_2, \ldots, x_{n-1}$$. Since the probabilities should sum to 1, the total number of parameters for specifying this conditional is given by $$2^{n-1} -1$$. Hence, a tabular representation for the conditionals is impractical for learning the joint distribution factorized via chain rule.
+To see why, let us consider the conditional for the last dimension, given by $$p(x_n \vert \mathbf{x}_{< n})$$. In order to fully specify this conditional, we need to specify a probability distribution for each of the $$2^{n-1}$$ configurations of the variables $$x_1, x_2, \ldots, x_{n-1}$$. For any one of the $$2^{n-1}$$ possible configurations of the variables, the probabilities should sum to one. Therefore, we need only one parameter for each configuration, so the total number of parameters for specifying this conditional is given by $$2^{n-1}$$. Hence, a tabular representation for the conditionals is impractical for learning the joint distribution factorized via chain rule.
 
 In an *autoregressive generative model*, the conditionals are specified as parameterized functions with a fixed number of parameters. That is, we assume the conditional distributions $$p(x_i \vert \mathbf{x}_{< i})$$ to correspond to a Bernoulli random variable and learn a function that maps the preceding random variables $$x_1, x_2, \ldots, x_{i-1}$$ to the
 mean of this distribution. Hence, we have
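
The parameter count in the revised paragraph of this diff can be checked with a small script. This is a minimal sketch, assuming every $$x_i$$ is binary (as the Bernoulli conditionals and the $$2^{n-1}$$ count in the text imply). The function names are hypothetical and are not part of the changed file.

```python
from itertools import product


def count_configurations(i: int) -> int:
    """Enumerate all assignments of the binary variables x_1, ..., x_{i-1}."""
    if i < 1:
        raise ValueError("i must be at least 1")
    return sum(1 for _ in product((0, 1), repeat=i - 1))


def tabular_params_for_conditional(i: int) -> int:
    """Free parameters needed to tabulate p(x_i | x_{<i}).

    Each configuration of the preceding variables needs one number,
    p(x_i = 1 | x_{<i}); p(x_i = 0 | x_{<i}) follows because the two
    probabilities sum to one. Assumes binary variables.
    """
    if i < 1:
        raise ValueError("i must be at least 1")
    return 2 ** (i - 1)


def tabular_params_total(n: int) -> int:
    """Sum the per-conditional counts over the chain-rule factorization."""
    return sum(tabular_params_for_conditional(i) for i in range(1, n + 1))


if __name__ == "__main__":
    for n in (3, 10, 20):
        print(n, tabular_params_for_conditional(n), tabular_params_total(n))
```

Under the same binary assumption, summing over all $$n$$ conditionals gives $$2^n - 1$$ parameters, which matches the number of free parameters in an unrestricted joint table over $$n$$ binary variables.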
