+To see why, consider the conditional for the last dimension, $$p(x_n \vert \mathbf{x}_{< n})$$. To fully specify this conditional, we need a probability distribution over $$x_n$$ for each of the $$2^{n-1}$$ configurations of the variables $$x_1, x_2, \ldots, x_{n-1}$$. For any one configuration, the probabilities of the two values of $$x_n$$ must sum to one, so a single parameter per configuration suffices, and the total number of parameters for this conditional is $$2^{n-1}$$. Because this count grows exponentially in $$n$$, a tabular representation for the conditionals is impractical for learning the joint distribution factorized via the chain rule.
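
To make the growth concrete, here is a minimal sketch that counts the free parameters of a tabular representation. It assumes binary variables, as the $$2^{n-1}$$ count implies; the function names `tabular_conditional_params` and `tabular_joint_params` are hypothetical and chosen only for illustration.

```python
def tabular_conditional_params(i: int) -> int:
    """Free parameters in a table for p(x_i | x_<i), assuming binary variables.

    There are 2**(i - 1) configurations of the i - 1 parent variables, and
    each configuration needs one free parameter, e.g. p(x_i = 1 | parents).
    """
    return 2 ** (i - 1)


def tabular_joint_params(n: int) -> int:
    """Total free parameters when every chain-rule conditional is a table."""
    return sum(tabular_conditional_params(i) for i in range(1, n + 1))


if __name__ == "__main__":
    # Print n, the size of the last conditional, and the total over all conditionals.
    for n in (3, 10, 20):
        print(n, tabular_conditional_params(n), tabular_joint_params(n))
```

Summing over all $$n$$ conditionals gives $$2^n - 1$$ parameters, which matches the number of free parameters in an unrestricted joint table over $$n$$ binary variables, so the tabular chain-rule factorization saves nothing on its own.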