mnn.layer.LogSoftmaxLayer.md

class LogSoftmaxLayer

method __init__

__init__(*shape, axis=1)

method backward

backward(gradients)

Let $y$ denote the softmax output and $z = \log y$ the log-softmax output. For the softmax function, when $i = k$ we have $\partial y_i / \partial x_k = y_i \cdot (1 - y_i)$, and when $i \neq k$ we have $\partial y_i / \partial x_k = - y_i y_k$.

Therefore, for the log-softmax function, when $i = k$, we have

$$ \partial z_i / \partial x_k = y_i^{-1} \cdot y_i \cdot (1 - y_i) = 1 - y_k $$

and when $i \not= k$,

$$ \partial z_i / \partial x_k = y_i^{-1} \cdot (-y_i y_k) = - y_k $$

As a result, the Jacobian matrix is

$$ J_x z = \begin{bmatrix} 1 - y_1 & - y_2 & \cdots & - y_n \\ - y_1 & 1 - y_2 & \cdots & - y_n \\ \vdots & \vdots & \ddots & \vdots \\ - y_1 & - y_2 & \cdots & 1 - y_n \end{bmatrix} $$

which can be seen as the identity matrix minus a matrix whose rows are all the same softmax vector: $J_x z = I - \mathbf{1}\, y^\top$, where $\mathbf{1}$ is the all-ones column vector.

The gradient of the loss $\ell$ with respect to the inputs then follows from the chain rule:

$$ \nabla_x \ell = J^T_x z \cdot \nabla_z \ell $$
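
Because of the structure of $J_x z$, this matrix product collapses to a vector expression: $(\nabla_x \ell)_k = (\nabla_z \ell)_k - y_k \sum_i (\nabla_z \ell)_i$. Below is a minimal NumPy sketch of that rule; it is not necessarily the layer's actual implementation, and the helper name `log_softmax_backward` and the row-wise `axis=1` convention are assumptions for illustration.

```python
import numpy as np

def log_softmax_backward(x, grad_z, axis=1):
    """Sketch of grad_x = J^T grad_z with J = I - 1 y^T.

    Expanding the product gives grad_x_k = grad_z_k - y_k * sum_i grad_z_i,
    so the Jacobian never has to be materialized.
    """
    y = np.exp(x - x.max(axis=axis, keepdims=True))
    y /= y.sum(axis=axis, keepdims=True)               # softmax y(x)
    return grad_z - y * grad_z.sum(axis=axis, keepdims=True)

# Sanity check against an explicit Jacobian multiply for one row.
rng = np.random.default_rng(0)
x, g = rng.normal(size=5), rng.normal(size=5)
y = np.exp(x - x.max()); y /= y.sum()
J = np.eye(5) - np.ones((5, 1)) * y                    # row i, col k: 1{i=k} - y_k
assert np.allclose(J.T @ g, log_softmax_backward(x[None], g[None])[0])
```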


method forward

forward(inputs, feedbacks=None)

$$ z(x) = \log y(x) $$

where $y_i(x) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$ is the softmax of the inputs.
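
As a reference point, here is a direct NumPy transcription of this definition (a sketch only; the function name is hypothetical, and exponentiating the raw inputs can overflow, which is presumably why the class also provides stable_log_softmax below):

```python
import numpy as np

def log_softmax_naive(x, axis=1):
    # z(x) = log(exp(x_i) / sum_j exp(x_j)), taken literally.
    # exp(x) overflows for large inputs; see stable_log_softmax below.
    e = np.exp(x)
    return np.log(e / e.sum(axis=axis, keepdims=True))
```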


method stable_log_softmax

stable_log_softmax(inputs, axis)

Let $m = \max_j x_j$ along the given axis. Subtracting $m$ from every input leaves the result unchanged but keeps $\exp$ from overflowing; simplifying:

$$ \begin{aligned} z_i(x) &= \log \frac{\exp(x_i - m)}{\sum_j \exp(x_j - m)} \\ &= x_i - m - \log\Big(\sum_j \exp(x_j - m)\Big) \end{aligned} $$
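
A NumPy sketch of this computation, based only on the formula above (the actual method may differ in details such as the default axis):

```python
import numpy as np

def stable_log_softmax(inputs, axis=1):
    # Shift by the per-axis maximum m so exp never overflows, then apply
    # z_i = x_i - m - log(sum_j exp(x_j - m)).
    m = inputs.max(axis=axis, keepdims=True)
    shifted = inputs - m
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

x = np.array([[1000.0, 1001.0, 1002.0]])   # overflows the unshifted formula
z = stable_log_softmax(x, axis=1)
print(z, np.exp(z).sum())                  # finite values; exp(z) sums to 1
```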


method step

step(lr=0.01)