mnn.layer.LogSoftmaxLayer.md

class LogSoftmaxLayer

method __init__

__init__(*shape, axis=1)

method backward

backward(gradients)

Let $y$ denote the softmax output and $z = \log y$ the log-softmax output. For the softmax function, when $i = k$ we have $\partial y_i / \partial x_k = y_i \cdot (1 - y_i)$, and when $i \neq k$ we have $\partial y_i / \partial x_k = - y_i y_k$.

Therefore, for the log-softmax function, when $i = k$, we have

$$ \partial z_i / \partial x_k = y_i^{-1} \cdot y_i \cdot (1 - y_i) = 1 - y_k $$

and when $i \not= k$,

$$ \partial z_i / \partial x_k = y_i^{-1} \cdot (-y_i y_k) = - y_k $$

As a result, the Jacobian matrix is

$$ J_x z = \begin{bmatrix} 1 - y_1 & - y_2 & \cdots & - y_n \\ - y_1 & 1 - y_2 & \cdots & - y_n \\ \vdots & \vdots & \ddots & \vdots \\ - y_1 & - y_2 & \cdots & 1 - y_n \end{bmatrix} $$

which can be seen as the identity matrix minus a matrix whose rows are all the same softmax vector: $J_x z = I - \mathbf{1}\, y^\top$, where $\mathbf{1}$ is the all-ones column vector.

The gradient of the loss $\ell$ with respect to the inputs then follows from the chain rule:

$$ \nabla_x \ell = J^T_x z \cdot \nabla_z \ell $$
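
Because of the structure of $J_x z$, this matrix product collapses to a vector expression: $(\nabla_x \ell)_k = (\nabla_z \ell)_k - y_k \sum_i (\nabla_z \ell)_i$. Below is a minimal NumPy sketch of that rule; it is not necessarily the layer's actual implementation, and the helper name `log_softmax_backward` and the row-wise `axis=1` convention are assumptions for illustration.

```python
import numpy as np

def log_softmax_backward(x, grad_z, axis=1):
    """Sketch of grad_x = J^T grad_z with J = I - 1 y^T.

    Expanding the product gives grad_x_k = grad_z_k - y_k * sum_i grad_z_i,
    so the Jacobian never has to be materialized.
    """
    y = np.exp(x - x.max(axis=axis, keepdims=True))
    y /= y.sum(axis=axis, keepdims=True)               # softmax y(x)
    return grad_z - y * grad_z.sum(axis=axis, keepdims=True)

# Sanity check against an explicit Jacobian multiply for one row.
rng = np.random.default_rng(0)
x, g = rng.normal(size=5), rng.normal(size=5)
y = np.exp(x - x.max()); y /= y.sum()
J = np.eye(5) - np.ones((5, 1)) * y                    # row i, col k: 1{i=k} - y_k
assert np.allclose(J.T @ g, log_softmax_backward(x[None], g[None])[0])
```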


method forward

forward(inputs, feedbacks=None)

$$ z(x) = \log y(x) $$

where $y_i(x) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$ is the softmax of the inputs.
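
As a reference point, here is a direct NumPy transcription of this definition (a sketch only; the function name is hypothetical, and exponentiating the raw inputs can overflow, which is presumably why the class also provides stable_log_softmax below):

```python
import numpy as np

def log_softmax_naive(x, axis=1):
    # z(x) = log(exp(x_i) / sum_j exp(x_j)), taken literally.
    # exp(x) overflows for large inputs; see stable_log_softmax below.
    e = np.exp(x)
    return np.log(e / e.sum(axis=axis, keepdims=True))
```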


method stable_log_softmax

stable_log_softmax(inputs, axis)

Let $m = \max_j x_j$ along the given axis. Subtracting $m$ from every input leaves the result unchanged but keeps $\exp$ from overflowing; simplifying:

$$ \begin{aligned} z_i(x) &= \log \frac{\exp(x_i - m)}{\sum_j \exp(x_j - m)} \\ &= x_i - m - \log\Big(\sum_j \exp(x_j - m)\Big) \end{aligned} $$
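
A NumPy sketch of this computation, based only on the formula above (the actual method may differ in details such as the default axis):

```python
import numpy as np

def stable_log_softmax(inputs, axis=1):
    # Shift by the per-axis maximum m so exp never overflows, then apply
    # z_i = x_i - m - log(sum_j exp(x_j - m)).
    m = inputs.max(axis=axis, keepdims=True)
    shifted = inputs - m
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

x = np.array([[1000.0, 1001.0, 1002.0]])   # overflows the unshifted formula
z = stable_log_softmax(x, axis=1)
print(z, np.exp(z).sum())                  # finite values; exp(z) sums to 1
```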


method step

step(lr=0.01)