feat: improve the precision of the FusedAddRMSNormKernel function (#587)
When `sizeof(T) == 2`, the sum of the `input` and `residual` values that
were read (a float, `x`) is split into its high and low 16 bits, which are
stored in `input` and `residual` respectively. Later, `input` and
`residual` are read back and recombined into `x`, with the aim of
improving the precision of the subsequent `x * rms_rcp` operation.
This improves precision from 1e-2 to 1e-3.
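
The split-and-recombine step described in this message can be sketched on the host as follows. This is a minimal illustration, not the kernel code from the commit: the names `SplitBits`, `split_float`, `combine_float`, and `roundtrip_is_exact` are hypothetical, and the sketch assumes a 32-bit IEEE-754 `float`. It only shows that storing the high and low 16-bit halves in two 16-bit slots preserves every bit of `x`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Hypothetical container for the two 16-bit halves of a float.
// In the commit's description, `hi` would live in `input` and `lo` in `residual`.
struct SplitBits {
  uint16_t hi;  // upper 16 bits of the float's bit pattern
  uint16_t lo;  // lower 16 bits of the float's bit pattern
};

// Split a float into its high and low 16-bit halves.
inline SplitBits split_float(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));  // well-defined type punning
  return {static_cast<uint16_t>(bits >> 16),
          static_cast<uint16_t>(bits & 0xFFFFu)};
}

// Recombine the two halves into the original float.
inline float combine_float(SplitBits s) {
  uint32_t bits = (static_cast<uint32_t>(s.hi) << 16) | s.lo;
  float x;
  std::memcpy(&x, &bits, sizeof(x));
  return x;
}

// Returns true when split followed by combine reproduces x bit for bit.
inline bool roundtrip_is_exact(float x) {
  float y = combine_float(split_float(x));
  uint32_t a, b;
  std::memcpy(&a, &x, sizeof(a));
  std::memcpy(&b, &y, sizeof(b));
  return a == b;
}

// Small self-check: the sum survives the round trip unchanged.
inline void demo_roundtrip() {
  float x = 0.1f + 1234.5678f;  // stands in for input + residual
  assert(roundtrip_is_exact(x));
}
```

Because the round trip is lossless, a later multiplication such as `x * rms_rcp` can operate on the full float value rather than on a value that was first rounded to a 16-bit type.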