As part of ensuring the train test is reliable, I have matched the initialization
of the reference implementation for the Embedding layers. Additionally, the DLRM
model is susceptible to a "bad initialization" from which it fails to perfectly
memorize the single test minibatch. Although this is infrequent (~1 out of 50
test runs), I have modified the test to try up to 5 random initializations, so
the test is flaky with a probability of approximately (1/50)^5 = 3.2e-9 while
still maintaining the quality of the test (e.g. it still exercises random
initialization). Finally, instead
of checking that the loss drops below a particular value, the test checks that
the accuracy is 100%. This stopping condition is reached sooner, so the
convergence test often runs in under 300ms on a laptop.
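
The retry scheme and the flakiness estimate above can be sketched roughly as
follows. This is an illustration only, not the code in this commit: the names
`flake_probability`, `converges_with_retries`, and `train_once` are
hypothetical, and the estimate assumes each initialization fails independently
with probability of about 1/50.

```python
import random
from typing import Callable, Optional


def flake_probability(p_bad_init: float, attempts: int) -> float:
    """Chance that every one of `attempts` independent initializations is bad."""
    return p_bad_init ** attempts


def converges_with_retries(
    train_once: Callable[[random.Random], float],
    attempts: int = 5,
    seed: Optional[int] = None,
) -> bool:
    """Return True as soon as one attempt reaches 100% accuracy.

    `train_once` is a hypothetical callback: it builds a freshly initialized
    model using the given random source, trains it on the single test
    minibatch, and returns the final accuracy in [0, 1].
    """
    rng = random.Random(seed)
    for _ in range(attempts):
        if train_once(rng) == 1.0:  # stop on accuracy, not on a loss threshold
            return True
    return False


if __name__ == "__main__":
    # With ~1/50 bad initializations and 5 attempts, all attempts fail
    # with probability (1/50)^5.
    print(f"{flake_probability(1 / 50, 5):.1e}")
```

A test would then assert `converges_with_retries(train_once)` instead of
asserting on a single training run.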