Implement the DLRM model #344

Merged
merged 8 commits into from
Mar 25, 2020

Conversation

jordannad
Contributor

No description provided.

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


@jordannad
Contributor Author

@googlebot I signed it!

@googlebot

CLAs look good, thanks!


@xihui-wu xihui-wu self-assigned this Feb 26, 2020
@xihui-wu
Contributor

Thanks for contributing this, @jordannad. Two general questions regarding the model, based on looking into the PyTorch implementation:

  1. For the MLP, they have activations between the dense layers; any reason we omit them? (See the sketch after this list.)
  2. It looks like there are two other ops besides concat; should we support them?
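
For reference, a minimal sketch of what question 1 suggests, assuming the Swift for TensorFlow `Dense` layer; the struct name and layer sizes here are illustrative placeholders, not the PR's actual code:

```swift
import TensorFlow

// Illustrative MLP with ReLU activations between the Dense layers.
// The sizes are placeholders, not the DLRM model's real dimensions.
struct MLPWithActivations: Layer {
    var dense1 = Dense<Float>(inputSize: 13, outputSize: 512, activation: relu)
    var dense2 = Dense<Float>(inputSize: 512, outputSize: 256, activation: relu)
    var dense3 = Dense<Float>(inputSize: 256, outputSize: 64, activation: relu)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        input.sequenced(through: dense1, dense2, dense3)
    }
}
```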

@shabalind shabalind requested a review from xihui-wu March 11, 2020 17:44
As part of ensuring the train test is reliable, I have matched the initialization
of the reference implementation for the Embedding layers. Additionally, the DLRM
model is susceptible to a "bad initialization" that doesn't perfectly memorize
the single test minibatch. Although this is infrequent (~1 out of 50 test runs),
I have modified the tests to randomly re-initialize up to 5 times, so the test
is flaky with probability of only about 3.2e-9 (0.02^5) while still maintaining
the quality of the test (e.g. testing random initialization, etc.). Finally,
instead of checking that the loss drops below a particular value, the test checks
that the accuracy is 100%. This gives a faster stopping condition, so the
convergence test often runs in under 300ms on a laptop.
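
A hedged sketch of the retry strategy described above, using XCTest; `trainsToPerfectAccuracy()` is a hypothetical stand-in for one training run (fresh random initialization, train on the single minibatch, report whether accuracy reaches 100%), not the PR's actual test helper:

```swift
import XCTest

// Hypothetical stand-in: build a freshly initialized DLRM model, train it on
// the single test minibatch, and report whether accuracy reaches 100%.
func trainsToPerfectAccuracy() -> Bool {
    // Placeholder body; the real training loop lives in the PR's test target.
    return true
}

final class DLRMConvergenceTests: XCTestCase {
    func testMemorizesMinibatch() {
        // Retry with a fresh random initialization up to 5 times. With a ~1/50
        // per-attempt failure rate, all 5 attempts failing has probability
        // about 0.02^5 = 3.2e-9. `contains(where:)` stops at the first success.
        let succeeded = (1...5).contains { _ in trainsToPerfectAccuracy() }
        XCTAssertTrue(succeeded, "DLRM failed to memorize the minibatch in 5 attempts")
    }
}
```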
@jordannad
Contributor Author

Please take a look. I believe I've addressed the comments. Thank you!

  sparseInput: [Tensor<Int32>]
) -> Tensor<Float> {
  precondition(denseInput.shape.last! == nDense)
  assert(sparseInput.count == latentFactors.count)
Contributor

use precondition as well?
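
A sketch of the suggested change, wrapped in a hypothetical helper so it stands alone; the parameter names mirror the diff above, and both checks use `precondition` so they also fire in release builds:

```swift
import TensorFlow

// Hypothetical helper showing both input checks as preconditions
// (names mirror the diff above; the surrounding model code is not shown).
func validateInputs(
    denseInput: Tensor<Float>,
    sparseInput: [Tensor<Int32>],
    nDense: Int,
    embeddingCount: Int
) {
    precondition(denseInput.shape.last! == nDense,
                 "Expected \(nDense) dense features, got \(denseInput.shape.last!)")
    precondition(sparseInput.count == embeddingCount,
                 "Expected \(embeddingCount) sparse inputs, got \(sparseInput.count)")
}
```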

Contributor

@xihui-wu xihui-wu left a comment

One minor comment. Rest LGTM!

@dabrahams dabrahams merged commit 713bb8d into tensorflow:master Mar 25, 2020