torchrec support on kvzch emb lookup module #4035

Open
wants to merge 2 commits into base: main

Conversation

duduyi2013
Contributor

Summary:

Change logs

  1. add the ZeroCollisionKeyValueEmbedding emb lookup module
  2. address missing unit test coverage for SSD offloading
  3. add new unit tests for the KV ZCH embedding module
  4. add a temporary hack for calculating bucket metadata
  5. embedding.py updates, detailed below

#######################################################################
########################### embedding.py updates ##########################
#######################################################################

  1. keep the original idea of initializing the ShardedTensor during training init
  2. for KV ZCH tables, the ShardedTensor will be initialized with the virtual size for metadata calculation, and the actual tensor size check for ST init is skipped; this is needed because the table has 0 rows during training init
  3. the new tensor, weight_id, will not be registered in the EC because its shape changes in real time; the weight_id tensor will be generated in the post_state_dict hook
  4. the new tensor, bucket, could be registered and preserved, but in this diff we treat it the same way as weight_id
  5. in the post_state_dict hook, we call get_named_split_embedding_weights_snapshot to get Tuple[table_name, weight(ST), weight_id(ST), bucket(ST)]; all 3 tensors are returned as ShardedTensors, and we update destination with the returned STs directly (see the sketch after this list)
  6. in the pre_load_state_dict hook, which is called upon load_state_dict(), we skip the update for all 3 tensors, because the tensor assignment is done [on the nn.module side](https://fburl.com/code/it5nior8), which doesn't support updating a KVT through a PMT. This is fine for now because checkpoint loading is done outside of the load_state_dict call, but we need a future plan to make it work cohesively with other types of tensors
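
A minimal sketch of the two hooks described in items 5 and 6, assuming a module that exposes get_named_split_embedding_weights_snapshot() and a `table_names` list; the hook signatures and state-dict key layout are illustrative assumptions, not the exact torchrec implementation:

```python
# Sketch only: `module.table_names` and the key layout are hypothetical;
# get_named_split_embedding_weights_snapshot() is the call named above.
from typing import Any, Dict


def post_state_dict_hook(module: Any, destination: Dict[str, Any], prefix: str = "") -> None:
    # weight_id and bucket change shape at runtime, so they are not registered
    # on the module; all three ShardedTensors are written into the state dict
    # here, straight from the backend snapshot.
    for table_name, weight, weight_id, bucket in module.get_named_split_embedding_weights_snapshot():
        destination[f"{prefix}{table_name}.weight"] = weight
        destination[f"{prefix}{table_name}.weight_id"] = weight_id
        destination[f"{prefix}{table_name}.bucket"] = bucket


def pre_load_state_dict_hook(module: Any, state_dict: Dict[str, Any], prefix: str = "") -> None:
    # Skip loading the three KV ZCH tensors: nn.Module's default copy-based
    # assignment cannot update the KVT through a PMT, so checkpoint restore
    # happens outside load_state_dict() for now.
    for table_name in module.table_names:
        for suffix in ("weight", "weight_id", "bucket"):
            state_dict.pop(f"{prefix}{table_name}.{suffix}", None)
```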

Differential Revision: D73567631

Summary:
Change list
1. add the bucket concept into the SSD TBE
2. update split_embedding_weights to return a tuple of 3 tensors (weight, weight_id, id_cnt_per_bucket); see the sketch after this list
3. add new unit tests for the key-value embedding cases
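
A short consumption sketch, assuming split_embedding_weights() yields one (weight, weight_id, id_cnt_per_bucket) triple per table; the `tbe` handle and the snapshot dict are illustrative, not copied from the fbgemm_gpu source:

```python
# Hypothetical usage of the 3-tensor return described above;
# `tbe` is an already-constructed SSD TBE module (assumed).
snapshot = {}
for table_idx, (weight, weight_id, id_cnt_per_bucket) in enumerate(
    tbe.split_embedding_weights()
):
    # weight: the embedding rows; weight_id: the sparse ids backing those
    # rows; id_cnt_per_bucket: per-bucket id counts used for bucket metadata.
    snapshot[table_idx] = (weight, weight_id, id_cnt_per_bucket)
```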

Differential Revision: D73274786

netlify bot commented Apr 28, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | 26b0b47 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/680fda70d6988b0008722fca |
| 😎 Deploy Preview | https://deploy-preview-4035--pytorch-fbgemm-docs.netlify.app |

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D73567631
