[RFC] Pass the original input to all PP stages #1130

fegin · 2025-04-22T18:41:42Z

We need the original tokens to generate the document masks (block-causal masks). Because TorchTitan currently lets every rank perform data loading, each PP stage already has the input batch, so passing it along should not cause a performance regression.

This is required to support document masking attention with PP.
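The following is a minimal sketch of why the original tokens matter. It builds a block-causal (document) mask from packed token ids. It assumes that documents are separated by an EOS token with a hypothetical id `EOS_ID = 0`, and it uses plain Python lists instead of tensors. It is not TorchTitan's implementation. The mask depends only on where the EOS tokens appear in the input, and a non-first PP stage cannot recover those positions from the hidden states it receives.

```python
from typing import List

# Hypothetical end-of-sequence token id that separates packed documents.
EOS_ID = 0


def document_ids(tokens: List[int], eos_id: int = EOS_ID) -> List[int]:
    """Assign each position the index of the document it belongs to.

    The EOS token is treated as the last token of its document, so the
    document index increments on the position after an EOS.
    """
    ids = []
    doc = 0
    for tok in tokens:
        ids.append(doc)
        if tok == eos_id:
            doc += 1
    return ids


def block_causal_mask(tokens: List[int], eos_id: int = EOS_ID) -> List[List[bool]]:
    """mask[q][k] is True when query q may attend to key k.

    Key k must not be in the future (k <= q) and must belong to the same
    document as query q.
    """
    doc = document_ids(tokens, eos_id)
    n = len(tokens)
    return [[k <= q and doc[k] == doc[q] for k in range(n)] for q in range(n)]
```

For the packed input `[5, 7, 0, 9, 4]`, position 3 starts a new document, so it can attend only to itself, even though a plain causal mask would also let it see positions 0 to 2.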

H-Huang

On the PP side, this looks good. Do the current Llama 3 and Llama 4 models not support document masking? Why are model changes needed?

torchtitan/experiments/llama4/model/model.py

torchtitan/models/llama3/model.py

fegin · 2025-04-22T20:03:11Z

@H-Huang Llama 3 doesn't support document masking, and document masking with PP is one of the features still missing for Llama 4.


tianyu-l

Left some nit comments.

CI is currently broken by compile + SAC (selective activation checkpointing). Please make sure this change works before merging.

tianyu-l · 2025-04-27T14:52:22Z

torchtitan/experiments/llama4/model/model.py

+                If pipeline parallelism is enabled, this will be the input token indices
+                for the ranks on the first pipeline stage. This will be the activation of the
+                previous pipeline stage if the current rank is not on the first stage.
+            input_batch (torch.Tensor): The input batch read from the dataloader.


Please add a comment noting that this field is needed by non-first PP stages to build the correct document masks.
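The reviewer's point can be illustrated with a toy two-stage pipeline. This is a sketch under stated assumptions: `stage_forward`, `masked_mean`, and `EOS_ID` are hypothetical names, "attention" is replaced by a masked average over the positions each query may see, and plain lists stand in for tensors. The first stage receives token ids. The second stage receives only activations, so it must rebuild the document mask from `input_batch`.

```python
from typing import List

EOS_ID = 0  # hypothetical EOS token id separating packed documents


def block_causal_mask(tokens: List[int], eos_id: int = EOS_ID) -> List[List[bool]]:
    # Document index per position; it increments after each EOS token.
    doc, ids = 0, []
    for tok in tokens:
        ids.append(doc)
        if tok == eos_id:
            doc += 1
    n = len(tokens)
    return [[k <= q and ids[k] == ids[q] for k in range(n)] for q in range(n)]


def masked_mean(h: List[float], mask: List[List[bool]]) -> List[float]:
    # Toy stand-in for attention: each position averages the values it may attend to.
    # The diagonal is always allowed, so every row has at least one value.
    out = []
    for row in mask:
        vals = [h[k] for k, ok in enumerate(row) if ok]
        out.append(sum(vals) / len(vals))
    return out


def stage_forward(x, input_batch: List[int], is_first: bool) -> List[float]:
    """Hypothetical PP stage.

    The first stage receives token ids in x and "embeds" them as floats.
    Later stages receive the previous stage's activations in x. Every stage
    rebuilds the mask from input_batch, the original tokens.
    """
    h = [float(t) for t in x] if is_first else x
    mask = block_causal_mask(input_batch)
    return masked_mean(h, mask)
```

Usage: `h1 = stage_forward(tokens, tokens, is_first=True)` followed by `stage_forward(h1, tokens, is_first=False)`. If the second stage had only `h1` and fell back to a plain causal mask, position 3 would average values from the previous document.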

tianyu-l · 2025-04-27T14:57:06Z

torchtitan/train.py

@@ -351,7 +355,7 @@ def train_step(self, input_dict: dict[str, torch.Tensor], labels: torch.Tensor):
            # Non-PP forward / backward
            with self.train_context(optional_context_parallel_ctx):
                assert len(model_parts) == 1
-                pred = model_parts[0](inputs)
+                pred = model_parts[0](inputs, input_batch=inputs)


The non-PP branch looks a bit strange. I slightly prefer the alternative: make input_batch optional and let init_attention_mask use tokens when input_batch is None. That way, non-PP users see input_batch used in fewer places.
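The suggested alternative might look roughly like the following. This sketch assumes a hypothetical `ToyModel` whose `forward(tokens, input_batch=None)` falls back to `tokens` when `input_batch` is omitted. It records per-position document ids in place of a real attention mask, and the names are illustrative rather than TorchTitan's actual code.

```python
from typing import List, Optional, Sequence

EOS_ID = 0  # hypothetical EOS token id separating packed documents


class ToyModel:
    """Hypothetical model illustrating an optional ``input_batch`` argument."""

    def __init__(self, eos_id: int = EOS_ID):
        self.eos_id = eos_id
        self.doc_ids: Optional[List[int]] = None

    def init_attention_mask(self, batch: Sequence[int]) -> None:
        # Record the document id of each position. A block-causal mask
        # would allow attention only between positions of the same document.
        doc, ids = 0, []
        for tok in batch:
            ids.append(doc)
            if tok == self.eos_id:
                doc += 1
        self.doc_ids = ids

    def forward(self, tokens: Sequence, input_batch: Optional[Sequence[int]] = None):
        # Non-PP callers (and the first PP stage) can omit input_batch,
        # because tokens already holds the original input. Non-first PP
        # stages pass activations as tokens and the original ids as input_batch.
        self.init_attention_mask(input_batch if input_batch is not None else tokens)
        return self.doc_ids
```

With this design, the non-PP call site stays `model(inputs)`, and only the PP path has to supply `input_batch` explicitly.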

facebook-github-bot added the CLA Signed label Apr 22, 2025

fegin requested review from tianyu-l, H-Huang, wconstab, XilunWu and wwwjn April 22, 2025 18:41

H-Huang approved these changes Apr 22, 2025


torchtitan/experiments/llama4/model/model.py (resolved)

torchtitan/models/llama3/model.py (resolved)

fegin added 2 commits April 24, 2025 09:54

Pass the input to all PP stages

3fd6461


docstring

502bfa9

fegin force-pushed the pp_cp branch from 2aa23cc to 502bfa9 on April 24, 2025 16:58

wconstab approved these changes Apr 24, 2025


tianyu-l approved these changes Apr 27, 2025


tianyu-l mentioned this pull request Apr 27, 2025

Llama 4 issue tracking #1118 (open, 12 tasks)
