Hello, I am getting the following error whenever I scale up training to 512 GPUs while using FSDP2 + AdamWFP8 + BF16 stochastic rounding:
```
torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_method copy_(*(DTensor(local_tensor=FakeTensor(..., device='cuda:4', size=(253, 7168), dtype=torch.bfloat16), device_mesh=DeviceMesh('cuda', [4, 12, 20, 28, 36, 44, 52, 60, 68, 76, 84, 92, 100, 108, 116, 124, 132, 140, 148, 156, 164, 172, 180, 188, 196, 204, 212, 220, 228, 236, 244, 252, 260, 268, 276, 284, 292, 300, 308, 316, 324, 332, 340, 348, 356, 364, 372, 380, 388, 396, 404, 412, 420, 428, 436, 444, 452, 460, 468, 476, 484, 492, 500, 508], mesh_dim_names=('dp_shard_cp',)), placements=(Shard(dim=0),)), DTensor(local_tensor=FakeTensor(..., device='cuda:4', size=(253, 7168), dtype=torch.bfloat16), device_mesh=DeviceMesh('cuda', [4, 12, 20, 28, 36, 44, 52, 60, 68, 76, 84, 92, 100, 108, 116, 124, 132, 140, 148, 156, 164, 172, 180, 188, 196, 204, 212, 220, 228, 236, 244, 252, 260, 268, 276, 284, 292, 300, 308, 316, 324, 332, 340, 348, 356, 364, 372, 380, 388, 396, 404, 412, 420, 428, 436, 444, 452, 460, 468, 476, 484, 492, 500, 508], mesh_dim_names=('dp_shard_cp',)), placements=(Shard(dim=0),))), **{}): got RuntimeError('expand: attempting to expand a dimension of length 16192!')

from user code:
   File "/home/federico/.pyenv/versions/3.11.9/lib/python3.11/site-packages/torchao/prototype/low_bit_optim/adam.py", line 189, in single_param_adam
    p.copy_(_fp32_to_bf16_sr(p_f32))

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
```
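For context, here is a minimal sketch of the kind of setup that triggers this. The model, hyperparameters, and launch details are placeholders rather than my actual training script; the relevant pieces are the 1D 512-rank mesh, `fully_shard`, and torchao's `AdamWFp8` with `bf16_stochastic_round=True`:

```python
# Minimal sketch of the assumed setup (placeholders, not the real training script).
# Relevant pieces: a 1D 512-rank device mesh, FSDP2 fully_shard, and torchao's
# FP8 AdamW with BF16 stochastic rounding enabled.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard
from torchao.prototype.low_bit_optim import AdamWFp8

dist.init_process_group("nccl")  # assumes a torchrun launch across 512 ranks
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# 1D mesh over all 512 ranks (the 'dp_shard_cp' dimension in the error above)
mesh = init_device_mesh("cuda", (512,), mesh_dim_names=("dp_shard_cp",))

model = nn.Linear(7168, 7168, dtype=torch.bfloat16, device="cuda")
fully_shard(model, mesh=mesh)  # parameters become DTensors with Shard(dim=0)

# torchao low-bit optimizer; its per-parameter step is compiled internally,
# which is where Dynamo traces p.copy_(_fp32_to_bf16_sr(p_f32)).
optim = AdamWFp8(model.parameters(), lr=3e-4, bf16_stochastic_round=True)

x = torch.randn(8, 7168, dtype=torch.bfloat16, device="cuda")
model(x).sum().backward()
optim.step()
```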
Either scaling down the run or switching to HSDP works around the problem, but neither is a great solution.
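For reference, the HSDP workaround amounts to using a 2D mesh so that no single shard group spans all 512 ranks; the group sizes below are illustrative, not my exact layout:

```python
# Sketch of the HSDP workaround (group sizes are illustrative).
# With a 2D mesh, fully_shard replicates across the first dim and shards
# across the second, so the shard group is 64-way instead of 512-way.
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard

mesh_2d = init_device_mesh(
    "cuda", (8, 64), mesh_dim_names=("dp_replicate", "dp_shard")
)
fully_shard(model, mesh=mesh_2d)  # `model` as in the sketch above
```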
cc @weifengpy @gau-nernst