Compiling the optimizer improves perf of the Llama4 Scout model.
Throughput improves from 3.8 tokens/s to 9 tokens/s (max tokens/s observed over the first ~10 iterations).
Peak memory is unchanged.
```
tune run --nproc_per_node 8 \
full_finetune_distributed \
--config recipes/configs/llama4/scout_17B_16E_full.yaml
```
PS:
Compilation in the current repo fails when `skip_rope_interval=4,` is set; testing had to be done with `skip_rope_interval=None,`.