GPU Imbalanced Loading #7250

Open

WillDreamer opened this issue Mar 11, 2025 · 1 comment
Labels: bug (Something isn't working), pending (This problem is yet to be addressed)

Comments

@WillDreamer

Reminder

  • I have read the above rules and searched the existing issues.

System Info

When I submit a training job as follows:
llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path Qwen/Qwen2-VL-7B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen2_vl \
    --rope_scaling linear \
    --flash_attn auto \
    --dataset_dir /data/LLaMA-Factory/data \
    --dataset EMMA-mini \
    --cutoff_len 4096 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --packing False \
    --report_to \
    --output_dir saves/Qwen2-VL-7B-Instruct/lora/train_2025-03-11-07-34-45 \
    --pure_bf16 True \
    --plot_loss True \
    --trust_remote_code True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --optim adamw_torch \
    --quantization_bit 4 \
    --quantization_method bitsandbytes \
    --double_quantization True \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --deepspeed cache/ds_z2_config.json

The load across the GPUs is not balanced. How can I deal with this?

[Image: screenshot of the uneven GPU load]

Reproduction

GPU imbalance

Others

No response

WillDreamer added the bug and pending labels on Mar 11, 2025
@Kuangdd01
Contributor

This issue looks similar to #5991.
In your case, per_device_train_batch_size is set to 1, so GPU utilization differs between devices because each GPU processes a sequence of a different length on every step.
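If the imbalance matters for throughput, two knobs that already appear in your command can smooth it out. This is only a sketch, not a guaranteed fix, and whether packing is appropriate depends on your dataset and template:

    --per_device_train_batch_size 2    (a larger per-device batch averages out sequence-length differences between ranks)
    --packing True                      (packs several short samples into one cutoff_len-sized sequence, so each GPU sees a similar number of tokens per step)

If you raise per_device_train_batch_size from 1 to 2, you can lower gradient_accumulation_steps from 2 to 1 to keep the effective batch size unchanged.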
