OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB. GPU 0 has a total capacity of 23.68 GiB, of which 34.88 MiB is free. Process 1170126 is using 2.25 GiB of memory. #730
Comments
I added it: with deepspeed it still reports the error. combine: True training_args: see
If the configuration itself is fine, try lowering these parameters and run it again; they can drive GPU memory usage up a lot.
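A minimal sketch of what "lowering these parameters" could look like in configs/lora.yaml (the numbers below are illustrative, not values given in this thread); the sequence lengths and the eval batch size are the main drivers of per-GPU memory in this config:

max_input_length: 512   # down from 2048
max_output_length: 512  # down from 2048
training_args:
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 1   # down from 4
  dataloader_num_workers: 1       # fewer workers also lowers host-side memory pressure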
lora.yaml file: combine: True training_args: see
(glm-4) ubuntu@c54:~/zch/glm-4/GLM-4-main/finetune_demo$ nvidia-smi
Tue Mar 11 15:31:20 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTX A5000 Off | 00000000:01:00.0 Off | Off |
| 30% 27C P8 7W / 230W | 1640MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A5000 Off | 00000000:25:00.0 Off | Off |
| 30% 30C P8 6W / 230W | 2318MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA RTX A5000 Off | 00000000:41:00.0 Off | Off |
| 30% 27C P8 9W / 230W | 2318MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA RTX A5000 Off | 00000000:61:00.0 Off | Off |
| 30% 25C P8 9W / 230W | 4099MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA RTX A5000 Off | 00000000:81:00.0 Off | Off |
| 30% 26C P8 4W / 230W | 4463MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA RTX A5000 Off | 00000000:A1:00.0 Off | Off |
| 30% 26C P8 8W / 230W | 2318MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA RTX A5000 Off | 00000000:C1:00.0 Off | Off |
| 30% 24C P8 6W / 230W | 4069MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA RTX A5000 Off | 00000000:E1:00.0 Off | Off |
| 30% 24C P8 8W / 230W | 1640MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
(glm-4) ubuntu@c54:~/zch/glm-4/GLM-4-main/finetune_demo$
Command: (glm-4) ubuntu@c54:~/zch/glm-4/GLM-4-main/finetune_demo$ OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=8 finetune.py data ./glm-4-9b-chat configs/lora.yaml
I have 8 idle GPUs, each with 24 GB of memory, but when I run the model with the command above I get the following error. Could someone help me?
[rank6]: OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB. GPU 0 has a total capacity of 23.68 GiB of which 34.88 MiB is free. Process 1170126 has 2.25
[rank6]: GiB memory in use. Process 1177435 has 5.44 GiB memory in use. Process 1177429 has 5.44 GiB memory in use. Process 1177431 has 5.83 GiB memory in use. Process
[rank6]: 1177433 has 4.67 GiB memory in use. Of the allocated memory 5.23 GiB is allocated by PyTorch, and 4.81 MiB is reserved by PyTorch but unallocated. If reserved but
[rank6]: unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management
[rank6]: (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[rank2]:[W311 15:29:11.938241134 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more
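As the traceback itself suggests, PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True can help when a lot of memory is "reserved by PyTorch but unallocated" (fragmentation). Here only 4.81 MiB is in that state, so it is unlikely to be the main fix on its own, but it costs nothing to combine with smaller batch/sequence settings, e.g.:

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=8 finetune.py data ./glm-4-9b-chat configs/lora.yaml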
lora.yaml:
data_config:
  train_file: train.jsonl
  val_file: dev.jsonl
  test_file: test.jsonl
  num_proc: 1
combine: True
freezeV: True
max_input_length: 2048
max_output_length: 2048
training_args:
  # see transformers.Seq2SeqTrainingArguments
  output_dir: ./output
  max_steps: 3000
  # needed to be fit for the dataset
  learning_rate: 5e-4
  # settings for data loading
  per_device_train_batch_size: 1
  dataloader_num_workers: 16
  remove_unused_columns: false
  # settings for saving checkpoints
  save_strategy: steps
  save_steps: 500
  # settings for logging
  log_level: info
  logging_strategy: steps
  logging_steps: 10
  # settings for evaluation
  per_device_eval_batch_size: 4
  eval_strategy: steps
  eval_steps: 500
  # settings for optimizer
  adam_epsilon: 1e-6
  # uncomment the following line to detect nan or inf values
  # debug: underflow_overflow
  predict_with_generate: true
  # see transformers.GenerationConfig
  generation_config:
    max_new_tokens: 512
  # set your absolute deepspeed path here
  deepspeed: configs/ds_zero_3.json
peft_config:
  peft_type: LORA
  task_type: CAUSAL_LM
  r: 8
  lora_alpha: 32
  lora_dropout: 0.1
  target_modules: ["query_key_value"]
  # target_modules: ["q_proj", "k_proj", "v_proj"] if model is glm-4-9b-chat-hf
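Since the config already points DeepSpeed at configs/ds_zero_3.json, another common way to bring per-GPU usage down is CPU offload in the ZeRO stage-3 config. The actual contents of that file are not shown in this issue, so the following is only an assumed sketch of a stage-3 configuration with optimizer and parameter offload, not the repository's file:

{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}

Offloading trades GPU memory for host RAM and PCIe traffic, so training gets slower; it is mainly worth it when smaller sequence lengths and batch sizes alone are not enough.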