
Reward Model inference #7212

Open
1 task done
SFTJBD opened this issue Mar 7, 2025 · 0 comments
Labels
bug (Something isn't working) · pending (This problem is yet to be addressed)

Comments

SFTJBD commented Mar 7, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

Is there a concrete example of running inference with a reward model after training finishes? What data format is required, and which command can be used to run RM inference? I have loaded the LoRA model and computed a score, but I am not sure whether the result is correct.

Reproduction

import torch
from transformers import Qwen2_5_VLForConditionalGeneration
from trl import AutoModelForCausalLMWithValueHead

# Load the merged RM backbone and wrap it with a value head.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_path_merge_rm, device_map="cpu")
model = AutoModelForCausalLMWithValueHead.from_pretrained(model)

# Restore the trained value-head weights (strict=False: only the v_head keys match).
vhead_params = load_valuehead_params(vhead_file)
model.load_state_dict(vhead_params, strict=False)

# Forward pass: `values` holds one scalar value per token position.
with torch.no_grad():
    _, _, values = model(**inputs, output_hidden_states=True, return_dict=True, use_cache=False)

# Take the value at the last non-padding token as the reward score.
rewards = values.gather(dim=-1, index=(inputs["attention_mask"].sum(dim=-1, keepdim=True) - 1))
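For what it's worth, the final `gather` step selects the per-token value at the last non-padding position of each sequence. A minimal sketch with toy tensors (all values here are illustrative, not from a real model) shows the indexing in isolation:

```python
import torch

# Toy per-token values for a batch of 2 sequences (seq_len = 4).
values = torch.tensor([[0.1, 0.2, 0.3, 0.4],
                       [0.5, 0.6, 0.7, 0.8]])

# Attention mask: sequence 0 has 4 real tokens, sequence 1 has 2 (rest is padding).
attention_mask = torch.tensor([[1, 1, 1, 1],
                               [1, 1, 0, 0]])

# Index of the last non-padding token in each sequence: [[3], [1]].
last_idx = attention_mask.sum(dim=-1, keepdim=True) - 1

# Reward = value at that last real token: [[0.4], [0.6]].
rewards = values.gather(dim=-1, index=last_idx)
print(rewards.squeeze(-1))  # tensor([0.4000, 0.6000])
```

If the score you get comes from a padding position instead, the mask-based index above is the usual fix.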

Others

No response

@SFTJBD added the bug and pending labels Mar 7, 2025