adds support for running GRPO on IOI problems #495
base: main
Conversation
The PR should be good, but something in setup.py seems to no longer work and is breaking the tests. cc @edbeeching
@@ -370,7 +371,8 @@ def evaluate_code(code, test_cases):
         for code, info in zip(code_snippets, verification_info)
     ]
     try:
-        rewards = run_async_from_sync(scripts, verification_info["language"])
+        loop = _init_event_loop()
+        rewards = loop.run_until_complete(run_e2b_async(scripts, verification_info["language"]))
I think it is better to just use asyncio.run(run_e2b_async(scripts, verification_info["language"])). Then we can drop the loop = _init_event_loop() line, as this is handled by asyncio.
asyncio.run is only meant to be called at the top-level entry point (typically a main()); see the docs.
I can refactor a bit, rename _init_event_loop to get_event_loop, and then just have rewards = get_event_loop().run_until_complete(...) if you prefer, but using run (or just creating and destroying event loops all the time) isn't a good idea.
It is ok, leave as is.
We could test a bit to be sure, but in theory each successive call to the reward function should reuse the existing loop (so there will be a single lingering event loop at the very end).
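For reference, here is a minimal sketch of the reuse pattern being discussed, assuming a module-level cached loop; the helper name get_event_loop and the coroutine run_e2b_async come from the thread above, while the caching details are an assumption rather than the exact code in the PR:

import asyncio

_loop: asyncio.AbstractEventLoop | None = None

def get_event_loop() -> asyncio.AbstractEventLoop:
    # Return a process-wide event loop, creating it on first use.
    # Successive reward calls then reuse one loop instead of creating and
    # destroying a loop per call, which is what asyncio.run would do.
    global _loop
    if _loop is None or _loop.is_closed():
        _loop = asyncio.new_event_loop()
        asyncio.set_event_loop(_loop)
    return _loop

# Usage inside the reward function (run_e2b_async is the coroutine from the diff):
# rewards = get_event_loop().run_until_complete(run_e2b_async(scripts, language))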
Can you take a look at PR #504? I did some fixes / refactoring there with asyncio (I don't claim this is the right way to do things!). I am currently running the code reward on gold answers for the whole open-r1/verifiable-coding-problems-python_decontaminated dataset (27k examples) with dataset.map on 16 procs and no issues so far, but I think each proc will have its own loop that is created and destroyed by asyncio.
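As a rough illustration of that kind of run, here is a sketch using dataset.map with 16 processes; score_gold_solution is a hypothetical stand-in for the repo's code reward, and the gold_standard_solution column name is an assumption about the dataset schema:

from datasets import load_dataset

def score_gold_solution(code: str) -> float:
    # Hypothetical placeholder: the real reward submits `code` to the sandboxed
    # executor and returns a score based on the passing test cases.
    return 1.0

def add_reward(example):
    example["reward"] = score_gold_solution(example["gold_standard_solution"])
    return example

ds = load_dataset("open-r1/verifiable-coding-problems-python_decontaminated", split="train")
# With num_proc=16, datasets spawns 16 worker processes; each worker that calls
# asyncio.run gets its own event loop, created and torn down inside that process.
ds = ds.map(add_reward, num_proc=16)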
LGTM
In #504 I added slow tests that can be executed locally. Can you add a test that runs the reward function with some C++ code? I have been using gold solutions from the datasets we have been building for the Python reward func slow tests.
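A minimal sketch of such a slow test, assuming the reward function is importable as ioi_code_reward and accepts a list of chat-style completions; both the import path and the signature are assumptions to adapt to the actual code in this PR:

import pytest

GOLD_CPP_SOLUTION = """\
#include <iostream>

int main() {
    long long a, b;
    std::cin >> a >> b;
    std::cout << a + b << std::endl;
    return 0;
}
"""

@pytest.mark.slow  # mirrors the slow-test setup described for #504
def test_reward_on_gold_cpp_solution():
    from open_r1.rewards import ioi_code_reward  # assumed import path and name

    completions = [[{"role": "assistant", "content": f"```cpp\n{GOLD_CPP_SOLUTION}```"}]]
    rewards = ioi_code_reward(completions)  # assumed signature
    assert rewards[0] > 0.0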
sbatch \
    --job-name="piston-worker-$PORT" \
    --export=ALL,PORT=$PORT \
    /fsx/guilherme/piston/launch_single_piston.sh
This hard-coded path won't work in the general case?
No, just like the other paths in the Slurm scripts, you will need to adapt them.