adds support for running GRPO on IOI problems #495

Open
wants to merge 7 commits into
base: main
Conversation

guipenedo

No description provided.

@guipenedo guipenedo marked this pull request as ready for review March 9, 2025 21:28
Author

guipenedo commented Mar 11, 2025

The PR should be good, but something in setup.py seems to no longer work and is breaking the tests. cc @edbeeching

@@ -370,7 +371,8 @@ def evaluate_code(code, test_cases):
         for code, info in zip(code_snippets, verification_info)
     ]
     try:
-        rewards = run_async_from_sync(scripts, verification_info["language"])
+        loop = _init_event_loop()
+        rewards = loop.run_until_complete(run_e2b_async(scripts, verification_info["language"]))
Collaborator

I think it is better to just use asyncio.run(run_e2b_async(scripts, verification_info["language"])). Then we can drop the loop = _init_event_loop() line, as this is handled by asyncio.

Author

asyncio.run is only meant to be called at the top-level entry point (typically a main()); see the docs.
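One concrete consequence of this restriction can be shown with a short sketch. It is not code from the PR: `compute` and `nested_call` are hypothetical names used only for illustration. It shows that `asyncio.run` works from synchronous top-level code but raises `RuntimeError` when called while an event loop is already running.

```python
import asyncio


async def compute() -> int:
    # Stand-in for real async work (hypothetical, not from the PR).
    await asyncio.sleep(0)
    return 42


async def nested_call() -> str:
    # We are already inside a running loop here, so asyncio.run refuses
    # to start a second one and raises RuntimeError.
    coro = compute()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


# Fine: called from synchronous, top-level code.
top_level_result = asyncio.run(compute())

# Fails inside: nested_call itself runs in a loop started by asyncio.run.
nested_result = asyncio.run(nested_call())
```

So a reward function built on `asyncio.run` would break if a caller ever invoked it from code that is itself running inside an event loop.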

Author

I can refactor a bit, rename _init_event_loop to get_event_loop, and then just have

rewards = get_event_loop().run_until_complete...

if you prefer. But using asyncio.run, or otherwise creating and destroying event loops all the time, isn't a good idea.

Collaborator

@edbeeching edbeeching Mar 13, 2025

It is OK, leave it as is.

Author

We could test a bit to be sure, but in theory each successive call to the reward function should reuse the existing loop (so there will be a single lingering event loop at the very end).
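The loop-reuse behaviour described here could look roughly like the sketch below. The body of `_init_event_loop` is not shown in this thread, so everything here is an assumption: the module-level `_LOOP` cache, the `get_event_loop` helper, and the `fake_reward` / `reward_fn` stand-ins are hypothetical names, not the PR's implementation.

```python
import asyncio
from typing import List, Optional

# Hypothetical module-level cache; the PR's real helper may differ.
_LOOP: Optional[asyncio.AbstractEventLoop] = None


def get_event_loop() -> asyncio.AbstractEventLoop:
    """Return the cached loop, creating it only on first use (or if closed)."""
    global _LOOP
    if _LOOP is None or _LOOP.is_closed():
        _LOOP = asyncio.new_event_loop()
    return _LOOP


async def fake_reward(i: int) -> float:
    # Stand-in for an async sandbox call (hypothetical).
    await asyncio.sleep(0)
    return float(i)


async def score_all(n: int) -> List[float]:
    # Run all reward computations concurrently on the current loop.
    return await asyncio.gather(*(fake_reward(i) for i in range(n)))


def reward_fn(n: int) -> List[float]:
    # Synchronous entry point: every call reuses the same loop.
    return get_event_loop().run_until_complete(score_all(n))


first = reward_fn(3)
loop_after_first = get_event_loop()
second = reward_fn(2)
loop_after_second = get_event_loop()
```

Under these assumptions, both calls run on the same loop object, and that single loop is the one left open when the process ends.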

Collaborator

Can you take a look at PR #504? I did some fixes and refactoring there with asyncio (I don't claim this is the right way to do things!).

I am currently running the code reward on gold answers for the whole open-r1/verifiable-coding-problems-python_decontaminated dataset (27k examples) with dataset.map on 16 procs, with no issues so far, but I think each proc will have its own loop that is created and destroyed by asyncio.
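The create-and-destroy behaviour mentioned here can be observed directly within a single process. The sketch below is illustrative only (`fake_reward` and `seen_loops` are hypothetical names, and it does not use dataset.map or multiple processes): it records the loop used by each `asyncio.run` call and checks that each call gets a fresh loop that is closed afterwards.

```python
import asyncio

# Keep references to every loop we see so they can be inspected later.
seen_loops = []


async def fake_reward(x: int) -> int:
    # Stand-in for an async reward computation (hypothetical).
    seen_loops.append(asyncio.get_running_loop())
    await asyncio.sleep(0)
    return x * 2


# Each asyncio.run call creates a new loop, runs the coroutine, and closes it.
results = [asyncio.run(fake_reward(x)) for x in range(3)]

distinct_loops = len({id(loop) for loop in seen_loops})
all_closed = all(loop.is_closed() for loop in seen_loops)
```

This works, as the run above suggests, at the cost of one loop setup and teardown per call.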

Co-authored-by: Edward Beeching <[email protected]>
Collaborator

@edbeeching edbeeching left a comment

LGTM

In #504 I added slow tests that can be executed locally. Can you add a test that runs the reward function with some C++ code? For the Python reward function slow tests, I have been using gold solutions from the datasets we have been building.

sbatch \
--job-name="piston-worker-$PORT" \
--export=ALL,PORT=$PORT \
/fsx/guilherme/piston/launch_single_piston.sh

This hard-coded path won't work in the general case?

Author

No, just like the other paths in the Slurm scripts, you will need to adapt it.
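If adapting the path by hand becomes a nuisance, one option would be to make it overridable. The sketch below is not part of the PR: the `build_sbatch_command` helper and the `PISTON_LAUNCH_SCRIPT` environment variable are hypothetical. It only builds the argument list for the sbatch call quoted above and does not submit anything.

```python
import os
from typing import List, Optional

# Default taken from the script in this PR; expected to be site-specific.
DEFAULT_LAUNCH_SCRIPT = "/fsx/guilherme/piston/launch_single_piston.sh"


def build_sbatch_command(port: int, launch_script: Optional[str] = None) -> List[str]:
    """Build the sbatch argument list for one piston worker.

    The launch script is resolved in this order: explicit argument,
    PISTON_LAUNCH_SCRIPT environment variable (hypothetical), then the default.
    """
    script = launch_script or os.environ.get("PISTON_LAUNCH_SCRIPT", DEFAULT_LAUNCH_SCRIPT)
    return [
        "sbatch",
        f"--job-name=piston-worker-{port}",
        f"--export=ALL,PORT={port}",
        script,
    ]


# Build (but do not run) the command for a worker on port 2000.
command = build_sbatch_command(2000, launch_script="/path/to/launch_single_piston.sh")
```

The resulting list could be passed to `subprocess.run` on a machine where Slurm is available.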
