adds support for running GRPO on IOI problems #495
base: main
Conversation
The PR should be good, but something in setup.py seems to no longer work and is breaking the tests. cc @edbeeching
@@ -370,7 +371,8 @@ def evaluate_code(code, test_cases):
         for code, info in zip(code_snippets, verification_info)
     ]
     try:
-        rewards = run_async_from_sync(scripts, verification_info["language"])
+        loop = _init_event_loop()
+        rewards = loop.run_until_complete(run_e2b_async(scripts, verification_info["language"]))
I think it is better to just use asyncio.run(run_e2b_async(scripts, verification_info["language"])). Then we can drop the loop = _init_event_loop() line, as this is handled by asyncio.
asyncio.run is only meant to be called at the top-level entry point (typically a main()); see the docs.
I can refactor a bit, rename _init_event_loop to get_event_loop, and then just have rewards = get_event_loop().run_until_complete(...) if you prefer, but using run (or just creating and destroying event loops all the time) isn't a good idea.
It is ok, leave as is.
We could test a bit to be sure, but in theory each successive call to the reward function should reuse the existing loop (so there will be a single lingering event loop at the very end).
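For reference, here is a minimal sketch of the reuse pattern being discussed, assuming a module-level cached loop; the helper name get_event_loop and the coroutine run_e2b_async come from the thread above, while the caching details are an assumption rather than the exact code in the PR:

import asyncio

_loop: asyncio.AbstractEventLoop | None = None

def get_event_loop() -> asyncio.AbstractEventLoop:
    # Return a process-wide event loop, creating it on first use.
    # Successive reward calls then reuse one loop instead of creating and
    # destroying a loop per call, which is what asyncio.run would do.
    global _loop
    if _loop is None or _loop.is_closed():
        _loop = asyncio.new_event_loop()
        asyncio.set_event_loop(_loop)
    return _loop

# Usage inside the reward function (run_e2b_async is the coroutine from the diff):
# rewards = get_event_loop().run_until_complete(run_e2b_async(scripts, language))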
Can you take a look at PR #504? I did some fixes / refactoring there with asyncio (I don't claim this is the right way to do things!). I am currently running the code reward on gold answers for the whole open-r1/verifiable-coding-problems-python_decontaminated dataset (27k examples) with dataset.map on 16 procs and no issues so far, but I think each proc will have its own loop that is created and destroyed by asyncio.
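As a rough illustration of that kind of run, here is a sketch using dataset.map with 16 processes; score_gold_solution is a hypothetical stand-in for the repo's code reward, and the gold_standard_solution column name is an assumption about the dataset schema:

from datasets import load_dataset

def score_gold_solution(code: str) -> float:
    # Hypothetical placeholder: the real reward submits `code` to the sandboxed
    # executor and returns a score based on the passing test cases.
    return 1.0

def add_reward(example):
    example["reward"] = score_gold_solution(example["gold_standard_solution"])
    return example

ds = load_dataset("open-r1/verifiable-coding-problems-python_decontaminated", split="train")
# With num_proc=16, datasets spawns 16 worker processes; each worker that calls
# asyncio.run gets its own event loop, created and torn down inside that process.
ds = ds.map(add_reward, num_proc=16)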
LGTM
In #504 I added slow tests that can be executed locally. Can you add a test that runs the reward function with some C++ code? I have been using gold solutions from the datasets we have been building for the Python reward func slow tests.
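A minimal sketch of such a slow test, assuming the reward function is importable as ioi_code_reward and accepts a list of chat-style completions; both the import path and the signature are assumptions to adapt to the actual code in this PR:

import pytest

GOLD_CPP_SOLUTION = """\
#include <iostream>

int main() {
    long long a, b;
    std::cin >> a >> b;
    std::cout << a + b << std::endl;
    return 0;
}
"""

@pytest.mark.slow  # mirrors the slow-test setup described for #504
def test_reward_on_gold_cpp_solution():
    from open_r1.rewards import ioi_code_reward  # assumed import path and name

    completions = [[{"role": "assistant", "content": f"```cpp\n{GOLD_CPP_SOLUTION}```"}]]
    rewards = ioi_code_reward(completions)  # assumed signature
    assert rewards[0] > 0.0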
sbatch \
    --job-name="piston-worker-$PORT" \
    --export=ALL,PORT=$PORT \
    /fsx/guilherme/piston/launch_single_piston.sh
This hard-coded path won't work in the general case?
No, just like the other paths in the Slurm scripts, you will need to adapt them.