Kerchunk WriteCombinedReference return fsstore #660
Conversation
```python
# fails due to: _pickle.PicklingError: Can't pickle <function drop_unknown
# at 0x290e46a70>: attribute lookup drop_unknown on __main__ failed
mzz_kwargs=dict(preprocess=drop_unknown),
```
Yeah, I haven't had success passing references to functions into a transform. It seems to fail at compilation. I don't have a workaround. Maybe @cisaacstern does?
Nope no secret solutions on my end. This definitely merits its own issue.
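For reference, the constraint can be reproduced in plain Python without Beam. Pickle serializes a function by its qualified name, so a function that isn't importable by that name (a local function, or one defined interactively in `__main__`) can't be pickled; the names below are illustrative, not the PR's code:

```python
import pickle

def make_local_preprocess():
    # A function defined inside another function (or interactively in
    # __main__) is not importable by qualified name, so pickle cannot
    # serialize a reference to it.
    def drop_unknown(ds):
        return ds
    return drop_unknown

try:
    pickle.dumps(make_local_preprocess())
    failed = False
except (pickle.PicklingError, AttributeError):
    failed = True
```

Moving `drop_unknown` to module level in an importable module is the usual workaround, though as noted above that hasn't helped here yet.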
```python
elif file_ext == ".parquet":
    # Creates empty parquet store to be written to
    if full_target.exists(output_file_name):
        full_target.rm(output_file_name, recursive=True)
    full_target.makedir(output_file_name)

    remote_protocol = _select_single_protocol(full_target)

    out = LazyReferenceMapper.create(refs_per_component, outpath, full_target.fs)
```
I'm not in love with this part, but I don't know enough about parquet + referencefs to do better.
Happy to update if you think of anything!
@sbquinlan I got rid of inferring the protocol in favor of explicitly passing it.
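The clear-and-recreate sequence itself works against any fsspec filesystem; a minimal sketch using the in-memory backend (the filesystem and path names are illustrative, not the PR's variables):

```python
import fsspec

# In-memory stand-in for the real target filesystem.
fs = fsspec.filesystem("memory")
outpath = "/references.parquet"

# Mirror the PR's sequence: remove any stale store, then create an
# empty directory for the parquet references to be written into.
if fs.exists(outpath):
    fs.rm(outpath, recursive=True)
fs.makedirs(outpath, exist_ok=True)
```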
```
@@ -446,6 +447,33 @@ def expand(self, references: beam.PCollection) -> beam.PCollection:
        )


@dataclass
class WriteReference(beam.PTransform, ZarrWriterMixin):
```
thanks!
```python
    store_name: str
    concat_dims: List[str]
    target_root: Union[str, FSSpecTarget, RequiredAtRuntimeDefault] = field(
```
I think you need to update the injections so that this gets updated from the config. That should fix the integration test failure
Great catch! I updated injections.py to add WriteReference. Any idea if anything more is required for that?
Hmm, that should have done it... not sure why it's still failing on that. You could just pass the value in WriteCombinedReference as a workaround, but ideally what you have should work.
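To make the injection idea concrete, here is a self-contained sketch of the pattern: a transform declares a sentinel default, and the runner substitutes the configured value at runtime. All names (`inject`, `TARGET_STORAGE`, the simplified `WriteReference`) are illustrative assumptions, not pangeo-forge-runner's actual API:

```python
from dataclasses import dataclass, field, replace

class RequiredAtRuntimeDefault:
    """Sentinel marking a field the runner must fill in at deploy time."""

@dataclass
class WriteReference:
    store_name: str
    target_root: object = field(default_factory=RequiredAtRuntimeDefault)

def inject(transform, config):
    # Swap the sentinel for the value supplied by the deployment config.
    if isinstance(transform.target_root, RequiredAtRuntimeDefault):
        return replace(transform, target_root=config["TARGET_STORAGE"])
    return transform

t = inject(WriteReference(store_name="my-store"), {"TARGET_STORAGE": "s3://bucket/refs"})
```

The integration failure discussed here is what happens when a new transform is not registered for this substitution: the sentinel never gets replaced.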
Can take a closer look tomorrow, but off the cuff I wonder if the failure is because we need to bump the version of the runner used in the integration tests.
I bumped runner to the two latest tags. First kerchunk integration test is passing!
Getting a failure on the minio version though.
```
tests/test_integration.py::test_python_json_configs_identical PASSED
tests/test_integration.py::test_integration[local_confpath-hrrr-kerchunk-concat-valid-time] XFAIL
tests/test_integration.py::test_integration[local_confpath-hrrr-kerchunk-concat-step] PASSED
tests/test_integration.py::test_integration[local_confpath-gpcp-from-gcs-dynamic-chunks] PASSED
tests/test_integration.py::test_integration[local_confpath-terraclimate] XFAIL
tests/test_integration.py::test_integration[local_confpath-narr-opendap] XFAIL
tests/test_integration.py::test_integration[local_confpath-noaa-oisst] PASSED
tests/test_integration.py::test_integration[local_confpath-gpcp-from-gcs] PASSED
tests/test_integration.py::test_integration[minio_confpath-hrrr-kerchunk-concat-valid-time] XFAIL
tests/test_integration.py::test_integration[minio_confpath-hrrr-kerchunk-concat-step] FAILED

    raise NoCredentialsError()
RuntimeError: botocore.exceptions.NoCredentialsError: Unable to locate credentials
  [while running 'Create|OpenWithKerchunk|WriteCombinedReference|Test dataset/WriteCombinedReference/WriteReference/Map(write_combined_reference)']
```
Hmm, I'd guess something about remote_options/storage_options in the reference file system not getting passed through to the result, maybe?
This is actually great, insofar as it proves the value of the minio test! So much better to catch this here, rather than merge this and only find out in prod! 😃
100%
I just rediscovered this bug, since I didn't see the need to pass remote_options or target_options given that we had dependency-injected target_root. So the breaking change between 0.10.4 and what this PR merged into main is that we removed the context manager that reused the auth credentials from dependency injection. Will file a bug later.
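A minimal sketch of why remote_options matters here: a kerchunk reference filesystem needs credentials for the protocol its references point at. This example uses fsspec's in-memory backend as the "remote" (so remote_options is empty); for s3/minio it would carry key/secret/endpoint settings, and dropping it produces exactly the NoCredentialsError seen in the minio test:

```python
import fsspec

# Write a target byte range into an in-memory filesystem.
mem = fsspec.filesystem("memory")
with mem.open("/data.bin", "wb") as f:
    f.write(b"hello world")

# A version-1 kerchunk reference: key -> [url, offset, length].
refs = {"chunk/0": ["memory://data.bin", 0, 5]}

fs = fsspec.filesystem(
    "reference",
    fo={"version": 1, "refs": refs},
    remote_protocol="memory",
    remote_options={},  # credentials for the referenced URLs go here
)
data = fs.cat("chunk/0")
```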
@sbquinlan @cisaacstern got the tests passing! If either of you wouldn't mind taking one last look at the PR, it would be appreciated.
```yaml
- name: 🔁 Setup Python
  id: setup-python
  uses: actions/setup-python@v4

- name: Setup conda
  uses: mamba-org/setup-micromamba@v1
  with:
    python-version: ${{ matrix.python-version }}
    cache: pip
    cache-dependency-path: pyproject.toml
    create-args: python==${{ matrix.python-version }}
    environment-name: pangeo-forge

- name: Install pangeo-forge recipes and runner
  shell: bash -l {0}
  run: |
    python -m pip install ${{ matrix.runner-version }}
    python -m pip install -e ".[test,minio,grib]"
```
If we are using pip to install GRIB dependencies, do we need to be using micromamba? Can't we just keep Setup Python the way it was?
Also, I'm somewhat surprised this works? The cfgrib/eccodes docs seem to suggest system-level dependencies need to be installed via conda?
Good call, just reverted that and the integration tests are passing.
Another comment about how we're bringing in cfgrib for the test.
```yaml
- name: 🌈 Install pangeo-forge-recipes & pangeo-forge-runner
  shell: bash -l {0}
  run: |
    python -m pip install ${{ matrix.runner-version }}
    python -m pip install -e ".[test,minio]"
```
Sorry for the piecemeal review, @norlandrhagen.
Since we're not doing anything different re: pip, can we just preserve this block for a cleaner diff?
I would prefer not to support a user-facing grib optional dependency (at least as part of this PR), as I'm concerned that without putting it in a conda environment we can't really ensure it will work reliably in a cross-platform way.
So I'd prefer just a separate workflow step here called "install grib deps" or similar, and not to have those deps as part of our pyproject.toml.
Seems like a good call!
If I understand: I removed the grib deps from pyproject, reverted the install to `python -m pip install -e ".[test,minio]"`, and added another step in the CI to install the grib deps via pip.
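For concreteness, that extra step could look something like the following (a sketch: the step name and exact package list are assumptions; cfgrib and eccodes do publish pip-installable packages):

```yaml
- name: Install grib deps
  shell: bash -l {0}
  run: |
    python -m pip install cfgrib eccodes
```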
Thanks for pushing this forward, @norlandrhagen, clearly we're very close. My comments thus far are on packaging/GH workflows. I haven't really digested the changes to the actual code. If @sbquinlan is able to take a look at that, great! If not, I can try to circle back to that section tomorrow.
Thanks for the CI feedback so far!
Yeah I've looked over the code and tests. This all looks good to me. Nice work piping the storage and remote options all the way through.
Thanks for the review @sbquinlan!
Awesome! If @sbquinlan is happy then I'm happy 😀, great work @norlandrhagen and thanks for engaging so deeply on this, Sean.
Thank you both for the patience!
This PR is a clean-slate version of #636. It borrows heavily from @sbquinlan's excellent work in #654, hopefully enabling that PR to proceed. test_CombineReferences is now passing.