Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConsolidateMetadata2 #678

Merged

Conversation

norlandrhagen
Copy link
Contributor

ConsolidateMetadata doesn't seem to be creating a .zmetadata in the reference recipe tests.

Based on Martin Durants comments here: fsspec/kerchunk#240 (comment), this PR updates how the .zmetadata is being created.

Some open questions:

  • It seemed like the previous tests of zarr.open_consolidated were working...without a .zmetadata file?
  • This method path.fs.save_json(ref_path) calls save_json`. How do we modify the parquet stores?

cc @abarciauskas-bgse @ranchodeluxe

@abarciauskas-bgse
Copy link
Contributor

Thanks for looking at this @norlandrhagen ! I was just reading about the parquet implementation and noticed it says:

For now, kerchunk.combine.MultiZarrToZarr only operates on JSON/dict input. Therefore, refs_to_dataframe can only be used on the final output reference set.

Since consolidate_metadata also seems to assume JSON/dict input I would not expect to be able to pass geoparquet to consolidate_metadata either.

Did you consolidate metadata before converting to geoparquet for the CMIP6 dataset you worked on?

@norlandrhagen
Copy link
Contributor Author

Update:

Thanks @abarciauskas-bgse & @ranchodeluxe for talking through this!

According to discussion in this PR: #675. It seems like there isn't really a point to consolidate the metadata for a Kerchunk reference.

  • We should probably remove any Kerchunk specific logic for consolidating references.
  • Consolidate_Metadata works on Kerchunk json references (Although it's probably not needed at all)
  • ConsolidateMetadata fails on Kerchunk parquet references.

@ranchodeluxe
Copy link
Contributor

ranchodeluxe commented Jan 31, 2024

According to discussion in this PR: #675. It seems like there isn't really a point to consolidate the metadata for a Kerchunk reference.

* We should probably remove any Kerchunk specific logic for consolidating references.

* Consolidate_Metadata works on Kerchunk json references (Although it's probably not needed at all)

* ConsolidateMetadata fails on Kerchunk parquet references.

(just adding these bullets also)

  • Include the fix for write_combined_reference regarding parquet files

  • Make sure ConsolidateMetadata takes target_root and passes along fsspec kwargs

@norlandrhagen
Copy link
Contributor Author

  • Consolidating references for Kerchunk recipes have been removed.
  • Fixed a major bug where we were writing json files with a .parquet suffix.
  • Updated the LazyReferenceMapper.create method to kwargs.

@norlandrhagen norlandrhagen added the test-integration Apply this label to run integration tests on a PR. label Feb 1, 2024
@ranchodeluxe ranchodeluxe self-requested a review February 2, 2024 14:44
Copy link
Contributor

@ranchodeluxe ranchodeluxe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this @norlandrhagen 🥳 Can't wait to get this merged so I can double check if my assumption about fsspec auth args is correct or not

@norlandrhagen norlandrhagen marked this pull request as ready for review February 2, 2024 16:23
@norlandrhagen norlandrhagen changed the title WIP - [BugFix Attempt] ConsolidateMetadata ConsolidateMetadata2 Feb 2, 2024
@norlandrhagen
Copy link
Contributor Author

@cisaacstern would you mind taking a look at this when you get a chance?

@norlandrhagen norlandrhagen merged commit 684f3c7 into pangeo-forge:main Feb 6, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test-integration Apply this label to run integration tests on a PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants