Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add restart support for masked spaces #2212

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

charleskawczynski
Copy link
Member

@charleskawczynski charleskawczynski commented Mar 6, 2025

Closes #2216

@charleskawczynski charleskawczynski changed the title Add IO support for space masks Add restart support for masked spaces Mar 7, 2025
@charleskawczynski charleskawczynski force-pushed the ck/read_data branch 4 times, most recently from 4a97767 to d8c730e Compare March 9, 2025 14:25
@charleskawczynski charleskawczynski marked this pull request as ready for review March 9, 2025 16:15
!(topology isa Topologies.Topology2D && !(writer.context isa ClimaComms.SingletonCommsContext))
```
"""
function _write_mpi!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't gone through the PR in details yet, but this caught my eye. When InputOutput writes other objects (including Fields and their associated data) it can directly handle the distributed case. Why is there a need to distinguish the MPI and non-MPI case here?

Also,

!(topology isa Topologies.Topology2D && !(writer.context isa ClimaComms.SingletonCommsContext))

The double negation makes this hard to understand. It seems that this says that the function should be used when the topology is not 2D and it is singleton, the opposite of what the function name suggests. Is this a typo?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I looked at the existing code and think I have answered my own question about the need for two functions: they are not really handled directly and there are still if-else statements

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wish there was a better way, but you really need to know topology information to handle datalayout IO correctly. I split the method into two pieces in case either of them could be reused. For now, I'm only using these methods for the mask but we could eventually reuse them in other places to reduce some of the duplicate logic.

I think happy to remove the double negative from the docs, I tend to avoid them, but I think I just did that to somewhat match the source code.

@juliasloan25
Copy link
Member

Could you add some notes in the PR description about the main changes being made here to support restarts with masked spaces? It would probably be good to add something like that to the NEWS too

Verified

This commit was signed with the committer’s verified signature.
keithamus Keith Cirkel
Copy link
Member

@Sbozzolo Sbozzolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I left minor comments

"grid_mask",
write!(writer, grid.mask, Spaces.topology(grid)),
)
# write!(writer, grid.mask, Spaces.topology(grid))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# write!(writer, grid.mask, Spaces.topology(grid))

)
group = create_group(writer.file, "grid_mask/$name")
write!(writer, group, mask.is_active, "is_active", topology)
name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name
return name

topology::Topologies.AbstractTopology,
)

Write an object of type 'AbstractData' and name 'name' to the HDF5 file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Write an object of type 'AbstractData' and name 'name' to the HDF5 file.
Write an object of type `AbstractData` and name `name` to the HDF5 file.

This method should be used when

```julia
!(topology isa Topologies.Topology2D && !(writer.context isa ClimaComms.SingletonCommsContext))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this restriction on Topology 2D needed because we don't support MPI for Topology1D?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think this conditional is incorrect for this case and should just be

topology isa Topologies.Topology2D && !(writer.context isa ClimaComms.SingletonCommsContext)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this restriction on Topology 2D needed because we don't support MPI for Topology1D?

I think that is true, but I used this condition mostly to preserve the existing behavior

This method should be used when

```julia
!(topology isa Topologies.Topology2D && !(writer.context isa ClimaComms.SingletonCommsContext))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this hard to read (because of the double/nested negation). I'd suggest

Suggested change
!(topology isa Topologies.Topology2D && !(writer.context isa ClimaComms.SingletonCommsContext))
is_distributed = !(writer.context isa ClimaComms.SingletonCommsContext)
!is_distributed || !(topology isa Topologies.Topology2D)

function _write!(group, values::DataLayouts.AbstractData, name::AbstractString;)
h_dim = DataLayouts.h_dim(DataLayouts.singleton(values))
array = parent(values)
# dataset = write_plain_array!(writer, array, "data/$name")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# dataset = write_plain_array!(writer, array, "data/$name")

Comment on lines -72 to +87
context = ClimaComms.SingletonCommsContext(device)
grid_topology = Topologies.Topology2D(context, mesh)
context = ClimaComms.context(device)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test only works for SingletonCommsContext.

If you use MPI, mktempfile will create two different files on in the /tmp/ of different nodes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I need to think about this.

Spaces.set_mask!(space) do coords
rand() > 0.5
end
is_active₀ = Spaces.get_mask(space).is_active
Copy link
Member

@Sbozzolo Sbozzolo Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (the is_active₀) doesn't seem to be used anywhere

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, I think this is sort of a rebase artifact

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add restart support for masked spaces
3 participants