
Ice velocity integration #97


Merged: 16 commits into main on Apr 25, 2025

Conversation

albangossard
Member

@albangossard albangossard commented Apr 9, 2025

This PR contains multiple changes:

  • When trying to integrate ice velocity into the loss for transient inversions, I discovered a bug when we call Results without the keyword arguments. In Julia, we cannot define a function like
function g(;a::Vector{Matrix{F}}=Vector{Matrix{F}}([])) where {F <: AbstractFloat}
    print(typeof(a))
end

because the keyword argument is not specialized during compilation. Such an implementation would work if a were a positional argument. We didn't see this problem before because we were always providing a value to Results(...) for the keyword arguments whose default values depend on F (see the sketch after this list).
In order to avoid such bugs in the future, I added a unit test that instantiates Results.

  • Changes to store the surface velocities with respect to time.
  • The light option of simulation was removed since we should use save_everystep instead.
  • The tests of the plotting functions are back in place and these functions were improved to have better visualization.
  • The surface velocity datacubes can now be gridded onto the glacier grid to be able to integrate them in the loss.
  • Names of the glaciers are used when they are available.
  • A dummy iceflow struct was added in order to be able to test the Results instantiation and the plot functions.
  • There is code to generate fake velocity datacubes within the CI so that they don't have to be committed or downloaded. This allows testing the gridding of the velocity datacubes onto the glacier grid.
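
As referenced in the first bullet, here is a minimal sketch of one possible workaround (g is a toy function, not the actual Results constructor): make the default value independent of the static parameter, so that nothing requires F to be bound when the keyword is omitted.

# One possible workaround (sketch): the element type is inferred from the
# value that is actually passed instead of from the where parameter F.
function g(; a::Vector{<:Matrix{<:AbstractFloat}}=Matrix{Float64}[])
    println(typeof(a))
end

g()                            # prints Vector{Matrix{Float64}}
g(a = [rand(Float32, 2, 2)])   # prints Vector{Matrix{Float32}}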

@albangossard albangossard added the bug Something isn't working label Apr 9, 2025
@codecov-commenter

codecov-commenter commented Apr 9, 2025

Codecov Report

Attention: Patch coverage is 85.00000% with 45 lines in your changes missing coverage. Please review.

Project coverage is 76.52%. Comparing base (14e936a) to head (c8a1ec4).

Files with missing lines                           | Patch % | Lines
src/simulations/results/results_utils.jl           |   0.00% | 20 Missing ⚠️
src/simulations/results/results_plotting_utils.jl  |  89.28% | 12 Missing ⚠️
src/glaciers/glacier/Glacier2D.jl                  |  76.92% |  3 Missing ⚠️
src/glaciers/data/SurfaceVelocityData.jl           |   0.00% |  2 Missing ⚠️
src/glaciers/data/SurfaceVelocityData_utils.jl     |  97.89% |  2 Missing ⚠️
src/glaciers/data/ThicknessData.jl                 |   0.00% |  2 Missing ⚠️
src/glaciers/glacier/glacier2D_utils.jl            |  94.11% |  2 Missing ⚠️
src/setup/config.jl                                |  83.33% |  2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #97       +/-   ##
===========================================
+ Coverage   47.09%   76.52%   +29.43%     
===========================================
  Files          19       20        +1     
  Lines         879      997      +118     
===========================================
+ Hits          414      763      +349     
+ Misses        465      234      -231     

@albangossard albangossard force-pushed the iceVelocityIntegration branch 3 times, most recently from f9a2fb4 to f9967d9 on April 15, 2025 15:19
@albangossard albangossard added the enhancement New feature or request label Apr 23, 2025
@albangossard albangossard force-pushed the iceVelocityIntegration branch from 836245b to 3a50573 on April 23, 2025 14:06
@albangossard albangossard marked this pull request as ready for review April 23, 2025 14:56
Member

@JordiBolibar JordiBolibar left a comment


Awesome job! This is a lot of foundational work, it will be very cool to be able to leverage these datasets from now on.

I just have some minor comments and some questions (purely for my curiosity). Feel free to change whatever seems important and then we can merge.

@@ -13,6 +13,7 @@ CoordRefSystems = "b46f11dc-f210-4604-bfba-323c1ec968cb"
DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
DateFormats = "44557152-fe0a-4de1-8405-416d90313ce6"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
DimensionalData = "0703355e-b756-11e9-17c0-8b28908087d0"
Member


This is already available from Rasters.jl. Did you add it just to have the necessary exported namespaces?

Member Author


Yes, this is just to access Dim in Sleipnir/test/surface_velocity.jl, for example. If that bothers you, we can probably use Rasters.Dim instead.

@@ -12,6 +12,7 @@ using JLD2
using Distributed
using Statistics
using CairoMakie
using Observables
Member


What is this for? I've seen it's some sort of improved Ref.

Member Author


It is used in lines like

Observables.connect!(cb.height, @lift CairoMakie.Fixed($(viewport(ax.scene)).widths[2]))

whose purpose is to make the colorbar have the same height as the heatmap we are plotting.
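
For context, a minimal self-contained sketch of how that line is used (toy data; it assumes a recent Makie version where viewport is available, as in the line above):

using CairoMakie, Observables

fig = Figure()
ax = Axis(fig[1, 1])
hm = heatmap!(ax, rand(20, 20))
cb = Colorbar(fig[1, 2], hm)
# Tie the colorbar height to the pixel height of the heatmap axis so that
# the two always match, even when the figure is resized.
Observables.connect!(cb.height, @lift CairoMakie.Fixed($(viewport(ax.scene)).widths[2]))
fig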

- `date1::Union{Vector{DateTime}, Nothing}`: First date of adquisition.
- `date2::Union{Vector{DateTime}, Nothing}`: Second date of adquisition.
- `date_error::Union{Vector{Day}, Vector{Millisecond}, Nothing}`: Error in `date`.
- `glacierGridded::Bool`: Whether the data have been gridded to the glacier grid or not.
Member


It's not entirely clear what you mean by this. Is there a case where the data are not gridded to the glacier grid? Like, do you use the same struct to store the original raw data and the final data on the glacier grid?

Member Author


Yes, we use the same struct to store the original raw data and the gridded ones. I don't think we need a specific struct to store the original raw data. Doing so would even come with memory issues: here we load the data in lazy mode only when we need them, and many of the points in the original data aren't useful since they are outside of the glacier grid.
Most of the time, we probably won't use the raw data directly. However, it can be convenient in some cases for debugging or to visualize the data of the netCDF files.

I changed the name to isGridGlacierAligned to make it clearer.


Constructor for ice surface velocity data based on Rabatel et al. (2023).


Important remarks:
- Projections in longitude and latitude assume we are working in the northern hemisphere.
If working with southern-hemisphere glaciers, this needs to be changed.
- The error in velocity is unique per timestamp, rather than being pixel distributed.
- The error in the absolute velocities `vabs_error` is overestimated.
Member


Why is this? Does this come from the paper(s)? And by overestimated do you mean that the error is larger than reported or lower than reported?

Member


Also, this assumption about the northern hemisphere is quite strong. At the very least I would give a warning somehow (e.g. in case there is a way to infer that we are not in the northern hemisphere) and clearly document this in the docs.

Member Author


Concerning the errors, this is something that @facusapienza21 implemented, so I'm not 100% sure of the answer. I guess this is because for every point of the grid we don't have the velocities at every time, so we use an upper bound by taking, for every grid point, the maximum speed over time. In the end the error is overestimated. @facusapienza21 am I right?

As for the northern hemisphere restriction, actually this is something that I removed in this PR and the code now supports data in the southern hemisphere too.

Member


I don't remember exactly what the problem was, but as long as you can specify whether the projection is done in the northern or southern hemisphere it should be fine!

Member


I think right now we can only handle the northern hemisphere unless we change this code:

  # Spatial preprocessing
  proj_zone = ncgetatt(file, "mapping", "utm_zone_number")
  transform(X,Y) = Sleipnir.UTMercator(X, Y; zone=proj_zone, hemisphere=:north) 

Right now this is done for the northern hemisphere. I think this was also something from the dataset: since the dataset for now only covers the northern hemisphere, it is not clear yet which attribute of the dataset will indicate in the future whether a datacube is in the northern or southern hemisphere. This is a good question for Romain/Antoine, but for now I will keep the comment until we are sure the preprocessing will work with southern-hemisphere data.

Member


OK, I suggest we open an issue to keep track of this.

Member Author


In fact, I already changed that line in the PR. The code is now:

hemisphere = (y[end]+y[begin])/2 >= 0 ? :north : :south
transform(X,Y) = Sleipnir.UTMercator(X, Y; zone=Int(params_projection["zone"]), hemisphere=hemisphere)

Since in the UTM coordinate system y corresponds to the northing, y=0 corresponds to the equator (I double checked this numerically).

If you think there can be bugs, we can add an assert, put back the remark in the documentation and create an issue for that, but I think that this code will work with datasets in the southern hemisphere.

compute_vabs_error::Bool=true
) where {G <: AbstractGlacier, VM <: VelocityMapping}

velRast = isa(raster, String) ? RasterStack(raster, lazy=true) : raster
Member


Is there any penalty or downside in using the Rasters in lazy mode?

Member Author


From my experience it's slower when we read the data. But the most important point is that for one of the datacubes I've been playing with, it is just impossible to load it in a non-lazy way. In the netCDF file, the data are stored in a sparse way on disk, but once loaded into memory everything is dense, which represents more than 15 GB of RAM!
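
As an illustration, a minimal sketch of the lazy pattern (the file path and the layer name vx are assumptions for the example, not from this PR):

using Rasters
import NCDatasets  # backend needed to read NetCDF files

# With lazy=true only the metadata is read; the arrays stay on disk until
# they are actually indexed.
stack = RasterStack("path/to/velocity_datacube.nc"; lazy=true)

# Indexing a subset materializes only that part in memory, which is what
# keeps an otherwise ~15 GB dense datacube tractable.
vx_sub = stack[:vx][X(1:100), Y(1:100)]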

abstract type VelocityMapping end

"""
MeanDateVelocityMapping <: VelocityMapping
Member


I am a bit confused to see the iceflow model mentioned here for a tool which in theory only serves to map the grid from the observations to the grid we use to model the glacier. I understand that the way we associate the velocity to a given date will impact our interpretation of how we can compare it to the model, but this seems overly rigid. The fact that we know the start and end date of the observations used to compute the velocity is very useful and can be used when comparing it to model outputs.

There seems to be a big gap in terms of complexity between the two mapping approaches. Here's a suggestion: what about enabling the addition of more data points in MeanDateVelocityMapping to go beyond just the middle date? If we sample more data points from outputs of the iceflow model and average them, it should be increasingly robust. This shouldn't be too computationally expensive; it would just imply storing a denser solution and then averaging over an increased number of model outputs. The number of model outputs used could be somehow proportional to the length of the time window used in the calculation of the surface velocity.

Member Author


Well, there are two different things that I mixed together: the spatial interpolation and the temporal interpolation. This is why the iceflow model is mentioned here, but we don't even care about it in Sleipnir. These are just structs to define which flavors of spatio-temporal mappings are implemented.

As discussed together today, I agree that these approaches are the two extremes in terms of complexity and we'll likely implement a simplification of the second one. We could either implement that in IntegratedTrajectoryMapping by using several hyperparameters that allow reducing the number of time steps used, or implement it in another struct that is a subtype of VelocityMapping.

We'll probably implement these mappings in other PRs. All of this is exciting and I can't wait to start to play with these mappings!
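
Purely as an illustration of that last option, such an intermediate mapping could look like the sketch below (the struct name and field are hypothetical, not part of this PR):

abstract type VelocityMapping end   # mirrors the abstract type defined in Sleipnir

struct SampledWindowVelocityMapping <: VelocityMapping
    nSamples::Int   # number of model outputs sampled in the [date1, date2] window and averaged
end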

if isdir(prepro_dir)
daysSinceLastDownload = (Dates.now() - Dates.unix2datetime(mtime(prepro_dir)))/Day(1)
if daysSinceLastDownload > 1
# Re-download data if older than one day
Member


Isn't this a bit extreme? Why should files change that much after just one day?

Member Author


Well, if you switch to my branch and you don't download the data, the code will crash. This is just to ensure that this situation doesn't happen and that you have the most up-to-date preprocessed data when you switch to a version of the code that needs it. In production we can probably comment out these lines, but for development this makes things so much easier. I would be happy to find a workaround, as I agree that it's a bit extreme for most users.

Member


OK, I think this is doable for now because we only have a small amount of data in the Google Drive. If we implemented this with the Bremen server, would it attempt to download the whole thing? Or just the necessary gdirs?

Member Author


It would attempt to download the whole thing. But a cleaner way to do that would be to compute a checksum locally (or just read a version number) and compare it with a checksum on the server. If the two don't match, then we download the data. We could also do that silently with a try/catch so that if the user doesn't have an internet connection, it skips the downloading and uses the local version.
I created an issue for this specific task: #104
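
A rough sketch of that idea (the version-file layout and URL handling are assumptions for the example, not part of this PR):

using Downloads

function needs_redownload(prepro_dir::String, version_url::String)
    try
        # Compare a remote version/checksum file with the local copy.
        remote_version = strip(read(Downloads.download(version_url), String))
        local_file = joinpath(prepro_dir, "version.txt")
        return !isfile(local_file) || strip(read(local_file, String)) != remote_version
    catch
        # No internet connection or unreachable server: silently fall back
        # to the local data instead of failing.
        return false
    end
end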

@facusapienza21
Member

This looks great @albangossard !!! Thank you for the work! I left comments in the review.

I have a more general question. I think there are two different approaches in how one could handle the fact that PDE simulations are gridded while the observations are not:

  • Grid the observations
  • Run your PDE model on the grid and interpolate its output to evaluate it at the places where you have data

In this PR, @albangossard you implemented the first one, right? So now the whole dataset is gridded in a raster with the same size as the target glacier. However, in principle the second approach is also valid, and a priori I was thinking of moving forward with that: just run the PDE on the grid and then do some interpolation to evaluate it where there is data.

For me the second approach was a bit better since it leaves the data unaltered, while it is the numerical model that needs to adapt to how the data are spatially/temporally distributed. Furthermore, to evaluate a loss function we don't need the data to be gridded "in time" (I agree that in space this makes more sense), so there is no reason to pack the data into discrete intervals of time: all this can be handled by the model itself when computing the loss function (a numerical integral) or the gradient/adjoint (another numerical integral).

We can definitely try this implementation first, but in principle both approaches are valid. Do you have any thoughts on this @albangossard or @JordiBolibar ?

Member

@facusapienza21 facusapienza21 left a comment


I still don't fully understand how the review works, because I always leave comments and add things in the code but it does not show up as a "review". I am adding this now so it shows that I did the review (which I did before).

Comment on lines +31 to +33
vx::Union{Vector{Matrix{F}}, Nothing}
vy::Union{Vector{Matrix{F}}, Nothing}
vabs::Union{Vector{Matrix{F}}, Nothing}
Member


This looks good to me! Just curious: any reason why a vector of matrices is better than just a 3D tensor?

Member

@JordiBolibar JordiBolibar Apr 24, 2025


I think it's nicer to handle. Since two dimensions are spatial and one is temporal, I feel it's much nicer to iterate through time and get back spatial grids. Otherwise the user needs to know which of the 3 dimensions is time.

Member Author


Originally it was because the plotting functions assume that spatio-temporal data are a vector of matrices. I don't have a strong opinion, except that it should be uniform across the different structs.
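
A toy illustration of why this layout is convenient to iterate over (random data, arbitrary sizes):

vx = [rand(4, 5) for _ in 1:3]   # 3 time steps, each a 4×5 spatial grid

for (it, vx_t) in enumerate(vx)
    # Each element is directly a 2D spatial field; no need to remember
    # which axis of a 3D tensor is time.
    println("time step $it: grid of size $(size(vx_t))")
end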



Member Author

@albangossard albangossard left a comment


Thank you for your quick and complete reviews @JordiBolibar @facusapienza21!

I'm answering the general comment here.

In this PR, @albangossard you implemented the first one, right? So now the whole dataset is gridded in a raster with the same size as the target glacier. However, in principle the second approach is also valid, and a priori I was thinking of moving forward with that: just run the PDE on the grid and then do some interpolation to evaluate it where there is data.

The main reason I implemented the first version and not the second is that it's more computationally intensive to grid the data on the fly. Right now it already takes a few seconds, and the interpolation is really simple. Although currently some of the time is spent reading from disk, which is something that could be loaded in advance, I expect on-the-fly gridding to significantly slow down the simulation. We would also have to truncate the observations to spare memory, since the datacube is spatially larger than the glacier grid (only about a quarter of the spatial grid is useful for the Bossons glacier).

Furthermore, to evaluate a loss function we don't need the data to be gridded "in time" [...], so there is no reason to pack the data in discrete intervals of time: all this can be handled by the model itself when computing the loss function [...].

Note that with the current implementation, even if the data are spatially gridded, there is no restriction on the time. The data are available at the same time steps as the ones of the datacube and we can i) take them all and store the dense solution, or ii) pick a few of them and compare them to the simulated speeds, or iii) average them as suggested by @JordiBolibar. All of this can be done afterwards by playing only with the time component.
I see these mappings in a very similar way to the gradient computation: we can split how the mapping is applied in time and in space.
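
To make option iii) concrete, a toy sketch (all names and data below are illustrative, not from this PR):

using Statistics, Dates

ts = DateTime(2019, 1, 1):Month(1):DateTime(2019, 12, 1)   # model output times
v_model = [rand(4, 5) for _ in ts]                          # simulated speed grids
window = (DateTime(2019, 3, 1), DateTime(2019, 6, 30))      # date1/date2 of one velocity map

# Average the model outputs whose time falls inside the acquisition window.
idx = findall(t -> window[1] <= t <= window[2], collect(ts))
v_avg = mean(v_model[idx])                                  # element-wise mean grid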

We can definitely try this implementation first, but in principle both approaches are valid. Do you have any thoughts on this @albangossard or @JordiBolibar ?

From a mathematical perspective, I don't see any difference between the two as long as the interpolation is properly implemented, which is obviously not the case right now since we're using the nearest points.

To end this remark, I agree that we will probably also implement the second option, at least to compare the two approaches. But in the long term my bet is that in practice we'll stick to the first version just because of the computational cost.
I'm happy to keep discussing this! :)

vy = [eltype(x).(replace(vy[:,:,i], missing => NaN)) for i in 1:size(vy, 3)]
vabs = [eltype(x).(replace(vabs[:,:,i], missing => NaN)) for i in 1:size(vabs, 3)]

if mapToGlacierGrid
Member Author


Yes you're right, I merged the two if statements in a270f5f.

Member

@facusapienza21 facusapienza21 left a comment


All good! Feel free to merge and probably bump the version to start propagating the changes to Huginn and ODINN.

Member

@JordiBolibar JordiBolibar left a comment


Same, all good for me! :)

@albangossard albangossard merged commit de9c78f into main Apr 25, 2025
4 checks passed
@albangossard albangossard deleted the iceVelocityIntegration branch April 25, 2025 10:36