Ice velocity integration #97

Conversation
Codecov Report
Attention: Patch coverage is

@@ Coverage Diff @@
##             main      #97       +/-   ##
===========================================
+ Coverage   47.09%   76.52%   +29.43%
===========================================
  Files          19       20        +1
  Lines         879      997      +118
===========================================
+ Hits          414      763      +349
+ Misses        465      234      -231

View full report in Codecov by Sentry.
Force-pushed f9a2fb4 to f9967d9.
Commits:
…er2D + move to Rasters.jl to load surface velocity datacube + fix several bugs with projection
…ake datacube for testing purposes
Force-pushed 836245b to 3a50573.
Awesome job! This is a lot of foundational work; it will be very cool to be able to leverage these datasets from now on.
I just have some minor comments and some questions (purely out of curiosity). Feel free to change whatever seems important and then we can merge.
@@ -13,6 +13,7 @@ CoordRefSystems = "b46f11dc-f210-4604-bfba-323c1ec968cb"
DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
DateFormats = "44557152-fe0a-4de1-8405-416d90313ce6"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
DimensionalData = "0703355e-b756-11e9-17c0-8b28908087d0"
This is already available from Rasters.jl. Did you add it just to have the necessary exported namespaces?
Yes, this is just to access `Dim` in Sleipnir/test/surface_velocity.jl, for example. If that bothers you, we can probably do `Rasters.Dim` instead.
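For reference, a minimal sketch of what this gives access to, assuming Rasters.jl re-exports the DimensionalData names as stated above (the dimension names and sizes below are purely illustrative):

```julia
using Rasters  # re-exports DimensionalData, so names like `Dim` are in scope

# A tiny array with named custom dimensions, e.g. for a test datacube
A = DimArray(rand(4, 3), (Dim{:date}(1:4), Dim{:band}(1:3)))

# Without a direct dependency on DimensionalData, the qualified form also works
B = DimArray(rand(4, 3), (Rasters.Dim{:date}(1:4), Rasters.Dim{:band}(1:3)))
```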
@@ -12,6 +12,7 @@ using JLD2
using Distributed
using Statistics
using CairoMakie
using Observables
What is this for? I've seen it's some sort of improved `Ref`.
It is used in lines like
Observables.connect!(cb.height, @lift CairoMakie.Fixed($(viewport(ax.scene)).widths[2]))
whose purpose is to make the colorbar have the same height as the heatmap we are plotting.
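For readers unfamiliar with Observables.jl, a minimal sketch of the "improved Ref" idea (not taken from the PR):

```julia
using Observables

x = Observable(1)          # holds a value, like a Ref
y = map(v -> 2v, x)        # y is an Observable that automatically tracks x
x[] = 3                    # updating x propagates to y
@assert y[] == 6
```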
- `date1::Union{Vector{DateTime}, Nothing}`: First date of acquisition.
- `date2::Union{Vector{DateTime}, Nothing}`: Second date of acquisition.
- `date_error::Union{Vector{Day}, Vector{Millisecond}, Nothing}`: Error in `date`.
- `glacierGridded::Bool`: Whether the data have been gridded to the glacier grid or not.
It's not entirely clear what you mean by this. Is there a case where the data are not gridded to the glacier grid? Like, do you use the same struct to store the original raw data and the final data on the glacier grid?
Yes, we use the same struct to store the original raw data and the gridded ones. I don't think we need a specific struct to store the original raw data. Doing so would even come with memory issues: here we load the data lazily, only when we need them, and many of the points in the original data aren't useful since they are outside of the glacier grid.
Most of the time we probably won't use the raw data directly. However, it can be convenient in some cases for debugging or to visualize data from the netcdf files.
I changed the name to isGridGlacierAligned to make it clearer.
Constructor for ice surface velocity data based on Rabatel et al. (2023).

Important remarks:
- Projections in longitude and latitude assume we are working in the northern hemisphere.
  If working with southern-hemisphere glaciers, this needs to be changed.
- The error in velocity is unique per timestamp, rather than being pixel distributed.
- The error in the absolute velocities `vabs_error` is overestimated.
Why is this? Does this come from the paper(s)? And by overestimated, do you mean that the error is larger than reported or lower than reported?
Also, this assumption about the northern hemisphere is quite strong. At the very least I would give a warning somehow (e.g. in case there is a way to infer that we are not in the northern hemisphere) and clearly document this in the docs.
Concerning the errors, this is something that @facusapienza21 implemented, so I'm not 100% sure of the answer. I guess this is because, for every point of the grid, we don't have the velocities at every time, so we use an upper bound by taking, for every grid point, the maximum speed over time. In the end the error is overestimated. @facusapienza21 am I right?
As for the northern hemisphere restriction, this is actually something that I removed in this PR, and the code now supports data in the southern hemisphere too.
I don't remember exactly what the problem was, but as long as you can specify whether the projection is done in the northern or southern hemisphere it should be fine!
I think right now we can only handle the northern hemisphere unless we change this code:
# Spatial preprocessing
proj_zone = ncgetatt(file, "mapping", "utm_zone_number")
transform(X,Y) = Sleipnir.UTMercator(X, Y; zone=proj_zone, hemisphere=:north)
Right now this is done for the northern hemisphere. I think this was also something from the dataset: since the dataset for now only covers the northern hemisphere, it is not yet clear which attribute of the dataset will determine whether the datacube is in the northern or southern hemisphere. This is a good question for Romain/Antoine, but for now I will keep the remark until we are sure the preprocessing will work with southern data.
OK, I suggest we open an issue to keep track of this.
In fact, I already changed that line in the PR. The code is now:
hemisphere = (y[end]+y[begin])/2 >= 0 ? :north : :south
transform(X,Y) = Sleipnir.UTMercator(X, Y; zone=Int(params_projection["zone"]), hemisphere=hemisphere)
Since in the UTM coordinate system y corresponds to the northing, y=0 corresponds to the equator (I double-checked this numerically).
If you think there can be bugs, we can add an assert, put back the remark in the documentation and create an issue for that, but I think this code will work with datasets in the southern hemisphere.
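If we want the extra safety mentioned above, a minimal sketch of such an assert, assuming `y` holds the UTM northings used in the hemisphere inference (the function name is illustrative, not part of the PR):

```julia
# Guard against grids whose northings straddle the equator, where inferring the
# hemisphere from the mean northing would be ambiguous.
function infer_hemisphere(y::AbstractVector{<:Real})
    @assert all(>=(0), y) || all(<(0), y) "Northings change sign: hemisphere inference is ambiguous."
    return (y[end] + y[begin]) / 2 >= 0 ? :north : :south
end
```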
compute_vabs_error::Bool=true
) where {G <: AbstractGlacier, VM <: VelocityMapping}

velRast = isa(raster, String) ? RasterStack(raster, lazy=true) : raster
Is there any penalty or downside in using the Rasters in lazy mode?
From my experience it's slower when we read the data. But the most important point is that for one of the datacubes I've been playing with, it is simply impossible to load it in a non-lazy way. In the netcdf file the data are represented in a sparse way on disk, but once loaded into memory everything is dense, and this represents more than 15GB of RAM!
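A hedged sketch of the difference, with a hypothetical file name and layer name:

```julia
using Rasters
import NCDatasets  # backend Rasters uses to read netcdf files

# Lazy: only metadata is read now; values are pulled from disk when indexed
cube = RasterStack("velocity_datacube.nc"; lazy=true)

# Reading a single layer/time slice only materializes what is needed
vx_first = read(cube[:vx][Ti(1)])

# read(cube) would materialize the whole (dense) cube in memory,
# which for some datacubes exceeds 15GB of RAM.
```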
abstract type VelocityMapping end

"""
    MeanDateVelocityMapping <: VelocityMapping
I am a bit confused to see the iceflow model mentioned here for a tool which in theory only serves to map the grid of the observations to the grid we use to model the glacier. I understand that the way we associate the velocity to a given date will impact our interpretation of how we can compare it to the model, but this seems overly rigid. The fact that we know the start and end date of the observations used to compute the velocity is very useful and can be used when comparing it to model outputs.
There seems to be a big gap in terms of complexity between the two mapping approaches. Here's a suggestion: what about enabling the addition of more data points in the MeanDateVelocityMapping, to go beyond just the middle date? If we sample more data points from outputs of the iceflow model and average them, it should be increasingly robust. This shouldn't be too computationally expensive; it would just imply storing a denser solution and then averaging over an increased number of model outputs. The number of model outputs used could be somehow proportional to the length of the time window used in the calculation of the surface velocity.
Well, there are two different things that I mixed together: the spatial interpolation and the temporal interpolation. This is why the ice flow model is mentioned here, but we don't even care about it in Sleipnir. These are just structs to define which flavors of spatio-temporal mappings are implemented.
As discussed together today, I agree that these approaches are the two extremes in terms of complexity, and we'll likely implement a simplification of the second one. We could either implement that in IntegratedTrajectoryMapping, with hyper-parameters that allow reducing the number of time steps used, or implement it in another struct that is a subtype of VelocityMapping.
We'll probably implement these mappings in other PRs. All of this is exciting and I can't wait to start playing with these mappings!
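To illustrate the last option, a hypothetical sketch of such an intermediate mapping, assuming the `VelocityMapping` abstract type quoted above is in scope (the struct and field names are illustrative, not part of the PR):

```julia
# A middle ground between MeanDateVelocityMapping and IntegratedTrajectoryMapping:
# average a fixed number of model outputs sampled over the acquisition window.
struct SampledWindowVelocityMapping <: VelocityMapping
    nSamples::Int  # number of iceflow outputs sampled between date1 and date2
end
```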
if isdir(prepro_dir)
    daysSinceLastDownload = (Dates.now() - Dates.unix2datetime(mtime(prepro_dir)))/Day(1)
    if daysSinceLastDownload > 1
        # Re-download data if older than one day
Isn't this a bit extreme? Why should files change that much after just one day?
Well, if you switch to my branch and you don't download the data, the code will crash. This is just to ensure that this situation doesn't happen and that you have the most up-to-date preprocessed data when you switch to a version of the code that needs it. In production we can probably comment these lines out, but for development this makes things so much easier. I would be happy to find a workaround, as I agree that it's a bit extreme for most users.
OK, I think this is now doable because we only have a little data in the Google Drive. If we implemented this with the Bremen server, would it attempt to download the whole thing? Or just the necessary gdirs?
It would attempt to download the whole thing. But a cleaner way to do that would be to compute a checksum locally (or just read a version number) and compare it with a checksum on the server. If the two don't match, then we download the data. We could also do that silently with a try/catch so that if the user doesn't have an internet connection, it skips the download and uses the local version.
I created an issue for this specific task: #104
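A rough sketch of the checksum idea, under stated assumptions: the manifest file, the checksum URL and the download_all_data helper are hypothetical, not part of the PR.

```julia
using Downloads, SHA

function maybe_redownload(prepro_dir::String, checksum_url::String)
    manifest = joinpath(prepro_dir, "MANIFEST")  # hypothetical file describing the local data version
    local_sum = isfile(manifest) ? bytes2hex(sha256(read(manifest))) : ""
    try
        buf = Downloads.download(checksum_url, IOBuffer())
        remote_sum = strip(String(take!(buf)))
        if remote_sum != local_sum
            # download_all_data(prepro_dir)  # hypothetical helper that fetches the data
        end
    catch
        @warn "Could not reach the server; using the local preprocessed data."
    end
end
```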
This looks great @albangossard!!! Thank you for the work! I left comments in the review. I have a more general question. I think there are two different approaches to handling the fact that PDE simulations are gridded while the observations are not: either grid the observations onto the glacier grid, or leave them as they are and interpolate the PDE output to where there is data.
In this PR, @albangossard you implemented the first one, right? So now the whole dataset is gridded in a raster of the same size as the target glacier. However, in principle the second approach is also valid, and a priori I was thinking of moving forward with that: just run the PDE on its grid and then do some interpolation to evaluate it where there is data. To me the second approach was a bit better since it leaves the data unaltered, while it is the numerical model that needs to adapt to how the data are spatially/temporally distributed. Furthermore, to evaluate a loss function we don't need the data to be gridded "in time" (I agree that in space this makes more sense), so there is no reason to pack the data into discrete intervals of time: all this can be handled by the model itself when computing the loss function (a numerical integral) or the gradient/adjoint (another numerical integral). We can definitely try this implementation first, but in principle both approaches are valid. Do you have any thoughts on this @albangossard or @JordiBolibar?
I still don't fully understand how the review works, because I always leave comments and add things in the code but it does not show up as a "review". I am adding this now so it shows that I did the review (which I did before).
vx::Union{Vector{Matrix{F}}, Nothing}
vy::Union{Vector{Matrix{F}}, Nothing}
vabs::Union{Vector{Matrix{F}}, Nothing}
This looks good to me! Just curious: any reason why a vector of matrices is better than just a 3D tensor?
I think it's nicer to handle. Since two dimensions are spatial and one is temporal, I feel it's much nicer to iterate through time and get back spatial grids. Otherwise the user needs to know which of the 3 dimensions is time.
Originally it was because the plotting functions assume that spatio-temporal data are a vector of matrices. I don't have a strong opinion, except that it should be uniform across the different structs.
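A small illustration of the two layouts discussed here (sizes are arbitrary):

```julia
# Vector of spatial matrices: iterating over time directly yields 2D grids
vx = [rand(100, 80) for _ in 1:12]
vx_t3 = vx[3]                    # spatial field at the third time step

# Equivalent 3D-tensor layout: the user must know which axis is time
vx_tensor = rand(100, 80, 12)
vx_t3_bis = vx_tensor[:, :, 3]
```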
Thank you for your quick and complete reviews @JordiBolibar @facusapienza21!
I'm answering the general comment here.

> In this PR, @albangossard you implemented the first one, right? So now the whole dataset is gridded in a raster of the same size as the target glacier. However, in principle the second approach is also valid, and a priori I was thinking of moving forward with that: just run the PDE on its grid and then do some interpolation to evaluate it where there is data.

The main reason I implemented the first version and not the second is that it's more computationally intensive to grid the data on-the-fly. Right now it already takes a few seconds, and the interpolation is really simple. Although currently some of the time is spent reading from disk, which is something that could be loaded in advance, I expect on-the-fly gridding to significantly slow down the simulation. We would also have to truncate the observations to spare memory, since the datacube is spatially larger than the glacier (only about 1/4 of the spatial grid is useful for the Bossons glacier).

> Furthermore, to evaluate a loss function we don't need the data to be gridded "in time" [...], so there is no reason to pack the data into discrete intervals of time: all this can be handled by the model itself when computing the loss function [...].

Note that with the current implementation, even if the data are spatially gridded, there is no restriction in time. The data are available at the same time steps as those of the datacube, and we can i) take them all and store the dense solution, or ii) pick a few of them and compare them to the simulated speeds, or iii) average them as suggested by @JordiBolibar. All of this can be done afterwards by playing only with the time component.
I see these mappings in a very similar way as the gradient computation: we can split how the mapping is applied in time and in space.

> We can definitely try this implementation first, but in principle both approaches are valid. Do you have any thoughts on this @albangossard or @JordiBolibar?

From a mathematical perspective, I don't see any difference between the two as soon as the interpolation is properly implemented, which is obviously not the case right now since we're using the nearest points.
To end this remark, I agree that we will probably also implement the second option, at least to compare the two approaches. But in the long term my bet is that in practice we'll stick to the first version, just because of the computational cost.
I'm happy to continue discussing this! :)
vy = [eltype(x).(replace(vy[:,:,i], missing => NaN)) for i in 1:size(vy, 3)]
vabs = [eltype(x).(replace(vabs[:,:,i], missing => NaN)) for i in 1:size(vabs, 3)]

if mapToGlacierGrid
Yes, you're right, I merged the two if statements in a270f5f.
All good! Feel free to merge and probably bump the version to start propagating changes to Huginn and ODINN.
Same, all good for me! :)
This PR contains multiple changes:
- Fix of the instantiation of `Results` without the keyword arguments. In Julia, we cannot define a function whose keyword argument has a default value depending on a type parameter, because the keyword argument is not specialized during compilation. Such an implementation would work if `a` was a positional argument. We didn't see this problem before because we were always providing a value to `Results(...)` for the keyword arguments whose default values depend on `F`. In order to avoid such bugs in the future, I added a unit test that instantiates `Results`.
- The `light` option of simulation was removed since we should use `save_everystep` instead.
- Updates to the `Results` instantiation and the plot functions.
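A minimal sketch of the keyword-argument issue described above, with illustrative names (not the actual `Results` code):

```julia
# F appears only in the keyword argument, and its default refers to F.
# When the keyword is omitted, F is never bound, so the default F[] cannot be built.
g(; a::Vector{F} = F[]) where {F <: AbstractFloat} = length(a)

g(a = [1.0, 2.0])   # works: F is taken from the provided keyword value
# g()               # fails: F is undefined when the default is used

# With a positional argument, F can always be determined from the call:
h(a::Vector{F} = Float64[]) where {F <: AbstractFloat} = length(a)
h()                  # works
```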