
Commit 755cbf3
Author: nlang
Commit message: first commit
0 parents, commit 755cbf3

39 files changed: +4791 additions, 0 deletions

.gitignore

+16
.idea
deploy_dir/
deploy_example/
trained_models/
__pycache__
gchm.egg-info
wandb/
venv/
.ipynb_checkpoints
.DS_Store
*.zip
*.tif
*.npy
*.pt
*.json

INSTALL.md

+51
## Installation and credentials

Here we present two ways to install the packages.
- A) requires GDAL to be installed on the system first.
- B) GDAL is installed with mamba/conda in the environment.

### A) With pip in a virtual environment

1. Install [GDAL](https://gdal.org/). For Ubuntu follow e.g. these [instructions](https://mothergeo-py.readthedocs.io/en/latest/development/how-to/gdal-ubuntu-pkg.html).
2. Create a new [virtual environment](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) called `gchm` by running: `python -m venv $HOME/venvs/gchm`
3. Activate the environment: `source $HOME/venvs/gchm/bin/activate`. (Check that python points to the new environment with `which python3`.)
4. Install pytorch by following the instructions on [pytorch.org](https://pytorch.org/) that match your versions. Run e.g. `python3 -m pip install torch torchvision torchaudio`
5. Install the GDAL python API matching the installed GDAL version: `python3 -m pip install GDAL==3.5.3`
6. Install all other required packages: `python3 -m pip install -r requirements.txt`
7. Install this project as an editable package called `gchm`. Make sure you are in the directory of the repository containing the file `setup.py`.
Run: `python3 -m pip install -e .` (Note the dot `.` at the end.)


### B) Mamba/conda installation

1. Install mambaforge: https://github.com/conda-forge/miniforge#mambaforge
2. Create a new environment called `gchm` with pytorch (or follow the instructions on [pytorch.org](https://pytorch.org/)):
`mamba create -n gchm python=3.10.9 pytorch torchvision torchaudio cudatoolkit=11.8 -c pytorch -c nvidia`
3. Activate the environment: `mamba activate gchm`. (Check that python points to the new environment: e.g.
`which python` should print something like: `~/mambaforge/envs/gchm/bin/python`)
4. Install gdal: `mamba install -c conda-forge gdal=3.6.2`
5. Install pytables: `mamba install -c anaconda pytables=3.7.0`
6. Install all other required packages from conda-forge using the `environment.yml` file. Change directory to the repository and run:
`mamba env update -f environment.yml`
7. Install this project as an editable package called `gchm`. Make sure you are in the directory of the repository containing the file `setup.py`.
Run: `pip install -e .` (Note the dot `.` at the end.)

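After either installation route, a quick sanity check that the key geospatial and deep-learning dependencies resolve inside the activated `gchm` environment can be helpful. This snippet is not part of the repository; it only verifies the imports used throughout the project.

```python
# quick sanity check of the environment (run inside the activated gchm env)
import torch
from osgeo import gdal

import gchm  # the editable install from `pip install -e .`

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("GDAL:", gdal.__version__)
print("gchm imported from:", gchm.__file__)
```
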
### Credentials for wandb
Optional. Only needed to run the training code (not needed for deployment).
Create a file called `~/.config_wandb` containing your [Weights & Biases API key](https://docs.wandb.ai/quickstart):
```
export WANDB_API_KEY=YOUR_API_KEY
```


### Credentials for AWS
Optional. This is only needed to download Sentinel-2 images from AWS on the fly using `gchm/deploy.py`.
***Note that there are costs per GB downloaded!***

Create a file `~/.aws_configs` containing your AWS credentials as environment variables.
```
export AWS_ACCESS_KEY_ID=PUT_YOUR_KEY_ID_HERE
export AWS_SECRET_ACCESS_KEY=PUT_YOUR_SECRET_ACCESS_KEY_HERE
export AWS_REQUEST_PAYER=requester
```
To create an AWS account go to: https://aws.amazon.com/console/

LICENSE

+21
MIT License

Copyright (c) 2023 Nico Lang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

+188
# A high-resolution canopy height model of the Earth

This repository contains the code used to create the results presented in the paper: [A high-resolution canopy height model of the Earth](https://arxiv.org/abs/2204.08322).
Here, we developed a model to estimate canopy top height anywhere on Earth. The model estimates canopy top height for every Sentinel-2 image pixel and was trained using sparse GEDI LIDAR data as a reference.

See our [project page](https://langnico.github.io/globalcanopyheight) for an interactive [demo](https://nlang.users.earthengine.app/view/global-canopy-height-2020) and more information.

## Data availability
This is a summary of all the published data:

- Global canopy top height map for 2020 ([ETH Research Collection](https://doi.org/10.3929/ethz-b-000609802))
- Train-val dataset ([ETH Research Collection](https://doi.org/10.3929/ethz-b-000609845))
- Rasterized canopy top height models from airborne lidar ([Zenodo](https://doi.org/10.5281/zenodo.7885699))
- Trained model weights ([Github release]())
- Demo data for example scripts ([Zenodo](https://doi.org/10.5281/zenodo.7885610))
- Sparse GEDI canopy top height data ([Zenodo](https://doi.org/10.5281/zenodo.7737946))
- ESA WorldCover 10 m 2020 v100 reprojected to Sentinel-2 tiles ([Zenodo](https://doi.org/10.5281/zenodo.7888150))

## Installation and credentials
Please follow the instructions in [INSTALL.md](INSTALL.md).

## Loading the model

```python
from gchm.models.xception_sentinel2 import xceptionS2_08blocks_256

# load the model with random initialization
model = xceptionS2_08blocks_256()
```
Please see the [example notebook](gchm/notebooks/example_loading_pretrained_models.ipynb) on how to load the model with the trained weights.

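If the notebook is not at hand, the snippet below is a minimal sketch of loading one ensemble member from the `trained_models/` directory created in the deployment section below. The exact checkpoint layout (a plain state dict vs. a dict with a `model_state_dict` key) is an assumption here; the notebook remains the authoritative reference.

```python
import numpy as np
import torch

from gchm.models.xception_sentinel2 import xceptionS2_08blocks_256

# one ensemble member, finetuned with the re-weighted loss (example path)
model_dir = "trained_models/GLOBAL_GEDI_2019_2020/model_0/FT_Lm_SRCB"

model = xceptionS2_08blocks_256()

ckpt = torch.load(f"{model_dir}/checkpoint.pt", map_location="cpu")
# the weights may be stored as a plain state dict or nested under a key
# such as "model_state_dict" (assumption; the notebook shows the exact layout)
state_dict = ckpt.get("model_state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
model.load_state_dict(state_dict)
model.eval()

# per-band normalization statistics shipped next to each checkpoint
train_input_mean = np.load(f"{model_dir}/train_input_mean.npy")
train_input_std = np.load(f"{model_dir}/train_input_std.npy")
```

The `train_input_mean`/`train_input_std` arrays are presumably used to standardize the Sentinel-2 bands before the forward pass; see the notebook for the exact preprocessing.
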
## Deploying

This is a demo of how to run the trained ensemble to compute the canopy height map for a Sentinel-2 tile (approx. 100 km x 100 km).

### Preparation:
1. Download the demo data, which contains Sentinel-2 images for one tile:
```
bash gchm/bash/download_demo_data.sh ./
```
This creates the following directory:
```
deploy_example/
├── ESAworldcover
│   └── 2020
│       └── sentinel2_tiles
│           └── ESA_WorldCover_10m_2020_v100_32TMT.tif
├── image_paths
│   └── 2020
│       └── 32TMT.txt
├── image_paths_logs
│   └── 2020
├── predictions_provided
│   ├── 2020
│   │   ├── S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851_predictions.tif
│   │   ├── S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851_std.tif
│   │   ├── ...
│   ├── 2020_merge
│   │   └── preds_inv_var_mean
│   │       ├── 32TMT_pred.tif
│   │       └── 32TMT_std.tif
│   └── 2020_merge_logs
│       └── preds_inv_var_mean
│           └── 32TMT.txt
├── sentinel2
│   └── 2020
│       ├── S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851.zip
│       ├── ...
└── sentinel2_aws
    └── 2020
```
2. Download the trained model weights:
```
bash gchm/bash/download_trained_models.sh ./trained_models
```

This creates the following directory:

```
trained_models/
└── GLOBAL_GEDI_2019_2020
    ├── model_0
    │   ├── FT_Lm_SRCB
    │   │   ├── args.json
    │   │   ├── checkpoint.pt
    │   │   ├── train_input_mean.npy
    │   │   ├── train_input_std.npy
    │   │   ├── train_target_mean.npy
    │   │   └── train_target_std.npy
    │   ├── args.json
    │   ├── checkpoint.pt
    │   ├── train_input_mean.npy
    │   ├── train_input_std.npy
    │   ├── train_target_mean.npy
    │   └── train_target_std.npy
    ├── model_1
    │   ├── ...
    ├── model_2
    │   ├── ...
    ├── model_3
    │   ├── ...
    ├── model_4
    │   ├── ...
```
The checkpoint.pt files contain the model weights. The subdirectories `FT_Lm_SRCB` contain the models finetuned with a re-weighted loss function.

### Deploy example for a single Sentinel-2 image
This [demo script](gchm/bash/deploy_example.sh) processes a single image (from the year 2020) for the tile "32TMT" in Switzerland. Run:
```
bash gchm/bash/deploy_example.sh
```

### Deploy and merge example for multiple images of a Sentinel-2 tile
This [demo script](gchm/bash/run_tile_deploy_merge.sh) processes 10 images (from the year 2020) for the tile "32TMT" in Switzerland and aggregates the individual per-image maps into a final annual map (see the merging sketch after the run command below).

Provide a text file with the image filenames per tile saved as `${TILE_NAME}.txt`. The demo data contains the following file:
```
cat ./deploy_example/image_paths/2020/32TMT.txt
S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851.zip
S2A_MSIL2A_20200723T103031_N0214_R108_T32TMT_20200723T142801.zip
S2A_MSIL2A_20200812T103031_N0214_R108_T32TMT_20200812T131334.zip
...
```
The corresponding images are stored in `./deploy_example/sentinel2/2020/`.


1. Set the paths in `gchm/bash/config.sh`
2. Set the tile_name in `gchm/bash/run_tile_deploy_merge.sh`
3. Run the script:
```
bash gchm/bash/run_tile_deploy_merge.sh
```

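Judging by the output directory name `preds_inv_var_mean`, the per-image maps appear to be merged with a per-pixel inverse-variance weighted mean. Purely to illustrate that aggregation rule (not the repository's exact implementation; it assumes missing pixels, e.g. clouds, are marked as NaN in both the prediction and the std maps), a numpy sketch:

```python
import numpy as np


def inv_var_merge(preds, stds, eps=1e-6):
    """Per-pixel inverse-variance weighted mean over per-image predictions.

    preds, stds: float arrays of shape (num_images, H, W); NaN marks pixels
    without a valid prediction (e.g. clouds) in both arrays.
    """
    preds = np.asarray(preds, dtype=np.float32)
    var = np.square(np.asarray(stds, dtype=np.float32)) + eps

    valid = ~np.isnan(preds)
    weights = np.where(valid, 1.0 / var, 0.0)          # w_i = 1 / sigma_i^2
    weighted_preds = np.where(valid, preds, 0.0) * weights

    weight_sum = weights.sum(axis=0)
    merged = np.where(weight_sum > 0,
                      weighted_preds.sum(axis=0) / np.maximum(weight_sum, eps),
                      np.nan)
    merged_std = np.where(weight_sum > 0,
                          np.sqrt(1.0 / np.maximum(weight_sum, eps)),
                          np.nan)
    return merged, merged_std
```

Lower-uncertainty observations receive more weight, and the merged standard deviation shrinks as more valid images contribute to a pixel.
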
#### Note on ESA World Cover post-processing:
The ESA WorldCover 10 m 2020 v100 reprojected to Sentinel-2 tiles is available on [Zenodo](https://doi.org/10.5281/zenodo.7888150).
We apply minimal post-processing and mask out built-up areas, snow, ice and permanent water bodies, setting their canopy height to "no data" (value: 255). See the script [here](gchm/postprocess/mask_with_ESAworldcover.py).

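The linked script is the reference implementation. Purely to illustrate the masking logic, a sketch using the GDAL Python API could look as follows; the choice of WorldCover class codes mirrors the classes named above (50 = built-up, 70 = snow/ice, 80 = permanent water bodies), and the file paths are placeholders.

```python
import numpy as np
from osgeo import gdal

MASK_CLASSES = (50, 70, 80)   # built-up, snow/ice, permanent water bodies
NO_DATA = 255


def mask_canopy_height(pred_path, worldcover_path, out_path):
    """Set canopy height to 255 ("no data") where ESA WorldCover indicates
    built-up, snow/ice or permanent water (illustrative sketch)."""
    pred_ds = gdal.Open(pred_path)
    wc_ds = gdal.Open(worldcover_path)

    pred = pred_ds.GetRasterBand(1).ReadAsArray()
    wc = wc_ds.GetRasterBand(1).ReadAsArray()   # assumes the same tile grid

    pred[np.isin(wc, MASK_CLASSES)] = NO_DATA

    driver = gdal.GetDriverByName("GTiff")
    out_ds = driver.CreateCopy(out_path, pred_ds)
    out_ds.GetRasterBand(1).WriteArray(pred)
    out_ds.GetRasterBand(1).SetNoDataValue(NO_DATA)
    out_ds = None  # close dataset to flush to disk
```

The provided `mask_with_ESAworldcover.py` remains the reference for the exact per-tile file naming and nodata conventions.
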
#### Note on AWS:
Sentinel-2 images can be downloaded on the fly from AWS S3 by setting `GCHM_DOWNLOAD_FROM_AWS="True"`
and providing the AWS credentials as described above.
This was tested for 2020 data, but might need some updates to the sentinelhub routine to handle newer versions.


## Training

### Data preparation
1. Download the train-val h5 datasets from [here](https://doi.org/10.3929/ethz-b-000609845).
2. Merge the part files into a single `train.h5` and `val.h5` by running this [script](gchm/preprocess/run_merge_h5_files_per_split.sh).
Before running it, set the variables `in_h5_dir_parts` and `out_h5_dir` to your paths. Then run:
```
bash gchm/preprocess/run_merge_h5_files_per_split.sh
```

### Running the training script
A [slurm training script](gchm/bash/run_training.sh) is provided and submitted as follows.
Before submitting, set the variable `CODE_PATH` at the top of the script and set the paths in `gchm/bash/config.sh`. Then run:
```
sbatch < gchm/bash/run_training.sh
```

## ALS preprocessing for independent comparison

In cases where rastered high-resolution canopy height models are available (e.g. from airborne LIDAR campaigns) for independent evaluation, some preprocessing steps are required to make the data comparable to GEDI canopy top height estimates, which correspond to the canopy top within a 25 meter footprint.

1. A rastered canopy height model with a 1m GSD should be created (e.g. using `gdalwarp`).
2. The 1m canopy height model can then be processed with a circular max pooling operation to approximate "GEDI-like" canopy top heights. This step is provided as a [pytorch implementation](gchm/preprocess/ALS_maxpool_GEDI_footprint.py); a simplified sketch of the operation is shown below.

**Example**:
Download the example CHM at 1m GSD from [here](https://zenodo.org/record/7885610/files/ALS_example_CTHM_GSD1m.tif). Then run:
```
python3 gchm/preprocess/ALS_maxpool_GEDI_footprint.py "path/to/input/tif" "path/to/output/tif"
```

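The linked PyTorch script is the reference implementation. To illustrate the idea, the sketch below computes, for each window of a 1 m CHM, the maximum height over a circular footprint of 25 pixels (25 m) diameter; the footprint size, stride and the absence of nodata handling are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F


def circular_max_pool(chm, diameter_px=25, stride=1):
    """Max of the CHM inside a circular window ('GEDI-like' canopy top height)."""
    radius = diameter_px / 2.0
    center = (diameter_px - 1) / 2.0
    yy, xx = torch.meshgrid(
        torch.arange(diameter_px, dtype=torch.float32),
        torch.arange(diameter_px, dtype=torch.float32),
        indexing="ij",
    )
    circle = ((yy - center) ** 2 + (xx - center) ** 2) <= radius ** 2  # (k, k) bool

    # extract all k x k patches, drop pixels outside the circle, take the max per patch
    x = chm.reshape(1, 1, *chm.shape)                               # (1, 1, H, W)
    patches = F.unfold(x, kernel_size=diameter_px, stride=stride)   # (1, k*k, L)
    patches = patches.masked_fill(~circle.reshape(1, -1, 1), float("-inf"))
    pooled = patches.max(dim=1).values

    out_h = (chm.shape[0] - diameter_px) // stride + 1
    out_w = (chm.shape[1] - diameter_px) // stride + 1
    return pooled.reshape(out_h, out_w)


# usage on a random 1 m CHM patch (a real run would read the GeoTIFF, e.g. with GDAL)
chm = torch.rand(256, 256) * 40.0
gedi_like = circular_max_pool(chm, diameter_px=25, stride=1)  # (232, 232)
```

On a full Sentinel-2 tile at 1 m GSD this patch extraction is memory-hungry, so in practice the raster would be processed in tiles or with a larger stride; prefer the repository script for real data.
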
## Citation

Please cite our paper if you use this code or any of the provided data.

Lang, N., Jetz, W., Schindler, K., & Wegner, J. D. (2022). A high-resolution canopy height model of the Earth. arXiv preprint arXiv:2204.08322.
```
@article{lang2022high,
  title={A high-resolution canopy height model of the Earth},
  author={Lang, Nico and Jetz, Walter and Schindler, Konrad and Wegner, Jan Dirk},
  journal={arXiv preprint arXiv:2204.08322},
  year={2022}
}
```

environment.yml

+22
name: gchm

channels:
  - conda-forge
  - defaults

dependencies:
  - ipython
  - matplotlib
  - numpy
  - wandb
  - pathlib
  - tqdm
  - botocore
  - urllib3
  - tensorboard
  - sentinelhub=3.9.0
  - anaconda::scikit-image
  - anaconda::scikit-learn
  - anaconda::typing
  - anaconda::jupyter

gchm/__init__.py

Whitespace-only changes.

gchm/bash/config.sh

+34
# Configuration file for GCHM

# ----------------------------
# ---------- DEPLOY ----------
# ----------------------------

export GCHM_DEPLOY_PARENT_DIR="./deploy_example"
export YEAR="2020"
export GCHM_DEPLOY_IMAGE_PATHS_DIR="${GCHM_DEPLOY_PARENT_DIR}/image_paths/${YEAR}"
export GCHM_DEPLOY_IMAGE_PATHS_LOG_DIR="${GCHM_DEPLOY_PARENT_DIR}/image_paths_logs/${YEAR}"
export GCHM_DEPLOY_SENTINEL2_DIR="${GCHM_DEPLOY_PARENT_DIR}/sentinel2/${YEAR}"
export GCHM_DEPLOY_DIR="${GCHM_DEPLOY_PARENT_DIR}/predictions/${YEAR}"
export GCHM_MODEL_DIR="./trained_models/GLOBAL_GEDI_2019_2020"
export GCHM_NUM_MODELS=5

export GCHM_DOWNLOAD_FROM_AWS="False"  # Set this to "True" or "False"
export GCHM_AWS_CONFIGS_FILE="$HOME/.aws_configs"
export GCHM_DEPLOY_SENTINEL2_AWS_DIR="${GCHM_DEPLOY_PARENT_DIR}/sentinel2_aws/${YEAR}"

# make directories
mkdir -p ${GCHM_DEPLOY_DIR}
mkdir -p ${GCHM_DEPLOY_SENTINEL2_AWS_DIR}
mkdir -p ${GCHM_DEPLOY_IMAGE_PATHS_LOG_DIR}

# ----------------------------
# --------- TRAINING ---------
# ----------------------------

export GCHM_TRAINING_DATA_DIR="/cluster/work/igp_psr/nlang/global_vhm/gchm_public_data/training_data/GLOBAL_GEDI_2019_2020/all_shuffled"
export GCHM_TRAINING_EXPERIMENT_DIR="/cluster/work/igp_psr/nlang/experiments/gchm"
# Set path to python
export PYTHON="$HOME/venvs/gchm/bin/python"


gchm/bash/deploy_example.sh

+29
#!/bin/bash

DEPLOY_IMAGE_PATH="./deploy_example/sentinel2/2020/S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851.zip"
GCHM_DEPLOY_DIR="./deploy_example/predictions/2020"

GCHM_MODEL_DIR="./trained_models/GLOBAL_GEDI_2019_2020"
GCHM_NUM_MODELS="5"

filepath_failed_image_paths="./deploy_example/log_failed.txt"

GCHM_DOWNLOAD_FROM_AWS="False"
GCHM_DEPLOY_SENTINEL2_AWS_DIR="./deploy_example/sentinel2_aws"

# create directories
mkdir -p ${GCHM_DEPLOY_DIR}
mkdir -p ${GCHM_DEPLOY_SENTINEL2_AWS_DIR}

# --sentinel2_dir: directory where Sentinel-2 images are stored when downloading from AWS
python3 gchm/deploy.py --model_dir=${GCHM_MODEL_DIR} \
                       --deploy_image_path=${DEPLOY_IMAGE_PATH} \
                       --deploy_dir=${GCHM_DEPLOY_DIR} \
                       --deploy_patch_size=512 \
                       --num_workers_deploy=4 \
                       --num_models=${GCHM_NUM_MODELS} \
                       --finetune_strategy="FT_Lm_SRCB" \
                       --filepath_failed_image_paths=${filepath_failed_image_paths} \
                       --download_from_aws=${GCHM_DOWNLOAD_FROM_AWS} \
                       --sentinel2_dir=${GCHM_DEPLOY_SENTINEL2_AWS_DIR} \
                       --remove_image_after_pred="False"

gchm/bash/download_demo_data.sh

+18
#!/bin/bash

# parse and create the target directory from the first argument
demo_data_dir=${1}
mkdir -p ${demo_data_dir}
cd ${demo_data_dir}

url="https://zenodo.org/record/7885610/files/gchm_deploy_example.zip?download=1"
# download zip file
curl $url --output "gchm_deploy_example.zip"
# unzip
unzip gchm_deploy_example.zip
# delete zip file
rm gchm_deploy_example.zip

echo "DONE. Demo data extracted in:"
pwd
