|
| 1 | +# A high-resolution canopy height model of the Earth |
| 2 | + |
| 3 | +This repository contains the code used to create the results presented in the paper: [A high-resolution canopy height model of the Earth](https://arxiv.org/abs/2204.08322). |
| 4 | +Here, we developed a model to estimate canopy top height anywhere on Earth. The model estimates canopy top height for every Sentinel-2 image pixel and was trained using sparse GEDI LIDAR data as a reference. |
| 5 | + |
| 6 | +See our [project page](https://langnico.github.io/globalcanopyheight) for an interactive [demo](https://nlang.users.earthengine.app/view/global-canopy-height-2020) and more information. |
| 7 | + |
| 8 | +## Data availability |
| 9 | +This is a summary of all the published data: |
| 10 | + |
| 11 | +- Global canopy top height map for 2020 ([ETH Research Collection](https://doi.org/10.3929/ethz-b-000609802)) |
| 12 | +- Train-val dataset ([ETH Research Collection](https://doi.org/10.3929/ethz-b-000609845)) |
| 13 | +- Rasterized canopy top height models from airborne lidar ([Zenodo](https://doi.org/10.5281/zenodo.7885699)) |
| 14 | +- Trained model weights ([Github release]()) |
| 15 | +- Demo data for example scripts ([Zenodo](https://doi.org/10.5281/zenodo.7885610)) |
| 16 | +- Sparse GEDI canopy top height data ([Zenodo](https://doi.org/10.5281/zenodo.7737946)) |
| 17 | +- ESA WorldCover 10 m 2020 v100 reprojected to Sentinel-2 tiles ([Zenodo](https://doi.org/10.5281/zenodo.7888150)) |
| 18 | + |
| 19 | +## Installation and credentials |
| 20 | +Please follow the instructions in [INSTALL.md](INSTALL.md). |
| 21 | + |
| 22 | +## Loading the model |
| 23 | + |
| 24 | +```python |
| 25 | +from gchm.models.xception_sentinel2 import xceptionS2_08blocks_256 |
| 26 | +# load the model with random initialization |
| 27 | +model = xceptionS2_08blocks_256() |
| 28 | +``` |
| 29 | +Please see the [example notebook](gchm/notebooks/example_loading_pretrained_models.ipynb) on how to load the model with the trained weights. |
| 30 | + |
| 31 | +## Deploying |
| 32 | + |
| 33 | +This is a demo how to run the trained ensemble to compute the canopy height map for a Sentinel-2 tile (approx. 100 km x 100 km). |
| 34 | + |
| 35 | +### Preparation: |
| 36 | +1. Download the demo data which contains Sentinel-2 images for one tile: |
| 37 | + ``` |
| 38 | + bash gchm/bash/download_demo_data.sh ./ |
| 39 | + ``` |
| 40 | + This creates the following directory: |
| 41 | + ``` |
| 42 | + deploy_example/ |
| 43 | + ├── ESAworldcover |
| 44 | + │ └── 2020 |
| 45 | + │ └── sentinel2_tiles |
| 46 | + │ └── ESA_WorldCover_10m_2020_v100_32TMT.tif |
| 47 | + ├── image_paths |
| 48 | + │ └── 2020 |
| 49 | + │ └── 32TMT.txt |
| 50 | + ├── image_paths_logs |
| 51 | + │ └── 2020 |
| 52 | + ├── predictions_provided |
| 53 | + │ ├── 2020 |
| 54 | + │ │ ├── S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851_predictions.tif |
| 55 | + │ │ ├── S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851_std.tif |
| 56 | + │ │ ├── ... |
| 57 | + │ ├── 2020_merge |
| 58 | + │ │ └── preds_inv_var_mean |
| 59 | + │ │ ├── 32TMT_pred.tif |
| 60 | + │ │ └── 32TMT_std.tif |
| 61 | + │ └── 2020_merge_logs |
| 62 | + │ └── preds_inv_var_mean |
| 63 | + │ └── 32TMT.txt |
| 64 | + ├── sentinel2 |
| 65 | + │ └── 2020 |
| 66 | + │ ├── S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851.zip |
| 67 | + │ ├── ... |
| 68 | + └── sentinel2_aws |
| 69 | + └── 2020 |
| 70 | + ``` |
| 71 | +2. Download the trained model weights: |
| 72 | + ``` |
| 73 | + bash gchm/bash/download_trained_models.sh ./trained_models |
| 74 | + ``` |
| 75 | + |
| 76 | + This creates the following directory: |
| 77 | + |
| 78 | + ``` |
| 79 | + trained_models/ |
| 80 | + └── GLOBAL_GEDI_2019_2020 |
| 81 | + ├── model_0 |
| 82 | + │ ├── FT_Lm_SRCB |
| 83 | + │ │ ├── args.json |
| 84 | + │ │ ├── checkpoint.pt |
| 85 | + │ │ ├── train_input_mean.npy |
| 86 | + │ │ ├── train_input_std.npy |
| 87 | + │ │ ├── train_target_mean.npy |
| 88 | + │ │ └── train_target_std.npy |
| 89 | + │ ├── args.json |
| 90 | + │ ├── checkpoint.pt |
| 91 | + │ ├── train_input_mean.npy |
| 92 | + │ ├── train_input_std.npy |
| 93 | + │ ├── train_target_mean.npy |
| 94 | + │ └── train_target_std.npy |
| 95 | + ├── model_1 |
| 96 | + │ ├── ... |
| 97 | + ├── model_2 |
| 98 | + │ ├── ... |
| 99 | + ├── model_3 |
| 100 | + │ ├── ... |
| 101 | + ├── model_4 |
| 102 | + │ ├── ... |
| 103 | + ``` |
| 104 | + The checkpoint.pt files contain the model weights. The subdirectories `FT_Lm_SRCB` contain the models finetuned with a re-weighted loss function. |
| 105 | + |
| 106 | +### Deploy example for a single Sentinel-2 image |
| 107 | +This [demo script](gchm/bash/deploy_example.sh) processes a single image (from the year 2020) for the tile "32TMT" in Switzerland. Run: |
| 108 | +``` |
| 109 | +bash gchm/bash/deploy_example.sh |
| 110 | +``` |
| 111 | +
|
| 112 | +### Deploy and merge example for multiple images of a Sentinel-2 tile |
| 113 | +This [demo script](gchm/bash/run_tile_deploy_merge.sh) processes 10 images (from the year 2020) for the tile "32TMT" in Switzerland and aggregates the individual per-image maps to a final annual map. |
| 114 | +
|
| 115 | +Provide a text file with the image filenames per tile saved as `${TILE_NAME}.txt`. The demo data contains the following file: |
| 116 | +``` |
| 117 | +cat ./deploy_example/image_paths/2020/32TMT.txt |
| 118 | +S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851.zip |
| 119 | +S2A_MSIL2A_20200723T103031_N0214_R108_T32TMT_20200723T142801.zip |
| 120 | +S2A_MSIL2A_20200812T103031_N0214_R108_T32TMT_20200812T131334.zip |
| 121 | +... |
| 122 | +``` |
| 123 | +The corresponding images are stored in `./deploy_example/sentinel2/2020/`. |
| 124 | +
|
| 125 | +
|
| 126 | +1. Set the paths in `gchm/bash/config.sh` |
| 127 | +2. Set the tile_name in `gchm/bash/run_tile_deploy_merge.sh` |
| 128 | +3. Run the script: |
| 129 | + ``` |
| 130 | + bash gchm/bash/run_tile_deploy_merge.sh |
| 131 | + ``` |
| 132 | +
|
| 133 | +#### Note on ESA World Cover post-processing: |
| 134 | +The ESA WorldCover 10 m 2020 v100 reprojected to Sentinel-2 tiles is available on [Zenodo](https://doi.org/10.5281/zenodo.7888150). |
| 135 | +We apply minimal post-processing and mask out built-up areas, snow, |
| 136 | + ice and permanent water bodies, setting their canopy height to ”no data” (value: 255). See the script [here](gchm/postprocess/mask_with_ESAworldcover.py). |
| 137 | +
|
| 138 | +#### Note on AWS: |
| 139 | +Sentinel-2 images can be downloaded on the fly from AWS S3 by setting `GCHM_DOWNLOAD_FROM_AWS="True"` |
| 140 | +and providing the AWS credentials as described above. |
| 141 | +This was tested for 2020 data, but might need some update in the sentinelhub routine to handle newer versions. |
| 142 | +
|
| 143 | +
|
| 144 | +## Training |
| 145 | +
|
| 146 | +### Data preparation |
| 147 | +1. Download the train-val h5 datasets from [here](https://doi.org/10.3929/ethz-b-000609845). |
| 148 | +2. Merge the parts file to a single `train.h5` and `val.h5` by running this [script](gchm/preprocess/run_merge_h5_files_per_split.sh). |
| 149 | + Before running it, set the variables `in_h5_dir_parts` and `out_h5_dir` to your paths. Then run: |
| 150 | + ``` |
| 151 | + bash gchm/preprocess/run_merge_h5_files_per_split.sh` |
| 152 | + ``` |
| 153 | +
|
| 154 | +### Running the training script |
| 155 | +A [slurm training script](gchm/bash/run_training.sh) is provided and submitted as follows. |
| 156 | +Before submitting, set the variable `CODE_PATH` at the top of the script and set the paths in `gchm/bash/config.sh`. Then run: |
| 157 | +``` |
| 158 | +sbatch < gchm/bash/run_training.sh |
| 159 | +``` |
| 160 | +
|
| 161 | +## ALS preprocessing for independent comparison |
| 162 | +
|
| 163 | +In cases where rastered high-resolution canopy height models are available (e.g. from airborne LIDAR campaigns) for independent evaluation, some preprocessing steps are required to |
| 164 | +make the data comparable to GEDI canopy top height estimates corresponding to the canopy top within a 25 meter footprint. |
| 165 | +
|
| 166 | +1. A rastered canopy height model with a 1m GSD should be created (E.g. using `gdalwarp`). |
| 167 | +2. The 1m canopy height model can then be processed with a circular max pooling operation to approximate "GEDI-like" canopy top heights. This step is provided as a [pytorch implementation](gchm/preprocess/ALS_maxpool_GEDI_footprint.py). |
| 168 | +
|
| 169 | +**Example**: |
| 170 | +Download the example CHM at 1m GSD from [here](https://zenodo.org/record/7885610/files/ALS_example_CTHM_GSD1m.tif). Then run: |
| 171 | +``` |
| 172 | +python3 gchm/preprocess/ALS_maxpool_GEDI_footprint.py "path/to/input/tif" "path/to/output/tif" |
| 173 | +``` |
| 174 | +
|
| 175 | +## Citation |
| 176 | +
|
| 177 | +Please cite our paper if you use this code or any of the provided data. |
| 178 | +
|
| 179 | +Lang, N., Jetz, W., Schindler, K., & Wegner, J. D. (2022). A high-resolution canopy height model of the Earth. arXiv preprint arXiv:2204.08322. |
| 180 | +``` |
| 181 | +@article{lang2022high, |
| 182 | + title={A high-resolution canopy height model of the Earth}, |
| 183 | + author={Lang, Nico and Jetz, Walter and Schindler, Konrad and Wegner, Jan Dirk}, |
| 184 | + journal={arXiv preprint arXiv:2204.08322}, |
| 185 | + year={2022} |
| 186 | +} |
| 187 | +``` |
| 188 | +
|
0 commit comments