
Commit 755cbf3
Author: nlang
Commit message: first commit
0 parents, commit 755cbf3

39 files changed: +4791 additions, 0 deletions

.gitignore

+16
.idea
deploy_dir/
deploy_example/
trained_models/
__pycache__
gchm.egg-info
wandb/
venv/
.ipynb_checkpoints
.DS_Store
*.zip
*.tif
*.npy
*.pt
*.json

INSTALL.md

+51
## Installation and credentials

Here we present two ways to install the packages.
- A) requires GDAL to be installed on the system first.
- B) GDAL is installed with mamba/conda in the environment.

### A) With pip in a virtual environment

1. Install [GDAL](https://gdal.org/). For Ubuntu follow e.g. these [instructions](https://mothergeo-py.readthedocs.io/en/latest/development/how-to/gdal-ubuntu-pkg.html).
2. Create a new [virtual environment](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) called `gchm` by running: `python -m venv $HOME/venvs/gchm`
3. Activate the environment: `source $HOME/venvs/gchm/bin/activate`. (Check that python points to the new environment with `which python3`.)
4. Install pytorch by following the instructions on [pytorch.org](https://pytorch.org/) that match your versions. Run e.g. `python3 -m pip install torch torchvision torchaudio`
5. Install the GDAL python API matching the installed GDAL version: `python3 -m pip install GDAL==3.5.3`
6. Install all other required packages: `python3 -m pip install -r requirements.txt`
7. Install this project as an editable package called `gchm`. Make sure you are in the directory of the repository containing the file `setup.py`.
Run: `python3 -m pip install -e .` (Note the dot `.` at the end.)


### B) Mamba/conda installation

1. Install mambaforge: https://github.com/conda-forge/miniforge#mambaforge
2. Create a new environment called `gchm` with pytorch (or follow the instructions on [pytorch.org](https://pytorch.org/)):
`mamba create -n gchm python=3.10.9 pytorch torchvision torchaudio cudatoolkit=11.8 -c pytorch -c nvidia`
3. Activate the environment: `mamba activate gchm`. (Check that python points to the new environment: e.g.
`which python` should print something like: `~/mambaforge/envs/gchm/bin/python`)
4. Install gdal: `mamba install -c conda-forge gdal=3.6.2`
5. Install pytables: `mamba install -c anaconda pytables=3.7.0`
6. Install all other required packages from conda-forge using the `environment.yml` file. Change directory to the repository and run:
`mamba env update -f environment.yml`
7. Install this project as an editable package called `gchm`. Make sure you are in the directory of the repository containing the file `setup.py`.
Run: `pip install -e .` (Note the dot `.` at the end.)

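After either installation route, a quick sanity check that the key geospatial and deep-learning dependencies resolve inside the activated `gchm` environment can be helpful. This snippet is not part of the repository; it only verifies the imports used throughout the project.

```python
# quick sanity check of the environment (run inside the activated gchm env)
import torch
from osgeo import gdal

import gchm  # the editable install from `pip install -e .`

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("GDAL:", gdal.__version__)
print("gchm imported from:", gchm.__file__)
```
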
### Credentials for wandb
Optional. Only needed to run the training code (not needed for deployment).
Create a file called `~/.config_wandb` containing your [Weights & Biases API key](https://docs.wandb.ai/quickstart):
```
export WANDB_API_KEY=YOUR_API_KEY
```


### Credentials for AWS
Optional. This is only needed to download Sentinel-2 images from AWS on the fly using `gchm/deploy.py`.
***Note that there are costs per GB downloaded!***

Create a file `~/.aws_configs` containing your AWS credentials as environment variables.
```
export AWS_ACCESS_KEY_ID=PUT_YOUR_KEY_ID_HERE
export AWS_SECRET_ACCESS_KEY=PUT_YOUR_SECRET_ACCESS_KEY_HERE
export AWS_REQUEST_PAYER=requester
```
To create an AWS account go to: https://aws.amazon.com/console/

LICENSE

+21
MIT License

Copyright (c) 2023 Nico Lang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

+188
# A high-resolution canopy height model of the Earth

This repository contains the code used to create the results presented in the paper: [A high-resolution canopy height model of the Earth](https://arxiv.org/abs/2204.08322).
Here, we developed a model to estimate canopy top height anywhere on Earth. The model estimates canopy top height for every Sentinel-2 image pixel and was trained using sparse GEDI LIDAR data as a reference.

See our [project page](https://langnico.github.io/globalcanopyheight) for an interactive [demo](https://nlang.users.earthengine.app/view/global-canopy-height-2020) and more information.

## Data availability
This is a summary of all the published data:

- Global canopy top height map for 2020 ([ETH Research Collection](https://doi.org/10.3929/ethz-b-000609802))
- Train-val dataset ([ETH Research Collection](https://doi.org/10.3929/ethz-b-000609845))
- Rasterized canopy top height models from airborne lidar ([Zenodo](https://doi.org/10.5281/zenodo.7885699))
- Trained model weights ([Github release]())
- Demo data for example scripts ([Zenodo](https://doi.org/10.5281/zenodo.7885610))
- Sparse GEDI canopy top height data ([Zenodo](https://doi.org/10.5281/zenodo.7737946))
- ESA WorldCover 10 m 2020 v100 reprojected to Sentinel-2 tiles ([Zenodo](https://doi.org/10.5281/zenodo.7888150))

## Installation and credentials
Please follow the instructions in [INSTALL.md](INSTALL.md).

## Loading the model

```python
from gchm.models.xception_sentinel2 import xceptionS2_08blocks_256

# load the model with random initialization
model = xceptionS2_08blocks_256()
```
Please see the [example notebook](gchm/notebooks/example_loading_pretrained_models.ipynb) on how to load the model with the trained weights.

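If the notebook is not at hand, the snippet below is a minimal sketch of loading one ensemble member from the `trained_models/` directory created in the deployment section below. The exact checkpoint layout (a plain state dict vs. a dict with a `model_state_dict` key) is an assumption here; the notebook remains the authoritative reference.

```python
import numpy as np
import torch

from gchm.models.xception_sentinel2 import xceptionS2_08blocks_256

# one ensemble member, finetuned with the re-weighted loss (example path)
model_dir = "trained_models/GLOBAL_GEDI_2019_2020/model_0/FT_Lm_SRCB"

model = xceptionS2_08blocks_256()

ckpt = torch.load(f"{model_dir}/checkpoint.pt", map_location="cpu")
# the weights may be stored as a plain state dict or nested under a key
# such as "model_state_dict" (assumption; the notebook shows the exact layout)
state_dict = ckpt.get("model_state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
model.load_state_dict(state_dict)
model.eval()

# per-band normalization statistics shipped next to each checkpoint
train_input_mean = np.load(f"{model_dir}/train_input_mean.npy")
train_input_std = np.load(f"{model_dir}/train_input_std.npy")
```

The `train_input_mean`/`train_input_std` arrays are presumably used to standardize the Sentinel-2 bands before the forward pass; see the notebook for the exact preprocessing.
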
## Deploying

This is a demo of how to run the trained ensemble to compute the canopy height map for a Sentinel-2 tile (approx. 100 km x 100 km).

### Preparation:
1. Download the demo data, which contains Sentinel-2 images for one tile:
```
bash gchm/bash/download_demo_data.sh ./
```
This creates the following directory:
```
deploy_example/
├── ESAworldcover
│   └── 2020
│       └── sentinel2_tiles
│           └── ESA_WorldCover_10m_2020_v100_32TMT.tif
├── image_paths
│   └── 2020
│       └── 32TMT.txt
├── image_paths_logs
│   └── 2020
├── predictions_provided
│   ├── 2020
│   │   ├── S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851_predictions.tif
│   │   ├── S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851_std.tif
│   │   ├── ...
│   ├── 2020_merge
│   │   └── preds_inv_var_mean
│   │       ├── 32TMT_pred.tif
│   │       └── 32TMT_std.tif
│   └── 2020_merge_logs
│       └── preds_inv_var_mean
│           └── 32TMT.txt
├── sentinel2
│   └── 2020
│       ├── S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851.zip
│       ├── ...
└── sentinel2_aws
    └── 2020
```
2. Download the trained model weights:
```
bash gchm/bash/download_trained_models.sh ./trained_models
```

This creates the following directory:

```
trained_models/
└── GLOBAL_GEDI_2019_2020
    ├── model_0
    │   ├── FT_Lm_SRCB
    │   │   ├── args.json
    │   │   ├── checkpoint.pt
    │   │   ├── train_input_mean.npy
    │   │   ├── train_input_std.npy
    │   │   ├── train_target_mean.npy
    │   │   └── train_target_std.npy
    │   ├── args.json
    │   ├── checkpoint.pt
    │   ├── train_input_mean.npy
    │   ├── train_input_std.npy
    │   ├── train_target_mean.npy
    │   └── train_target_std.npy
    ├── model_1
    │   ├── ...
    ├── model_2
    │   ├── ...
    ├── model_3
    │   ├── ...
    ├── model_4
    │   ├── ...
```
The checkpoint.pt files contain the model weights. The subdirectories `FT_Lm_SRCB` contain the models finetuned with a re-weighted loss function.

### Deploy example for a single Sentinel-2 image
This [demo script](gchm/bash/deploy_example.sh) processes a single image (from the year 2020) for the tile "32TMT" in Switzerland. Run:
```
bash gchm/bash/deploy_example.sh
```

### Deploy and merge example for multiple images of a Sentinel-2 tile
This [demo script](gchm/bash/run_tile_deploy_merge.sh) processes 10 images (from the year 2020) for the tile "32TMT" in Switzerland and aggregates the individual per-image maps into a final annual map (see the merging sketch after the run command below).

Provide a text file with the image filenames per tile saved as `${TILE_NAME}.txt`. The demo data contains the following file:
```
cat ./deploy_example/image_paths/2020/32TMT.txt
S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851.zip
S2A_MSIL2A_20200723T103031_N0214_R108_T32TMT_20200723T142801.zip
S2A_MSIL2A_20200812T103031_N0214_R108_T32TMT_20200812T131334.zip
...
```
The corresponding images are stored in `./deploy_example/sentinel2/2020/`.


1. Set the paths in `gchm/bash/config.sh`
2. Set the tile_name in `gchm/bash/run_tile_deploy_merge.sh`
3. Run the script:
```
bash gchm/bash/run_tile_deploy_merge.sh
```

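Judging by the output directory name `preds_inv_var_mean`, the per-image maps appear to be merged with a per-pixel inverse-variance weighted mean. Purely to illustrate that aggregation rule (not the repository's exact implementation; it assumes missing pixels, e.g. clouds, are marked as NaN in both the prediction and the std maps), a numpy sketch:

```python
import numpy as np


def inv_var_merge(preds, stds, eps=1e-6):
    """Per-pixel inverse-variance weighted mean over per-image predictions.

    preds, stds: float arrays of shape (num_images, H, W); NaN marks pixels
    without a valid prediction (e.g. clouds) in both arrays.
    """
    preds = np.asarray(preds, dtype=np.float32)
    var = np.square(np.asarray(stds, dtype=np.float32)) + eps

    valid = ~np.isnan(preds)
    weights = np.where(valid, 1.0 / var, 0.0)          # w_i = 1 / sigma_i^2
    weighted_preds = np.where(valid, preds, 0.0) * weights

    weight_sum = weights.sum(axis=0)
    merged = np.where(weight_sum > 0,
                      weighted_preds.sum(axis=0) / np.maximum(weight_sum, eps),
                      np.nan)
    merged_std = np.where(weight_sum > 0,
                          np.sqrt(1.0 / np.maximum(weight_sum, eps)),
                          np.nan)
    return merged, merged_std
```

Lower-uncertainty observations receive more weight, and the merged standard deviation shrinks as more valid images contribute to a pixel.
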
#### Note on ESA World Cover post-processing:
The ESA WorldCover 10 m 2020 v100 reprojected to Sentinel-2 tiles is available on [Zenodo](https://doi.org/10.5281/zenodo.7888150).
We apply minimal post-processing and mask out built-up areas, snow, ice and permanent water bodies, setting their canopy height to "no data" (value: 255). See the script [here](gchm/postprocess/mask_with_ESAworldcover.py).

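The linked script is the reference implementation. Purely to illustrate the masking logic, a sketch using the GDAL Python API could look as follows; the choice of WorldCover class codes mirrors the classes named above (50 = built-up, 70 = snow/ice, 80 = permanent water bodies), and the file paths are placeholders.

```python
import numpy as np
from osgeo import gdal

MASK_CLASSES = (50, 70, 80)   # built-up, snow/ice, permanent water bodies
NO_DATA = 255


def mask_canopy_height(pred_path, worldcover_path, out_path):
    """Set canopy height to 255 ("no data") where ESA WorldCover indicates
    built-up, snow/ice or permanent water (illustrative sketch)."""
    pred_ds = gdal.Open(pred_path)
    wc_ds = gdal.Open(worldcover_path)

    pred = pred_ds.GetRasterBand(1).ReadAsArray()
    wc = wc_ds.GetRasterBand(1).ReadAsArray()   # assumes the same tile grid

    pred[np.isin(wc, MASK_CLASSES)] = NO_DATA

    driver = gdal.GetDriverByName("GTiff")
    out_ds = driver.CreateCopy(out_path, pred_ds)
    out_ds.GetRasterBand(1).WriteArray(pred)
    out_ds.GetRasterBand(1).SetNoDataValue(NO_DATA)
    out_ds = None  # close dataset to flush to disk
```

The provided `mask_with_ESAworldcover.py` remains the reference for the exact per-tile file naming and nodata conventions.
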
#### Note on AWS:
Sentinel-2 images can be downloaded on the fly from AWS S3 by setting `GCHM_DOWNLOAD_FROM_AWS="True"`
and providing the AWS credentials as described above.
This was tested for 2020 data, but might need some updates to the sentinelhub routine to handle newer versions.


## Training

### Data preparation
1. Download the train-val h5 datasets from [here](https://doi.org/10.3929/ethz-b-000609845).
2. Merge the part files into a single `train.h5` and `val.h5` by running this [script](gchm/preprocess/run_merge_h5_files_per_split.sh).
Before running it, set the variables `in_h5_dir_parts` and `out_h5_dir` to your paths. Then run:
```
bash gchm/preprocess/run_merge_h5_files_per_split.sh
```

### Running the training script
A [slurm training script](gchm/bash/run_training.sh) is provided and submitted as follows.
Before submitting, set the variable `CODE_PATH` at the top of the script and set the paths in `gchm/bash/config.sh`. Then run:
```
sbatch < gchm/bash/run_training.sh
```

## ALS preprocessing for independent comparison

In cases where rastered high-resolution canopy height models are available (e.g. from airborne LIDAR campaigns) for independent evaluation, some preprocessing steps are required to make the data comparable to GEDI canopy top height estimates, which correspond to the canopy top within a 25 meter footprint.

1. A rastered canopy height model with a 1m GSD should be created (e.g. using `gdalwarp`).
2. The 1m canopy height model can then be processed with a circular max pooling operation to approximate "GEDI-like" canopy top heights. This step is provided as a [pytorch implementation](gchm/preprocess/ALS_maxpool_GEDI_footprint.py); a simplified sketch of the operation is shown below.

**Example**:
Download the example CHM at 1m GSD from [here](https://zenodo.org/record/7885610/files/ALS_example_CTHM_GSD1m.tif). Then run:
```
python3 gchm/preprocess/ALS_maxpool_GEDI_footprint.py "path/to/input/tif" "path/to/output/tif"
```

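The linked PyTorch script is the reference implementation. To illustrate the idea, the sketch below computes, for each window of a 1 m CHM, the maximum height over a circular footprint of 25 pixels (25 m) diameter; the footprint size, stride and the absence of nodata handling are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F


def circular_max_pool(chm, diameter_px=25, stride=1):
    """Max of the CHM inside a circular window ('GEDI-like' canopy top height)."""
    radius = diameter_px / 2.0
    center = (diameter_px - 1) / 2.0
    yy, xx = torch.meshgrid(
        torch.arange(diameter_px, dtype=torch.float32),
        torch.arange(diameter_px, dtype=torch.float32),
        indexing="ij",
    )
    circle = ((yy - center) ** 2 + (xx - center) ** 2) <= radius ** 2  # (k, k) bool

    # extract all k x k patches, drop pixels outside the circle, take the max per patch
    x = chm.reshape(1, 1, *chm.shape)                               # (1, 1, H, W)
    patches = F.unfold(x, kernel_size=diameter_px, stride=stride)   # (1, k*k, L)
    patches = patches.masked_fill(~circle.reshape(1, -1, 1), float("-inf"))
    pooled = patches.max(dim=1).values

    out_h = (chm.shape[0] - diameter_px) // stride + 1
    out_w = (chm.shape[1] - diameter_px) // stride + 1
    return pooled.reshape(out_h, out_w)


# usage on a random 1 m CHM patch (a real run would read the GeoTIFF, e.g. with GDAL)
chm = torch.rand(256, 256) * 40.0
gedi_like = circular_max_pool(chm, diameter_px=25, stride=1)  # (232, 232)
```

On a full Sentinel-2 tile at 1 m GSD this patch extraction is memory-hungry, so in practice the raster would be processed in tiles or with a larger stride; prefer the repository script for real data.
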
## Citation

Please cite our paper if you use this code or any of the provided data.

Lang, N., Jetz, W., Schindler, K., & Wegner, J. D. (2022). A high-resolution canopy height model of the Earth. arXiv preprint arXiv:2204.08322.
```
@article{lang2022high,
  title={A high-resolution canopy height model of the Earth},
  author={Lang, Nico and Jetz, Walter and Schindler, Konrad and Wegner, Jan Dirk},
  journal={arXiv preprint arXiv:2204.08322},
  year={2022}
}
```

environment.yml

+22
name: gchm

channels:
  - conda-forge
  - defaults

dependencies:
  - ipython
  - matplotlib
  - numpy
  - wandb
  - pathlib
  - tqdm
  - botocore
  - urllib3
  - tensorboard
  - sentinelhub=3.9.0
  - anaconda::scikit-image
  - anaconda::scikit-learn
  - anaconda::typing
  - anaconda::jupyter

gchm/__init__.py

Whitespace-only changes.

gchm/bash/config.sh

+34
# Configuration file for GCHM

# ----------------------------
# ---------- DEPLOY ----------
# ----------------------------

export GCHM_DEPLOY_PARENT_DIR="./deploy_example"
export YEAR="2020"
export GCHM_DEPLOY_IMAGE_PATHS_DIR="${GCHM_DEPLOY_PARENT_DIR}/image_paths/${YEAR}"
export GCHM_DEPLOY_IMAGE_PATHS_LOG_DIR="${GCHM_DEPLOY_PARENT_DIR}/image_paths_logs/${YEAR}"
export GCHM_DEPLOY_SENTINEL2_DIR="${GCHM_DEPLOY_PARENT_DIR}/sentinel2/${YEAR}"
export GCHM_DEPLOY_DIR="${GCHM_DEPLOY_PARENT_DIR}/predictions/${YEAR}"
export GCHM_MODEL_DIR="./trained_models/GLOBAL_GEDI_2019_2020"
export GCHM_NUM_MODELS=5

export GCHM_DOWNLOAD_FROM_AWS="False"  # Set this to "True" or "False"
export GCHM_AWS_CONFIGS_FILE="$HOME/.aws_configs"
export GCHM_DEPLOY_SENTINEL2_AWS_DIR="${GCHM_DEPLOY_PARENT_DIR}/sentinel2_aws/${YEAR}"

# make directories
mkdir -p ${GCHM_DEPLOY_DIR}
mkdir -p ${GCHM_DEPLOY_SENTINEL2_AWS_DIR}
mkdir -p ${GCHM_DEPLOY_IMAGE_PATHS_LOG_DIR}

# ----------------------------
# --------- TRAINING ---------
# ----------------------------

export GCHM_TRAINING_DATA_DIR="/cluster/work/igp_psr/nlang/global_vhm/gchm_public_data/training_data/GLOBAL_GEDI_2019_2020/all_shuffled"
export GCHM_TRAINING_EXPERIMENT_DIR="/cluster/work/igp_psr/nlang/experiments/gchm"
# Set path to python
export PYTHON="$HOME/venvs/gchm/bin/python"


gchm/bash/deploy_example.sh

+29
#!/bin/bash

DEPLOY_IMAGE_PATH="./deploy_example/sentinel2/2020/S2A_MSIL2A_20200623T103031_N0214_R108_T32TMT_20200623T142851.zip"
GCHM_DEPLOY_DIR="./deploy_example/predictions/2020"

GCHM_MODEL_DIR="./trained_models/GLOBAL_GEDI_2019_2020"
GCHM_NUM_MODELS="5"

filepath_failed_image_paths="./deploy_example/log_failed.txt"

GCHM_DOWNLOAD_FROM_AWS="False"
GCHM_DEPLOY_SENTINEL2_AWS_DIR="./deploy_example/sentinel2_aws"

# create directories
mkdir -p ${GCHM_DEPLOY_DIR}
mkdir -p ${GCHM_DEPLOY_SENTINEL2_AWS_DIR}

# --sentinel2_dir: directory where Sentinel-2 images are stored when downloading from AWS
python3 gchm/deploy.py --model_dir=${GCHM_MODEL_DIR} \
                       --deploy_image_path=${DEPLOY_IMAGE_PATH} \
                       --deploy_dir=${GCHM_DEPLOY_DIR} \
                       --deploy_patch_size=512 \
                       --num_workers_deploy=4 \
                       --num_models=${GCHM_NUM_MODELS} \
                       --finetune_strategy="FT_Lm_SRCB" \
                       --filepath_failed_image_paths=${filepath_failed_image_paths} \
                       --download_from_aws=${GCHM_DOWNLOAD_FROM_AWS} \
                       --sentinel2_dir=${GCHM_DEPLOY_SENTINEL2_AWS_DIR} \
                       --remove_image_after_pred="False"

gchm/bash/download_demo_data.sh

+18
#!/bin/bash

# parse and create the target directory from the first argument
demo_data_dir=${1}
mkdir -p ${demo_data_dir}
cd ${demo_data_dir}

url="https://zenodo.org/record/7885610/files/gchm_deploy_example.zip?download=1"
# download zip file
curl $url --output "gchm_deploy_example.zip"
# unzip
unzip gchm_deploy_example.zip
# delete zip file
rm gchm_deploy_example.zip

echo "DONE. Demo data extracted in:"
pwd
