Disaster Impact Database

Disaster Impact Database is an open-source project designed to ingest, process, and analyse disaster-related data from multiple sources. The project reads raw data from Azure Blob Storage, normalises CSV files, and lays the groundwork for future data consolidation and analysis. Data sources include GLIDE, GDACS, CERF, EMDAT, IDMC, IFRC and more.

Project Purpose

The primary goal is to build a unified disaster impact database that:

  • Downloads raw data from Azure Blob Storage (see the download sketch after this list).
  • Curates and normalises data from various humanitarian and disaster sources.
  • Standardises data into consistent formats using JSON schemas.
  • Exports data as normalised CSV files.
  • Prepares for future consolidation by grouping events by type, country, and event date (within a ±7-day window).
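
As a rough illustration of the download step, the snippet below pulls one raw file from Azure Blob Storage with the azure-storage-blob SDK. The environment variable, container name, and blob path are illustrative assumptions, not the project's actual configuration.

import os

from azure.storage.blob import BlobServiceClient

def download_raw_blob(container_name: str, blob_name: str, dest_path: str) -> None:
    """Download a single raw data file from Azure Blob Storage to a local path."""
    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    blob = service.get_blob_client(container=container_name, blob=blob_name)
    with open(dest_path, "wb") as fh:
        fh.write(blob.download_blob().readall())

# Hypothetical container and blob names, shown only to illustrate the call.
download_raw_blob("raw-data", "glide/glide_events.csv", "data/glide_events.csv")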

Project Structure

.
├── docs                # Documentation
├── LICENSE             # Project license
├── Makefile            # Automation commands
├── notebooks           # Jupyter notebooks for data inspection and experimentation
├── poetry.lock         # Poetry lock file for dependencies
├── poetry.toml         # Poetry configuration
├── pyproject.toml      # Project metadata and dependency management
├── README.md           # This file
├── src                 # Source code modules
│   ├── cerf          # CERF data processing (downloader, normalisation, schema)
│   ├── data_consolidation  # Future module for data consolidation tasks
│   ├── disaster_charter    # Disaster Charter data processing
│   ├── emdat         # EM-DAT data processing
│   ├── gdacs         # GDACS data processing
│   ├── glide         # GLIDE data processing
│   ├── idmc          # IDMC data processing
│   ├── ifrc_eme      # IFRC data processing
│   ├── unified       # Unified schema, consolidated data, and blob upload utilities
│   └── utils         # Utility scripts
├── static_data         # Static reference data (e.g., country codes, event codes)
└── tests               # Unit and integration tests

Key Features

  • Data Download: Retrieve raw data directly from Azure Blob Storage.
  • Data Curation: Clean and preprocess raw data.
  • Normalisation & Standardisation: Process and flatten raw records so that data from every source follows a consistent structure.
  • Data Schemas: Use JSON schemas to validate and enforce data structure consistency.
  • CSV Output: Export normalised data to CSV for downstream analysis.
  • Future Data Consolidation: Group events by type, country, and event date (within a ±7-day window) to create a consolidated dataset; see the sketch after this list.
  • Automation: Utilise Makefile commands for environment setup, testing, linting, and more.
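
The consolidation step is not implemented yet, but the sketch below shows one simple way the ±7-day window could work with pandas: records are chained into the same group whenever they share a type and country and fall within seven days of the previous record. The column names (event_type, country, event_date) are assumptions for illustration, not the unified schema's actual field names.

import pandas as pd

def assign_event_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Label records that share type and country and sit within 7 days of the previous record."""
    df = df.sort_values(["event_type", "country", "event_date"]).copy()
    gap = df.groupby(["event_type", "country"])["event_date"].diff()
    starts_new_group = gap.isna() | (gap > pd.Timedelta(days=7))
    df["event_group"] = starts_new_group.cumsum()
    return df

events = pd.DataFrame({
    "event_type": ["FL", "FL", "FL"],
    "country": ["KEN", "KEN", "KEN"],
    "event_date": pd.to_datetime(["2024-03-01", "2024-03-05", "2024-03-20"]),
})
print(assign_event_groups(events))  # the first two rows share a group; the third starts a new one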

Usage Instructions

Environment Setup

This project uses Poetry for dependency management. To set up your development environment:

Create and activate the virtual environment:

make .venv
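
If you prefer to call Poetry directly rather than through the Makefile (assuming the Makefile wraps the standard Poetry workflow), the equivalent steps are:

poetry install          # install dependencies into the project virtual environment
poetry run <command>    # run any project command inside that environment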

Running Normalisation Scripts

Each data source module under src contains scripts for data normalisation. For example, to run the normalisation process for GLIDE data:

python -m src.glide.data_normalisation_glide

Replace glide with the appropriate module name for other data sources (e.g., gdacs, cerf, etc.).

Automation with Makefile

The included Makefile provides several automation commands:

  • Set up the environment:

    make .venv
  • Run tests:

    make test
  • Lint the code:

    make lint
  • Clean the environment:

    make clean

Testing, Linting, and Environment Cleanup

  • Testing: Run unit and integration tests located in the tests directory.

    make test
  • Linting: Check code quality with linting tools.

    make lint
  • Clean Environment: Remove temporary files and reset the environment as needed.

    make clean

Development Notes & Key Scripts

  • Key Scripts:

    • Normalisation: src/*/data_normalisation*.py
    • JSON Schemas: Located in each module (e.g., src/cerf/cerf_schema.json); see the validation sketch after these notes
    • CSV Processing: src/utils/combine_csv.py, src/utils/splitter.py
    • Future Consolidation: src/data_consolidation/
  • Development Notes:

    • Update JSON schemas as the data structure evolves.
    • Extend the Makefile for additional automation tasks.
    • Contributions to enhance data consolidation features are highly encouraged.
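
As a minimal sketch of how a module schema could be used, the snippet below validates one record against src/cerf/cerf_schema.json with the jsonschema package (assuming that package, or an equivalent validator, is among the project's dependencies). The record fields shown are hypothetical and for illustration only.

import json

from jsonschema import ValidationError, validate

with open("src/cerf/cerf_schema.json") as fh:
    schema = json.load(fh)

record = {"event_id": "CERF-2024-001", "country": "KEN"}  # hypothetical fields for illustration

try:
    validate(instance=record, schema=schema)
    print("Record conforms to the CERF schema.")
except ValidationError as err:
    print(f"Record failed schema validation: {err.message}")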

Contributing

Contributions are welcome! To contribute:

  • Clone the repository and create a branch from main.
  • Submit pull requests with detailed descriptions of your changes.

License

This project is licensed under the GNU General Public License. See the LICENSE file for details.

Thank you for using the Disaster Impact Database! For issues or feature requests, please open an issue on GitHub. Happy coding!
