ExaGeoStatCPP User Manual

Content

ExaGeoStatCPP v1.1.0
Configurations of the software
Building ExaGeoStatCPP
Arguments
List of Descriptors
Supported operations
Manuals
Contributing

ExaGeoStatCPP

Current Version of ExaGeoStatCPP: 1.1.0

Supported Operations:

(Data Generation): Generating large geospatial synthetic datasets using dense, Diagonal Super-Tile (DST) and Tile Low-Rank (TLR) approximation techniques.
(Data Modeling): Modeling large geospatial datasets on dense, DST and TLR approximation techniques through the Maximum likelihood Estimation (MLE) operation.
(Data Prediction): Predicting missing measurements on given locations using dense, DST, and TLR approximation techniques.
(MLOE/MMOM): Computing the Mean Loss of Efficiency (MLOE), Mean Misspecification of the Mean Square Error (MMOM), and Root mean square MOM (RMOM) to describe the prediction performance over the whole observation region.
(Fisher Information Matrix (FIM)): Quantifying the information content that a variable x carries about a parameter $\theta$ within a Gaussian distribution.

Supported Covariance Functions:

Univariate Matérn (Gaussian/Stationary)
Univariate Matérn with Nugget (Gaussian/Stationary)
Flexible Bivariate Matérn (Gaussian/Stationary)
Parsimonious Bivariate Matérn (Gaussian/Stationary)
Parsimonious trivariate Matérn (Gaussian/Stationary)
Univariate Space/Time Matérn (Gaussian/Stationary)
Bivariate Space/Time Matérn (Gaussian/Stationary)
Tukey g-and-h Univariate Matérn (non-Gaussian/Stationary)
Tukey g-and-h Univariate Power Exponential (non-Gaussian/Stationary)

To add your kernel, please refer to Contribution Guidelines

Programming models:

MPI
Task-based programming models

External libraries:

NLOPT https://nlopt.readthedocs.io/en/latest/
GSL https://www.gnu.org/software/gsl/
HWLOC https://www.open-mpi.org/projects/hwloc/
StarPU dynamic runtime system https://starpu.gitlabpages.inria.fr/
HCORE https://github.com/ecrc/hcore
HiCMA https://github.com/ecrc/hicma
Stars-H https://github.com/ecrc/stars-h
Chameleon https://gitlab.inria.fr/solverstack/chameleon

Project Hierarchy

cmake A directory contains essential CMake modules that facilitate the importation and location of required dependencies.
docs A directory contains all the necessary documents.
examples A directory contains a comprehensive collection of demo code that illustrates the framework's application and demonstrates its features and capabilities.
inst A directory contains all the system's header files, mirroring the structure of the src directory.
man A directory contains all the R functions documentation.
scripts A directory contains benchmarking scripts.
src A directory contains all the source files.
tests A directory contains all the test files and follows the same structure as the src folder.
clean_build.sh A script is designed to compile the software tests once all dependencies are installed, and it is set to build everything by default.
CMakeLists.txt The top-level CMake file to configure the build system.
configure A Script used to generate the building system inside a 'bin' directory.

Configurations

Run the help of configure to know the needed arguments for your specific options.
```
./configure -h
```
To enable R interface, add -r disabled by default.
To enable support of HiCMA, add -H disabled by default.
To enable examples, add -e enabled by default.
To enable tests, add -t disabled by default.
To enable heavy tests, add -T disabled by default.
To enable CUDA, add -c disabled by default.
To enable MPI, add -m disabled by default.
To enable verbose output, add -v disabled by default.
To change the installation path of the dependencies, use -i <installation/path> the default is project_path/installdir/_deps/ on Unix systems.
To enable packaging system for distribution, add -p disabled by default.
To enable showing code warnings, add -w disabled by default.
To manually set mkl as blas vendor, add --use-mkl. MKL is required as blas vendor and it's automatically detected but in some environments it need to be manually set.

Building

Run the help of clean_build.sh to know additional argument options.
```
./clean_build.sh -h
```
Run clean_build.sh to build the project.
```
./clean_build.sh
```
To enable the installation of the project, add -i disabled by default.
To enable verbose printing, add -v disabled by default.
To enable building with a specific number of threads, add -j <thread_number> running with maximum number of threads by default.

Arguments

These are the arguments that you can specify when running any C++ example.

{Mandatory} To set the problem size (N)
```
  --N=<problem_size>
```
{Mandatory} To set the kernel
```
  --kernel=<supported_kernel>
```
{Mandatory} To set the dense tile size in the case of Chameleon
```
  --dts=<value>
```
{Mandatory} To set the low tile size in case of HiCMA
```
  --lts=<value>
```
{Optional} To set the dimension, the default is 2D
```
  --dimension=<2D/3D/ST>
```
{Optional} To set the p grid, the default is 1
```
  --p=<value>
```
{Optional} To set the q grid, the default is 1
```
  --q=<value>
```
{Optional} To set the time slot, the default is 1
```
  --time_slot=<value>
```
{Optional} To set the computation, the default is dense
```
  --computation=<dense/tlr/dst>
```
{Optional} To set the precision, the default is double
```
  --precision=<single/double>
```
{Optional} To set the number of cores, the default is 1
```
  --cores=<value>
```
{Optional} To set the number of GPUs, the default is 0
```
  --gpus=<value>
```
{Optional} To set the number of unknown observations to be predicted, the default is 0
```
  --Zmiss=<value>
```
{Optional} To set the path of the observation file
```
  --observations_file=<path/to/file>
```
{Optional} To set the max rank, the default is 1
```
  --max_rank=<value>
```
{Optional} To set the lower bounds of optimization
```
  --olb=<value:value:....:value>
```
{Optional} To set the upper bounds of optimization
```
  --oub=<value:value:....:value>
```
{Optional} To set the initial theta
```
  --itheta=<value:value:....:value>
```
{Optional} To set the target theta
```
  --ttheta=<value:value:....:value>
```
{Optional} To set the estimated theta
```
  --etheta=<value:value:....:value>
```
{Optional} To set the seed value, the default is 0
```
  --seed=<value>
```
{Optional} To set the verbose value, the default is standard
```
  --verbose=<quiet/standard/detailed>
```
{Optional} To set the path of log files to be written, the default is ./exageostat-cpp/synthetic_ds/
```
  --log_path=<path/to/file>
```
{Optional} To enable reading a CSV file containing real data, if not entered the default is the generation of synthetic data
```
  --data_path=<path/to/file>
```
{Optional} To enable out-of-core (OOC), the default is OFF
```
  --OOC 
```
{Optional} To enable approximation mode, the default is ON
```
  --approximation_mode
```
{Optional} To enable writing log files, the default is OFF
```
  --log
```

List of Descriptors

Covariance Matrix Descriptors

DESCRIPTOR_C: Covariance matrix C descriptor
DESCRIPTOR_C11: Covariance Matrix C11 descriptor.
DESCRIPTOR_C12: Covariance Matrix C21 descriptor.
DESCRIPTOR_C21: Covariance Matrix C12 descriptor.
DESCRIPTOR_C22: Covariance Matrix C22 descriptor.

Measurements Matrix Descriptors

DESCRIPTOR_Z: Measurements Z descriptor.
DESCRIPTOR_Z_COPY: A copy of Measurements Z descriptor.
DESCRIPTOR_Z_OBSERVATIONS: Observed Measurements Z descriptor.
DESCRIPTOR_Z_Actual: Actual Measurements Z descriptor.
DESCRIPTOR_Z_MISS: Missing Measurements Z descriptor.

Measurements Sub-Matrix Descriptors

DESCRIPTOR_Z_1: Measurements Z1 sub-matrix descriptor.
DESCRIPTOR_Z_2: Measurements Z2 sub-matrix descriptor.
DESCRIPTOR_Z_3: Measurements Z3 sub-matrix descriptor.

Dot Product Descriptors

DESCRIPTOR_PRODUCT: Dot product descriptor.
DESCRIPTOR_PRODUCT_1: Dot product descriptor.
DESCRIPTOR_PRODUCT_2: Dot product descriptor.
DESCRIPTOR_PRODUCT_3: Dot product descriptor.

Determinant Descriptors

DESCRIPTOR_DETERMINANT: Determinant descriptor.

Mean Square Prediction Error Descriptors

DESCRIPTOR_MSPE: Mean Square Error descriptor.

HiCMA Descriptors

DESCRIPTOR_CRK: HiCMA descCrk descriptor.
DESCRIPTOR_C12RK: HiCMA descCrk descriptor.
DESCRIPTOR_C22RK: HiCMA descCrk descriptor.
DESCRIPTOR_CD: HiCMA descCD descriptor.
DESCRIPTOR_C12D: HiCMA descCD descriptor.
DESCRIPTOR_C22D: HiCMA descCD descriptor.
DESCRIPTOR_CUV : HiCMA descCUV descriptor
DESCRIPTOR_C12UV : HiCMA descCUV descriptor
DESCRIPTOR_C22UV : HiCMA descCUV descriptor

Supported Operations

Provide Arguments

To use any operations, you must initially supply the necessary arguments to the operation via the Configurations module. There are two methods available for setting your arguments:

Provide your arguments with the command line.

    // Create a new configuration object.
    Configurations configurations;
    // Initialize the arguments with the provided command line arguments
    configurations.InitializeArguments(argc, argv);

Set your arguments manually in the code.

    Configurations synthetic_data_configurations;
    synthetic_data_configurations.SetProblemSize(10);
    synthetic_data_configurations.SetKernelName("BivariateSpacetimeMaternStationary");
    synthetic_data_configurations.SetPrecision(exageostat::common::double);

Initialize and Finalize the Hardware class

To use any operations, you must initialize the hardware by selecting the number of CPUs and/or GPUs.

// Initialize an instance of the hardware
auto hardware = ExaGeoStatHardware(computation, number of cores, number of gpus, p, q);

// Other code goes here

// Finalize the hardware instance.
hardware.FinalizeHardware()

The subsequent arguments are as follows:

computation: Specifies the computation mode for the solver.
number of cores: Indicates the number of CPU cores to be used for the solver.
number of gpus: Specifies the number of GPUs to be used for the solver.

ExaGeoStat R Interface

hardware <- new(Hardware, computation, number of cores, number of gpus, p-grid, q-grid);
hardware$finalize_hardware()

First arguement represents the name of the R class that wrapps its correponding C++ class, the rest of arguments are the same as the C++ version

Types of Data

ExaGeoStatCPP can be used with 2 types of data:

Synthetic data i.e. generated by the software according to the user arguments.
Real data e.g. data from satellite imagery or weather sensors. Real data can be used to train the software to predict the values of new data better.

Using Synthetic Data

Here we generate the data to be used by providing the needed arguments with the Configurations module, and then using the following code:

// Create a new ExaGeoStat data that holds the locations and descriptors data.
std::unique_ptr<ExaGeoStatData<double>> data;

// Generate data by passing your arguments through the configurations, hardware, 
//and container of the data, which will be filled with the newly generated data.
ExaGeoStat<double>::ExaGeoStatLoadData(configurations, data);

Using Real Data

Here we use existing data by providing the path to it:

The Data Path must be passed to Configuration
```
data_path <- <path/to/file>
```

And then using the following code:

exageostat_data <- simulate_data(kernel=kernel, initial_theta=initial_theta, problem_size=problem_size, dts=dts, dimension=dimension, data_path=data_path)

Location Getters

ExaGeoStat support 2D and 3D spatial locations, and therefore we have getters for X, Y and Z coordinates.

X-coordinate Getter

double *locations_x = exageostat_data->GetLocations()->GetLocationX();

Y-coordinate Getter

double *locations_y = exageostat_data->GetLocations()->GetLocationY();

Z-coordinate Getter

double *locations_z = exageostat_data->GetLocations()->GetLocationZ();

ExaGeoStat R Interface

locations <- get_locations(data=exageostat_data)

Returns all the locations values. The subsequent arguments are as follows:

exagostat_data: pointer to ExaGeoStatData object containing the spatial data.

Descriptive Z Values Getter

This function is used to retrieve descriptive Z values from ExaGeoStat data based on type of descriptor.

//in this example, we use a chameleon descriptor. Similar code can be used for hicma descriptor.
DescriptorType descriptor_type = CHAMELEON_DESCRIPTOR;
void *descriptor = exageostat_data->GetDescriptorData()->GetDescriptor(descriptor_type, DESCRIPTOR_Z).chameleon_desc;
double *desc_Z_values = exagostat_data->GetDescriptorData()->GetDescriptorMatrix(descriptor_type, descriptor);

The used variables are as follows:

descriptor_type: enum denoting the descriptor type,e.g. CHAMELEON_DESCRIPTOR,HICMA_DESCRIPTOR.
exagostat_data: pointer to ExaGeoStatData object containing the spatial data.
desc_Z_values: pointer to descriptor matrix.

ExaGeoStat R Interface

desc_Z_values <- get_Z_measurement_vector(data=exageostat_data, type="chameleon")

The subsequent arguments are as follows:

exagostat_data: pointer to ExaGeoStatData object containing the spatial data.
type: string specifying the type of descriptor value to retrieve.

Data Modeling

To use data modeling, you have to do this operation.

//You have to pass your arguments through the configurations, your hardware, and your data.
ExaGeoStat<double>::ExaGeoStatDataModeling(hardware, configurations, data, z_matrix);

ExaGeoStat R Interface

estimated_theta <- model_data(matrix=z_value, x=locations_x, y=locations_y, kernel=kernel, dts=dts, dimension=dimension,lb=lower_bound, ub=upper_bound, mle_itr=10, computation=computation, band=1)

Or

estimated_theta <- model_data(data=exageostat_data, kernel=kernel, dts=dts, dimension=dimension,lb=lower_bound, ub=upper_bound, mle_itr=10)

Data Prediction

//You have to pass your arguments through the configurations, your hardware, and your data.
ExaGeoStat<double>::ExaGeoStatPrediction(configurations, data, z_matrix);

ExaGeoStat R Interface

predict_data(train_data=list(locations_x, locations_y, locations_z, z_value), test_data=list(test_x, test_y, test_z), kernel=kernel, dts=dts, estimated_theta=estimated_theta)

Fisher Function

Pass the fisher arguments to the Configurations.

--fisher

Call the Data Prediction function.

// you have to pass your arguments through the configurations, your hardware and your data.
ExaGeoStat<double>::ExaGeoStatPrediction(configurations, data, z_matrix);

ExaGeoStat R Interface

fisher_matrix <- fisher(train_data=list(locations_x, locations_y, z_value), test_data=list(test_x, test_y), kernel=kernel, dts=dts, estimated_theta=estimated_theta)

MLOE-MMOM Function

Pass the MLOE-MMOM arguments to the Configurations.

--mloe-mmom

Call the Data Prediction function.

// you have to pass your arguments through the configurations, your hardware and your data.
ExaGeoStat<double>::ExaGeoStatPrediction(configurations, data, z_matrix);

ExaGeoStat R Interface

result_mloe_mmom = mloe_mmom(train_data=list(locations_x, locations_y, z_value), test_data=list(test_x, test_y), kernel=kernel, dts=dts, estimated_theta=estimated_theta, true_theta=true_theta)

IDW Function

Pass the IDW arguments to the Configurations.

--idw

Call the Data Prediction function.

// you have to pass your arguments through the configurations, your hardware and your data.
ExaGeoStat<double>::ExaGeoStatPrediction(configurations, data, z_matrix);

ExaGeoStat R Interface

idw_error = idw(train_data=list(locations_x, locations_y, z_value), test_data=list(test_x, test_y), kernel=kernel, dts=dts, estimated_theta=estimated_theta, test_measurements=test_measurements)

Manuals

Find a detailed Manual for R functions in ExaGeoStatCPP-R-Interface-Manual
Find a detailed Manual for C++ functions in ExaGeoStatC-CPP-Manual
Doxygen Manual: https://ecrc.github.io/ExaGeoStatCPP

Contributing

Contribution Guidelines

Files

USER_MANUAL.md

Latest commit

History

USER_MANUAL.md

File metadata and controls

ExaGeoStatCPP User Manual

Content

ExaGeoStatCPP

Supported Operations:

Supported Covariance Functions:

Programming models:

External libraries:

Project Hierarchy

Configurations

Building

Arguments

List of Descriptors

Covariance Matrix Descriptors

Measurements Matrix Descriptors

Measurements Sub-Matrix Descriptors

Dot Product Descriptors

Determinant Descriptors

Mean Square Prediction Error Descriptors

HiCMA Descriptors

Supported Operations

Provide Arguments

Initialize and Finalize the Hardware class

ExaGeoStat R Interface

Types of Data

Using Synthetic Data

Using Real Data

Location Getters

X-coordinate Getter

Y-coordinate Getter

Z-coordinate Getter

ExaGeoStat R Interface

Descriptive Z Values Getter

ExaGeoStat R Interface

Data Modeling

ExaGeoStat R Interface

Data Prediction

ExaGeoStat R Interface

Fisher Function

ExaGeoStat R Interface

MLOE-MMOM Function

ExaGeoStat R Interface

IDW Function

ExaGeoStat R Interface

Manuals

Contributing