I have coupled RTE-RRTMGP to a GCM, but disabled OMP for RRTMGP and parallelized the calls from the outside (calling RRTMGP in chunks), which reduces memory usage by a lot.
With RRTMGP 1.9 I started getting long HDF warnings in the console:
HDF5-DIAG: Error detected in HDF5 (1.10.10) thread 1:
#000: ../../../src/H5A.c line 484 in H5Aopen_by_name(): can't open attribute
major: Attribute
minor: Can't open object
#001: ../../../src/H5Aint.c line 542 in H5A__open_by_name(): unable to load attribute info from object header
major: Attribute
minor: Unable to initialize object
#002: ../../../src/H5Oattribute.c line 496 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitGroomNumberOfSignificantDigits'
major: Attribute
minor: Object not found
This message then repeats again and again. The code continues to function, but much more slowly.
It only occurs if OMP is on, and never in thread number 0, which is why I assume it is a very tricky low-level problem. I was able to track it down to the conversion to netCDF4 made in #4 using the following reproducible example:
program control
  use mo_gas_optics_util_string, only: lower_case
  use mo_gas_optics_rrtmgp, only: ty_gas_optics_rrtmgp
  use mo_load_coefficients, only: load_and_init
  use mo_gas_concentrations, only: ty_gas_concs
  use mo_optical_props, only: ty_optical_props_1scl, &
                              ty_optical_props_2str
  use mo_cloud_optics_rrtmgp, only: ty_cloud_optics_rrtmgp
  use mo_load_cloud_coefficients, only: load_cld_lutcoeff
  use omp_lib, only: omp_get_thread_num
  character(len=3), dimension(8) :: active_gases = (/ &
    "N2 ","O2 ","CH4","O3 ","CO2","H2O","N2O","CO " &
    /)
  character(len=32), dimension(size(active_gases)) :: gases_lowercase
  call read_nc_files()
contains
  subroutine read_nc_files()
    integer :: jc
    type(ty_gas_concs) :: gas_concentrations_sw
    type(ty_gas_concs) :: gas_concentrations_lw
    type(ty_gas_optics_rrtmgp) :: k_dist_sw,k_dist_lw
    type(ty_cloud_optics_rrtmgp) :: cloud_optics_sw
    type(ty_cloud_optics_rrtmgp) :: cloud_optics_lw
    do jc=1,size(active_gases)
      gases_lowercase(jc) = trim(lower_case(active_gases(jc)))
    end do
    write(*,*) gas_concentrations_sw%init(gases_lowercase)
    write(*,*) gas_concentrations_lw%init(gases_lowercase)
    !$omp parallel do private(jc)
    do jc=1,2
      write(*,*) omp_get_thread_num()
      !$omp critical
      ! call load_and_init(k_dist_sw,trim("rrtmgp-gas-sw-g112_old.nc"),gas_concentrations_sw)
      call load_and_init(k_dist_sw,trim("rrtmgp-gas-sw-g112_new.nc"),gas_concentrations_sw)
      ! call load_and_init(k_dist_sw,trim("rrtmgp-gas-sw-g112.nc"),gas_concentrations_sw)
      ! call load_and_init(k_dist_lw,trim("rrtmgp-gas-lw-g128.nc"),gas_concentrations_lw)
      ! call load_cld_lutcoeff(cloud_optics_sw,trim("rrtmgp-clouds-sw-bnd.nc"))
      ! call load_cld_lutcoeff(cloud_optics_lw,trim("rrtmgp-clouds-lw-bnd.nc"))
      !$omp end critical
      write(*,*) omp_get_thread_num()
    enddo
    !$omp end parallel do
  end subroutine read_nc_files
end program control
@MHBalsmeier Thanks, this is quite interesting. Thanks for letting us know.
In multi-threaded environments I understand it to be more common to read the data with a single thread and broadcast it, in part so that there aren't lots of processes trying to access the same file at the same time. That's why the underlying RRTMGP initialization routines accept data rather than file names. Would this eliminate the error messages?
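For readers following along, the read-once pattern suggested here might look roughly like the sketch below. It reuses only the modules and the load_and_init call from the reproducer above, and moves the file read in front of the parallel region so that only the initial thread touches the netCDF/HDF5 library. It assumes the gas-optics object is only read, never modified, once it has been initialized; the loop body is a placeholder for the per-slice radiation calls, and the program name is made up for illustration.

```fortran
program read_once_sketch
  use mo_gas_optics_rrtmgp,  only: ty_gas_optics_rrtmgp
  use mo_load_coefficients,  only: load_and_init
  use mo_gas_concentrations, only: ty_gas_concs
  use omp_lib,               only: omp_get_thread_num
  implicit none

  character(len=32), dimension(8) :: gases = [character(len=32) :: &
    "n2", "o2", "ch4", "o3", "co2", "h2o", "n2o", "co"]
  type(ty_gas_concs)         :: gas_concentrations_sw
  type(ty_gas_optics_rrtmgp) :: k_dist_sw
  integer :: jc

  ! init returns an error message (empty on success), printed as in the reproducer.
  write(*,*) trim(gas_concentrations_sw%init(gases))

  ! Serial region: the coefficient file is read exactly once, before any
  ! OpenMP threads are spawned.
  call load_and_init(k_dist_sw, "rrtmgp-gas-sw-g112.nc", gas_concentrations_sw)

  !$omp parallel do private(jc)
  do jc = 1, 2
    ! k_dist_sw is shared here and assumed to be read-only. The per-slice
    ! radiation calls (placeholder) would go in this loop instead of the write.
    write(*,*) "slice", jc, "on thread", omp_get_thread_num()
  end do
  !$omp end parallel do
end program read_once_sketch
```

Whether this fits a given model depends on how the coupler is structured; if the coupler itself performs the initialization, the load would have to be hoisted out of it first.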
@RobertPincus I started a version in which the data is read only once and then broadcast to the other threads, and the error did not occur. However, I did not pursue this approach, since in my actual code I call the RRTMGP coupler (which is a subroutine) inside the omp do loop for each radiation slice individually (the slices also have different numbers of columns). It would therefore require some refactoring, and I'm currently happy with using the version 3 files.
I also think that, because of the omp critical section, the file should never be accessed by more than one thread at the same time.
If the problem persists with netCDF 4.9.3, I will have to do that, though. I will report here whether it is solved with 4.9.3.
Feel free to close this for now.
I assume it is the same error as this one here: https://code.mpimet.mpg.de/boards/1/topics/14326
It should be resolved in netCDF 4.9.3 (I am using netCDF 4.9.2).
It does not occur when converting the files back to netCDF 3 with
nccopy -3 infile.nc outfile.nc
If this is an option, I can make a pull request for this; otherwise this is just to inform others of the problem.
CMakeLists.txt