
Improve dask example notebooks #100

Open
tcmetzger opened this issue Feb 24, 2025 · 3 comments · May be fixed by #118
tcmetzger commented Feb 24, 2025

Improve the dask-based examples as discussed here:

#98 (comment)


tcmetzger commented Feb 27, 2025

@brendancol Since we already use tools like black and isort in the pre-commit configuration for the main package, we could also add some notebook-specific linting to the pre-commit config.

Something like this:

-   repo: https://github.com/nbQA-dev/nbQA
    rev: 1.2.2
    hooks:
    -   id: nbqa-black
        files: \.ipynb$
    -   id: nbqa-flake8
        files: \.ipynb$
    -   id: nbqa-isort
        files: \.ipynb$

-   repo: https://github.com/kynan/nbstripout
    rev: 0.5.0
    hooks:
    -   id: nbstripout
        files: \.ipynb$

This might need to be adapted; I haven't tested it!

@brendancol brendancol moved this from Review to In progress in pyRTE Phase 2 Mar 3, 2025

tcmetzger commented Mar 11, 2025

Now that we have the notebooks for the "small" examples, we want to build an additional notebook with a larger dataset to showcase the Dask integration: #118

@tcmetzger tcmetzger linked a pull request Mar 11, 2025 that will close this issue
tcmetzger commented:

@brendancol @sehnem As discussed today: these large problems have many columns and layers, enough to really stress the code. RRTMGP is vectorized internally, and the default chunk size is 720x720x1. If you pass Robert's code a chunk that size, it breaks because of memory; for the radiative transfer solve, for example, that may be too much. So we need to chunk the data.
We can chunk it in any way, but we do need to chunk it. The issue is: to do the radiative transfer solve, you need a chunk that contains both site and level (and so the product of those two). Within Dask, we might want to cycle through those to vectorize?
We need a vector length (from whatever dimension it comes).

The other, related question: it seems that right now (maybe only in gas optics) we call several kernels in a row. Would that mean a single worker handles all of those calls in sequence? In theory, different problems are sent to different workers, so increased problem complexity means more time.
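To make the worker question concrete, here is a minimal stdlib sketch (not pyRTE code; `gas_optics`, `rte_solve`, and `process_chunk` are hypothetical stand-ins for the kernel sequence) showing the pattern under discussion: within one chunk the kernels run back to back on a single worker, while independent chunks can be dispatched to different workers in parallel.

```python
from concurrent.futures import ThreadPoolExecutor  # stand-in for Dask workers


def gas_optics(chunk):
    # Hypothetical first kernel in the sequence.
    return [x * 2.0 for x in chunk]


def rte_solve(tau):
    # Hypothetical second kernel; consumes the output of the first.
    return sum(tau)


def process_chunk(chunk):
    # One worker runs the whole kernel sequence for its chunk, in order.
    return rte_solve(gas_optics(chunk))


def run(columns, n_chunks=4):
    # Split the column dimension into independent chunks so that
    # different chunks can land on different workers.
    size = max(1, len(columns) // n_chunks)
    chunks = [columns[i:i + size] for i in range(0, len(columns), size)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(process_chunk, chunks))
```

With Dask proper, `process_chunk` would be what gets mapped over the chunked array; the point is that the serial kernel chain does not prevent parallelism across chunks.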

The dataset we have is interesting (the diamond dataset from #118). It has half a dozen dimensions. We don't need to overfit to this dataset, but it has several dimensions that vary, a level dimension, and maybe some others as well. To do the radiative transfer, we need some number of points along one dimension together with all of the level dimension. Maybe the code should use whatever chunking it needs and the solver does the chunking it needs, or it might make more sense to do the chunking upfront so we have arbitrary points across all of the levels.
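The "arbitrary points across all of the levels" idea can be sketched as a small helper that picks a column-chunk length under a memory budget while never splitting the level dimension. This is purely illustrative; the function name, default budget, and dimension names are assumptions, not pyRTE API.

```python
def column_chunk_size(ncol, nlev, itemsize=8, budget_bytes=128 * 2**20):
    """Pick how many columns fit in one chunk under a memory budget.

    The level dimension is deliberately never split: the radiative
    transfer solve needs every level of a column inside the same chunk.
    """
    per_column = nlev * itemsize               # bytes for one full column
    cols = max(1, budget_bytes // per_column)  # columns per chunk
    return min(ncol, cols)
```

With xarray/Dask this would feed something like `ds.chunk({"site": column_chunk_size(...), "level": -1})`, where `-1` keeps a dimension in a single chunk (the dimension names here are guesses at what the diamond dataset uses).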

To-Dos

  • Determining the chunk size stays a burden on the user, but we should check whether the user's chunk size makes sense (i.e. validate the chunk size and raise an error if necessary to flag it): a "gut check" with some kind of validate method. It won't be perfect, but it's the start of a hook where we could add more heuristics later; we start with the most robust checks first.
  • For testing the Dask integration with pytest: test the validate method and "wrong" chunk parameters
  • Potentially test on multi-node (JupyterHub) and single-node configurations with the
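A minimal sketch of what the "gut check" validate method from the first To-Do could look like, assuming chunk sizes arrive as a dimension-name-to-length mapping (the function name, argument shapes, and the `level` dimension name are all assumptions):

```python
def validate_chunks(chunks, sizes, level_dim="level"):
    """Gut-check user-supplied chunk sizes before the radiative transfer solve.

    ``chunks`` maps dimension name -> chunk length; ``sizes`` maps dimension
    name -> total length. Starts with the most robust checks; more heuristics
    can be hooked in here later.
    """
    for dim, chunk in chunks.items():
        if dim not in sizes:
            raise ValueError(f"unknown dimension {dim!r}")
        if chunk < 1 or chunk > sizes[dim]:
            raise ValueError(
                f"chunk size {chunk} invalid for {dim!r} of length {sizes[dim]}"
            )
    # The solver needs all levels of a column inside one chunk.
    if level_dim in chunks and chunks[level_dim] != sizes[level_dim]:
        raise ValueError(f"{level_dim!r} must not be split across chunks")
```

The pytest To-Do above would then exercise exactly these paths: a valid mapping passes silently, a split level dimension or an unknown dimension raises `ValueError`.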
