
Improve dask example notebooks #100

Open
tcmetzger opened this issue Feb 24, 2025 · 3 comments · May be fixed by #118
tcmetzger commented Feb 24, 2025

Improve the dask-based examples as discussed here:

#98 (comment)


tcmetzger commented Feb 27, 2025

@brendancol Since we already use tools like black and isort in the pre-commit configuration for the main package, we could also add some notebook-specific linting to the pre-commit config.

Something like this:

-   repo: https://github.com/nbQA-dev/nbQA
    rev: 1.2.2
    hooks:
    -   id: nbqa-black
        files: \.ipynb$
    -   id: nbqa-flake8
        files: \.ipynb$
    -   id: nbqa-isort
        files: \.ipynb$

-   repo: https://github.com/kynan/nbstripout
    rev: 0.5.0
    hooks:
    -   id: nbstripout
        files: \.ipynb$

This might need to be adapted; I haven't tested it!

@brendancol brendancol moved this from Review to In progress in pyRTE Phase 2 Mar 3, 2025

tcmetzger commented Mar 11, 2025

Now that we have the notebooks for the "small" examples, we want to build an additional notebook with a larger dataset to showcase the Dask integration: #118

@tcmetzger tcmetzger linked a pull request Mar 11, 2025 that will close this issue
tcmetzger commented:

@brendancol @sehnem As discussed today: these large problems have many columns and layers, enough to really stress the code. RRTMGP is vectorized internally, and the default chunk size is 720x720x1. If you pass Robert's code a chunk that size, it breaks because of memory; for the radiative transfer solve, for example, that may be too much. So we need to chunk the data.
We can chunk it in any way, but we do need to chunk it. The issue is: to do the radiative transfer solve, you need a chunk that contains both site and level (and so the product of those two). Within Dask, we might want to cycle through those to vectorize?
We need a vector length (from whatever dimension it comes).

The other, related question: it seems that right now (maybe only in gas optics) we call several kernels in a row. Would that mean a single worker handles all of those calls in sequence? In theory, different problems are sent to different workers, so increased problem complexity means more time.
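To make the worker question concrete, here is a minimal stdlib sketch (not pyRTE code; `gas_optics`, `rte_solve`, and `process_chunk` are hypothetical stand-ins for the kernel sequence) showing the pattern under discussion: within one chunk the kernels run back to back on a single worker, while independent chunks can be dispatched to different workers in parallel.

```python
from concurrent.futures import ThreadPoolExecutor  # stand-in for Dask workers


def gas_optics(chunk):
    # Hypothetical first kernel in the sequence.
    return [x * 2.0 for x in chunk]


def rte_solve(tau):
    # Hypothetical second kernel; consumes the output of the first.
    return sum(tau)


def process_chunk(chunk):
    # One worker runs the whole kernel sequence for its chunk, in order.
    return rte_solve(gas_optics(chunk))


def run(columns, n_chunks=4):
    # Split the column dimension into independent chunks so that
    # different chunks can land on different workers.
    size = max(1, len(columns) // n_chunks)
    chunks = [columns[i:i + size] for i in range(0, len(columns), size)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(process_chunk, chunks))
```

With Dask proper, `process_chunk` would be what gets mapped over the chunked array; the point is that the serial kernel chain does not prevent parallelism across chunks.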

The dataset we have is interesting (the diamond dataset from #118). It has half a dozen dimensions. We don't need to overfit to this dataset, but it has several dimensions that vary, a level dimension, and maybe some others as well. To do the radiative transfer, we need some number of points along one dimension together with all of the level dimension. Maybe the code should use whatever chunking it needs and the solver does the chunking it needs, or it might make more sense to do the chunking upfront so we have arbitrary points across all of the levels.
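The "arbitrary points across all of the levels" idea can be sketched as a small helper that picks a column-chunk length under a memory budget while never splitting the level dimension. This is purely illustrative; the function name, default budget, and dimension names are assumptions, not pyRTE API.

```python
def column_chunk_size(ncol, nlev, itemsize=8, budget_bytes=128 * 2**20):
    """Pick how many columns fit in one chunk under a memory budget.

    The level dimension is deliberately never split: the radiative
    transfer solve needs every level of a column inside the same chunk.
    """
    per_column = nlev * itemsize               # bytes for one full column
    cols = max(1, budget_bytes // per_column)  # columns per chunk
    return min(ncol, cols)
```

With xarray/Dask this would feed something like `ds.chunk({"site": column_chunk_size(...), "level": -1})`, where `-1` keeps a dimension in a single chunk (the dimension names here are guesses at what the diamond dataset uses).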

To-Dos

  • Determining the chunk size stays a burden on the user, but we should check whether the user's chunk size makes sense (i.e. validate the chunk size and raise an error if necessary to flag it): a "gut check" with some kind of validate method. It won't be perfect, but it's the start of a hook where we could add more heuristics later; we start with the most robust checks first.
  • For testing the Dask integration with pytest: test the validate method and "wrong" chunk parameters
  • Potentially test on multi-node (JupyterHub) and single-node configurations with the
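A minimal sketch of what the "gut check" validate method from the first To-Do could look like, assuming chunk sizes arrive as a dimension-name-to-length mapping (the function name, argument shapes, and the `level` dimension name are all assumptions):

```python
def validate_chunks(chunks, sizes, level_dim="level"):
    """Gut-check user-supplied chunk sizes before the radiative transfer solve.

    ``chunks`` maps dimension name -> chunk length; ``sizes`` maps dimension
    name -> total length. Starts with the most robust checks; more heuristics
    can be hooked in here later.
    """
    for dim, chunk in chunks.items():
        if dim not in sizes:
            raise ValueError(f"unknown dimension {dim!r}")
        if chunk < 1 or chunk > sizes[dim]:
            raise ValueError(
                f"chunk size {chunk} invalid for {dim!r} of length {sizes[dim]}"
            )
    # The solver needs all levels of a column inside one chunk.
    if level_dim in chunks and chunks[level_dim] != sizes[level_dim]:
        raise ValueError(f"{level_dim!r} must not be split across chunks")
```

The pytest To-Do above would then exercise exactly these paths: a valid mapping passes silently, a split level dimension or an unknown dimension raises `ValueError`.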
