[SPARK-45390][PYTHON] Remove distutils usage #43192
Conversation
The branch was force-pushed from 665f2b6 to 1d27a7a, and later from c2f0b16 to adfd926.
Didn't check super closely but lgtm if it works cc @zhengruifeng and @ueshin
Thank you!
python/pyspark/loose_version.py
python/docs/source/_static/copybutton.js
I believe it's compatible, but we need to take a look once more before doing that, because the Apache Spark (up to 3.5.0) binary distribution doesn't include a Python Software Foundation (PSF) license entry yet.
So actually we already have PSF License stuff?
Yes and no~

Yes, we had the copybutton.js file:

```
spark-3.5.0-bin-hadoop3:$ find . -name copybutton.js
./python/docs/source/_static/copybutton.js
```

But, still no. We didn't have a PSF entry in LICENSE-binary, which is a part of the Apache Spark binary distribution. So, I added one in this PR.
I'm not sure about copybutton.js, but we need to add loose_version to the binary license because we already had python/pyspark/cloudpickle.py and python/pyspark/join.py in the BSD 3-Clause section. So, I added it to LICENSE-binary too.
In python/pyspark/loose_version.py:

```python
import re


class LooseVersion:
```
Is this copied from distutils? If so, maybe we need to add a few comments here to explain it?
For example, python/docs/source/_static/copybutton.js has a few lines of comments.
Actually, it's reimplemented by squashing the Version class into the existing LooseVersion class. Let me add that.
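For context, here is a minimal sketch of what a LooseVersion reimplementation along those lines can look like. This is an illustrative reconstruction, not the actual code added in python/pyspark/loose_version.py:

```python
import re
from functools import total_ordering


@total_ordering
class LooseVersion:
    """Loose version parsing and comparison in the spirit of the removed
    distutils class (illustrative sketch, not Spark's actual code)."""

    _component_re = re.compile(r"(\d+|[a-z]+|\.)", re.IGNORECASE)

    def __init__(self, vstring: str):
        self.vstring = vstring
        # Split "3.12.0rc1" into [3, 12, 0, "rc", 1]; numeric parts become
        # ints so that 12 > 9 compares numerically rather than lexically.
        self.version = [
            int(part) if part.isdigit() else part
            for part in self._component_re.split(vstring)
            if part and part != "."
        ]

    def __eq__(self, other):
        if isinstance(other, str):
            other = LooseVersion(other)
        return self.version == other.version

    def __lt__(self, other):
        if isinstance(other, str):
            other = LooseVersion(other)
        # Comparing an int with a str component at the same position raises
        # TypeError -- a quirk inherited from the distutils original.
        return self.version < other.version

    def __repr__(self):
        return f"LooseVersion('{self.vstring}')"


assert LooseVersion("3.12.0") > LooseVersion("3.9")
assert LooseVersion("1.4.2") < "1.10"
```

Vendoring a small class like this avoids adding `packaging` as a runtime dependency, which is the trade-off discussed in the PR description below.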
I just have two comments, on the license and the code comments.
Thank you, Liang-Chi!
Thank you all. All Python tests passed.
### What changes were proposed in this pull request?

This PR aims to add `Python 3.12` to Infra docker images. Note that `Python 3.12` has a breaking change in the installation.

- The `distutils` module itself is removed at Python 3.12 via [PEP-632](https://peps.python.org/pep-0632) in favor of the `packaging` package.
- Apache Spark 4.0.0 is ready for Python 3.12 via SPARK-45390 by removing `distutils` usages: #43192
- However, some 3rd-party packages are not ready for Python 3.12 yet, so this PR skips those kinds of packages.

### Why are the changes needed?

This PR is a preparation to add a daily `Python 3.12` GitHub Action job later for Apache Spark 4.0.0. As of today, Apache Spark 4.0.0 has Python 3.8 ~ Python 3.11 test coverage.

- Python 3.9 (Main): https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml
- PyPy3.8, Python 3.10, Python 3.11 (Daily): https://github.com/apache/spark/actions/workflows/build_python.yml

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

```
$ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6939290578 python3.12 --version
Python 3.12.0

$ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6939290578 python3.12 -m pip freeze
alembic==1.12.1
blinker==1.7.0
certifi==2019.11.28
chardet==3.0.4
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==2.2.1
contourpy==1.2.0
coverage==7.3.2
cycler==0.12.1
databricks-cli==0.18.0
dbus-python==1.2.16
distro-info==0.23+ubuntu1.1
docker==6.1.3
entrypoints==0.4
et-xmlfile==1.1.0
Flask==3.0.0
fonttools==4.45.0
gitdb==4.0.11
GitPython==3.1.40
googleapis-common-protos==1.56.4
greenlet==3.0.1
gunicorn==21.2.0
idna==2.8
importlib-metadata==6.8.0
itsdangerous==2.1.2
Jinja2==3.1.2
joblib==1.3.2
kiwisolver==1.4.5
lxml==4.9.3
Mako==1.3.0
Markdown==3.5.1
MarkupSafe==2.1.3
matplotlib==3.8.2
mlflow==2.8.1
numpy==1.26.2
oauthlib==3.2.2
openpyxl==3.1.2
packaging==23.2
pandas==2.1.3
Pillow==10.1.0
plotly==5.18.0
protobuf==4.25.1
pyarrow==14.0.1
PyGObject==3.36.0
PyJWT==2.8.0
pyparsing==3.1.1
python-apt==2.0.1+ubuntu0.20.4.1
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
querystring-parser==1.2.4
requests==2.31.0
requests-unixsocket==0.2.0
scikit-learn==1.3.2
scipy==1.11.4
setuptools==45.2.0
six==1.14.0
smmap==5.0.1
SQLAlchemy==2.0.23
sqlparse==0.4.4
tabulate==0.9.0
tenacity==8.2.3
threadpoolctl==3.2.0
typing_extensions==4.8.0
tzdata==2023.3
unattended-upgrades==0.1
unittest-xml-reporting==3.2.0
urllib3==2.1.0
websocket-client==1.6.4
Werkzeug==3.0.1
wheel==0.34.2
zipp==3.17.0
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43922 from dongjoon-hyun/SPARK-46020.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
… Python 3.12

### What changes were proposed in this pull request?

This PR aims to use the `17-jammy` tag instead of `17` to prevent Python 3.12.

### Why are the changes needed?

Two days ago, `eclipse-temurin:17` switched its baseline OS to `Ubuntu 24.04`, which brings `Python 3.12`.

```
$ docker run -it --rm eclipse-temurin:17 cat /etc/os-release | grep VERSION_ID
VERSION_ID="24.04"

$ docker run -it --rm eclipse-temurin:17-jammy cat /etc/os-release | grep VERSION_ID
VERSION_ID="22.04"
```

Since Python 3.12 support is added only in Apache Spark 4.0.0, we need to keep using the previous OS, `Ubuntu 22.04`.

- #43184
- #43192

### Does this PR introduce _any_ user-facing change?

No. This aims to recover to the same OS for consistent behavior.

### How was this patch tested?

Pass the CIs with K8s IT. Currently, it's broken at the Python image building phase.

- https://github.com/apache/spark/actions/workflows/build_branch35.yml

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47488 from dongjoon-hyun/SPARK-49005.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…vent Python 3.12

### What changes were proposed in this pull request?

This PR aims to use the `17-jammy` tag instead of `17-jre` to prevent Python 3.12.

### Why are the changes needed?

Two days ago, `eclipse-temurin:17` switched its baseline OS to `Ubuntu 24.04`, which brings `Python 3.12`.

```
$ docker run -it --rm eclipse-temurin:17-jre cat /etc/os-release | grep VERSION_ID
VERSION_ID="24.04"

$ docker run -it --rm eclipse-temurin:17-jammy cat /etc/os-release | grep VERSION_ID
VERSION_ID="22.04"
```

Since Python 3.12 support is added only in Apache Spark 4.0.0, we need to keep using the previous OS, `Ubuntu 22.04`.

- #43184
- #43192

### Does this PR introduce _any_ user-facing change?

No. This aims to recover to the same OS for consistent behavior.

### How was this patch tested?

Pass the CIs with K8s IT. Currently, it's broken at the Python image building phase.

- https://github.com/apache/spark/actions/workflows/build_branch34.yml

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47489 from dongjoon-hyun/SPARK-49005-3.4.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This PR aims to remove `distutils` usage from the Spark codebase.

BEFORE / AFTER: (comparison snippets not captured)
Why are the changes needed?
Currently, Apache Spark ignores the warnings, but the module itself is removed at Python 3.12 via PEP-632 in favor of the `packaging` package.

See spark/python/pyspark/__init__.py, lines 54 to 56 at 58c24a5.
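As a rough illustration of what "ignoring the warnings" typically looks like (the referenced `__init__.py` lines are not reproduced above, so this is an assumption about their intent, not Spark's actual code):

```python
import warnings

# Hypothetical sketch: silence the DeprecationWarning that importing
# distutils emits on Python 3.10/3.11. On Python 3.12 the import fails
# outright (ModuleNotFoundError), so suppression is no longer enough.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    from distutils.version import LooseVersion
```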
Initially, #43184 proposed to follow the Python community guideline by using the `packaging` package, but this PR instead embeds a `LooseVersion` Python class to avoid adding a new package requirement.
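A sketch of the two options at a typical call site (the `pyspark.loose_version` import path follows the python/pyspark/loose_version.py file added in this PR; the `packaging` variant shows the alternative from #43184):

```python
# Option taken by this PR: vendor the class, no new runtime dependency.
from pyspark.loose_version import LooseVersion

assert LooseVersion("3.12.0") >= LooseVersion("3.9")

# Alternative proposed in #43184: rely on the packaging package instead.
from packaging.version import Version

assert Version("3.12.0") >= Version("3.9")
```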
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass the CIs.
Was this patch authored or co-authored using generative AI tooling?
No.