GH-33697: [CI][Python] Nightly test for PySpark 3.2.0 fails with AttributeError on numpy.bool #33714
raulcd merged 11 commits into apache:main
Conversation
@github-actions crossbow submit test-conda-python-3.8-spark-v3.2.0
Revision: 8e3ab2636fc3ac13548e870db5876d1c832641b7 Submitted crossbow builds: ursacomputing/crossbow @ actions-94e388e574
@github-actions crossbow submit test-conda-python-3.8-spark-v3.2.0
Revision: 70430e936094576067ab359fa6c23f53ea35d803 Submitted crossbow builds: ursacomputing/crossbow @ actions-7fd30999ea
@github-actions crossbow submit test-conda-python-3.8-spark-v3.2.0
@github-actions crossbow submit test-conda-python-3.8-spark-v3.2.0
Revision: 427191de43ad4781125e19b8fc7a7414353f9fcb Submitted crossbow builds: ursacomputing/crossbow @ actions-431f284696
We already test Spark master nightly; this is the current testing combination:
{% for python_version, spark_version, test_pyarrow_only in [("3.7", "v3.1.2", "false"),
("3.8", "v3.2.0", "false"),
("3.9", "master", "false")] %}
And the build for spark master is currently passing: https://github.com/ursacomputing/crossbow/actions/runs/3934958561/jobs/6730195747#step:5:10
Maybe we can add the NumPy version to the task definition, only pinning it for 3.2.0, and if it is different from latest, install the pinned version. I am thinking of something like:
{% for python_version, spark_version, test_pyarrow_only, numpy_version in [("3.7", "v3.1.2", "false", "latest"),
("3.8", "v3.2.0", "false", "1.23"),
("3.9", "master", "false", "latest")] %}
And add the corresponding if to check whether we have to install the pinned NumPy or not? (A sketch of this is shown below.)
@kiszk you are a Spark committer. I suppose this fix won't get backported to Spark 3.2.0, so we will always have to pin NumPy for it? Should we update the tasks for our nightlies to test with Spark 3.3.0 and maybe remove 3.2.0?
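For illustration, a minimal sketch of that conditional in the task template (the numpy_version loop variable, the NUMPY env var, and the surrounding YAML structure are assumptions here, not the exact names used in the repository):

    {# Sketch only: pass the pinned version through only when it differs from latest. #}
    {% for python_version, spark_version, test_pyarrow_only, numpy_version in [("3.7", "v3.1.2", "false", "latest"),
                                                                               ("3.8", "v3.2.0", "false", "1.23"),
                                                                               ("3.9", "master", "false", "latest")] %}
      test-conda-python-{{ python_version }}-spark-{{ spark_version }}:
        ci: github
        template: docker-tests/github.linux.yml
        params:
          env:
            PYTHON: "{{ python_version }}"
            SPARK: "{{ spark_version }}"
            {% if numpy_version != "latest" %}NUMPY: "{{ numpy_version }}"{% endif %}
    {% endfor %}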
…on-spark.dockerfile
force-pushed from 427191d to d86c6a9
@github-actions crossbow submit test-conda-python-3.8-spark-v3.2.0
Revision: d86c6a9 Submitted crossbow builds: ursacomputing/crossbow @ actions-94d502001c
@github-actions crossbow submit test-conda-python-3.8-spark-v3.2.0
Revision: 7f76899 Submitted crossbow builds: ursacomputing/crossbow @ actions-b7b5dc6d5d
@github-actions crossbow submit test-conda-python--spark-
Revision: 7f76899 Submitted crossbow builds: ursacomputing/crossbow @ actions-8c12467062
@github-actions crossbow submit test-conda-python--spark-
Revision: b1b776d Submitted crossbow builds: ursacomputing/crossbow @ actions-5192a832e2
@github-actions crossbow submit test-conda-python--spark-
Revision: 9743842 Submitted crossbow builds: ursacomputing/crossbow @ actions-d30511e746
@github-actions crossbow submit test-conda-python--spark-
Revision: efdf9fc Submitted crossbow builds: ursacomputing/crossbow @ actions-268559d178
Looking at the logs, I think the line "#7 7.578 /bin/bash: /arrow/ci/scripts/install_numpy.sh: Permission denied" is the problem (the file wasn't copied, so installing NumPy using the nonexistent file fails). However, I don't see any difference with what we already do in
Yeah, I am trying out different things locally but none of them work 🤷♀️ I also asked Raul for help, in case he has any ideas about what the issue could be.
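For reference, a "Permission denied" on a copied script is commonly fixed by marking it executable in the Dockerfile before invoking it; a minimal sketch using the paths from the log above (whether this matches the fix that eventually landed is an assumption):

    ARG numpy=latest  # hypothetical build arg carrying the pinned version
    COPY ci/scripts/install_numpy.sh /arrow/ci/scripts/
    # Make sure the script is executable before running it.
    RUN chmod +x /arrow/ci/scripts/install_numpy.sh && \
        /arrow/ci/scripts/install_numpy.sh ${numpy}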
@github-actions crossbow submit test-conda-python--spark-
Revision: b1b3b99 Submitted crossbow builds: ursacomputing/crossbow @ actions-5f44f20302
@github-actions crossbow submit test-conda-python--spark-
Revision: d56f4b7 Submitted crossbow builds: ursacomputing/crossbow @ actions-cce4baca43
@raulcd the fix is working now, thank you!
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
@github-actions crossbow submit test-conda-python--spark-
Revision: 1ebe276 Submitted crossbow builds: ursacomputing/crossbow @ actions-fd4f9f54ae
Benchmark runs are scheduled for baseline = f9a1d19 and contender = 4c1448e. 4c1448e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Rationale for this change
Fix for the nightly integration test failure with PySpark 3.2.0.
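For context, the AttributeError comes from NumPy 1.24 removing the numpy.bool alias (deprecated since NumPy 1.20) that PySpark 3.2.0 still references; a minimal reproduction, assuming NumPy >= 1.24 is installed:

    import numpy as np

    # numpy.bool was a deprecated alias for the builtin bool; since NumPy 1.24
    # accessing it raises instead of warning.
    np.bool  # AttributeError: module 'numpy' has no attribute 'bool'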
What changes are included in this PR?
NumPy version pin in docker-compose.yml (a sketch of such a pin is included at the end of this description).
Are these changes tested?
Will be tested on the open PR with the CI.
Are there any user-facing changes?
No.
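For illustration, the pin described above could be wired through docker-compose.yml roughly like this (the service name and build argument are assumptions, not the exact entries in the repository):

    # Excerpt sketch of a service entry; NUMPY would default to latest
    # and be overridden only for the Spark 3.2.0 task.
    services:
      conda-python-spark:
        build:
          args:
            numpy: ${NUMPY:-latest}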