add arrow to contrib, move to sha256 #157

Merged
poshul merged 27 commits into master from add_arrow
Dec 3, 2025

@poshul
Contributor

@poshul poshul commented Dec 1, 2025

Summary by CodeRabbit

  • New Features

    • Arrow added as a selectable cross‑platform build target with integrated platform-specific build/install and logging.
  • Chores

    • Archive verification migrated from SHA1 to SHA256 across contrib downloads.
    • Build orchestration updated to include Arrow artifacts and include paths.
    • CI matrix, prerequisites, Docker builds, and parallel build handling unified and adjusted; several platform-specific build flag usages simplified.
  • Documentation

    • Contributors/source link updated to a direct GitHub URL.

@coderabbitai
Contributor

coderabbitai Bot commented Dec 1, 2025

Walkthrough

Adds Apache Arrow as a selectable contrib build target with a new OPENMS_CONTRIB_BUILD_ARROW macro, migrates contrib archive checksum verification from SHA1 to SHA256, updates CI ARM runner/prerequisites and parallel-build handling, and applies small comment and Docker/CMake path adjustments across library build scripts.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Build configuration: CMakeLists.txt | Adds ARROW to BUILD_TYPE/VALID_BUILD_TYPES; declares ARROW_DIR, ARCHIVE_ARROW, ARCHIVE_ARROW_TAR, ARCHIVE_ARROW_SHA256, INCLUDE_DIR_ARROW; includes libraries.cmake/arrow.cmake; integrates ARROW clean/build/copy steps into the main contrib build loop. |
| Arrow build automation: libraries.cmake/arrow.cmake | Adds macro OPENMS_CONTRIB_BUILD_ARROW implementing platform-specific configure/build/install flows (MSVC and Linux/macOS) with logging and FATAL_ERROR on failures. |
| Archive validation & download macros: macros.cmake | Switches archive validation/download from SHA1 to SHA256 (_target_SHA256 / _downloaded_SHA256), updates empty-archive sentinel checks and mismatch messaging, and adjusts base download URL path. |
| CI workflow & prerequisites: .github/workflows/main.yml | Updates ARM runner to ubuntu-24.04-arm and generalizes ARM detection; installs libthrift-dev (Linux) and thrift (macOS); splits contrib build into OS-specific steps and computes/exports CMAKE_BUILD_PARALLEL_LEVEL for parallel builds. |
| Parallel builds & Docker: libraries.cmake/boost.cmake, libraries.cmake/coinor.cmake, dockerfiles/pyopenms/* | Boost: read CMAKE_BUILD_PARALLEL_LEVEL (with fallback) for b2 -j; Coinor: removed explicit -j flags; Dockerfiles: invoke cmake against /contrib and export CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) for parallel builds. |
| Minor edits & comments: README.md, libraries.cmake/bzip2.cmake, libraries.cmake/glpk.cmake, libraries.cmake/xercesc.cmake, libraries.cmake/zlib.cmake | README link updated; small comment corrections; removed XERCES_EXTRA_CMAKE_FLAGS and ZLIB_EXTRA_CMAKE_FLAG insertions on MSVC; zlib Linux/macOS path adds -D CMAKE_C_FLAGS=${ZLIB_CFLAGS}. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CI as CI runner
    participant CMake as CMake orchestration
    participant Remote as Archive server
    participant Arrow as arrow.cmake (OPENMS_CONTRIB_BUILD_ARROW)
    participant FS as Filesystem / Artifacts

    CI->>CMake: invoke contrib build (BUILD_TYPE includes ARROW)
    CMake->>Remote: validate/download archives (verify SHA256)
    Remote-->>CMake: archive.tar.gz
    CMake->>FS: extract archive -> ARROW_DIR
    CMake->>Arrow: call OPENMS_CONTRIB_BUILD_ARROW
    Arrow->>Arrow: configure (platform-specific CMake args, write log)
    Arrow->>Arrow: build & install (log output, check result)
    Arrow->>FS: copy built libs/includes to contrib targets
    CMake->>CI: return success/failure (logs)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus review on:
    • libraries.cmake/arrow.cmake — platform-specific CMake args, logging, RESULT_VARIABLE handling, and FATAL_ERROR messages.
    • CMakeLists.txt — ARROW variable declarations and integration into the build orchestration.
    • macros.cmake — correctness of SHA256 substitution, empty-archive sentinel handling, and download URL change.
    • .github/workflows/main.yml and Dockerfiles — ARM label changes and propagation of CMAKE_BUILD_PARALLEL_LEVEL.

Poem

🐇
I hopped through CMake fields at dawn,
Brought Arrow crates before the lawn,
SHA256 snug in my paw,
Logs that hum and builds that draw,
A rabbit cheers — the artifacts spawn. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the two main changes: adding Arrow library support to the contrib build system and migrating archive verification from SHA1 to SHA256. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32e568a and bc88ced.

📒 Files selected for processing (1)
  • .github/workflows/main.yml (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (ubuntu-24.04-arm)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (5)
.github/workflows/main.yml (5)

33-34: ✅ Solid upgrade to ubuntu-24.04-arm.

Updating the ARM runner from 22.04 to 24.04 keeps the CI infrastructure current and is a low-risk change. Good practice.


47-56: ✅ CMake setup logic is clean and maintainable.

Using endsWith(matrix.os, '-arm') is a robust pattern that generalizes across future ARM variants without code changes. The conditional logic correctly gates setup for non-ARM runners and manual install for ARM runners.
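
To illustrate why the suffix match generalizes, here is a minimal Python sketch of the same predicate. The function name `is_arm_runner` and the labels in the loop are illustrative and not taken from the workflow; lower-casing the label is an assumption intended to mirror the case-insensitive comparison of GitHub's expression functions.

```python
def is_arm_runner(os_label: str) -> bool:
    """Hypothetical stand-in for the workflow condition endsWith(matrix.os, '-arm')."""
    # Lower-case first so the match ignores case, as the workflow expression is assumed to.
    return os_label.lower().endswith("-arm")


if __name__ == "__main__":
    # Any future "<image>-arm" label is picked up without touching the predicate.
    for label in ("ubuntu-22.04-arm", "ubuntu-24.04-arm", "ubuntu-latest", "windows-latest"):
        print(label, is_arm_runner(label))
```

A hardcoded comparison against one label would need editing on every runner upgrade; the suffix check only assumes that ARM labels keep ending in `-arm`.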


58-72: ✅ Prerequisite additions align with Arrow support.

The additions of libthrift-dev (Linux) and thrift (macOS) appear consistent with Apache Arrow build requirements mentioned in the PR objective. Package names are correct for their respective package managers, and platform-specific handling is appropriate.


87-99: ✅ Windows build step fixes both previous critical issues.

Issue #1 (CPU detection): Now correctly uses PowerShell (shell: pwsh) to access $env:NUMBER_OF_PROCESSORS instead of relying on bash environment variable visibility. This was flagged in the prior review as a critical blocker.

Issue #2 (CMakeCache collision): Separate contrib-build and contrib-build-openmp directories prevent CMakeCache.txt reuse between the ALL and OPENMP builds, enabling clean reconfiguration without state pollution.

Architecture is sound: NPROC detection → CMAKE_BUILD_PARALLEL_LEVEL export → two sequential cmake runs in isolated directories.


101-116: ✅ Unix build step is well-structured with solid platform handling.

  • Platform-specific NPROC detection is correct: sysctl -n hw.ncpu (macOS) and nproc (Linux), both with safe fallbacks.
  • Mirrors Windows fix: separate build directories (contrib-build and contrib-build-openmp) prevent CMakeCache collisions.
  • Proper variable scoping: exports CMAKE_BUILD_PARALLEL_LEVEL and uses defensive fallback ${NPROC:-4}.
  • Bash quoting and path handling are sound.
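
The detect-then-fall-back pattern described in these points can be sketched in Python as follows. This is not the workflow's code: `parallel_level` and `build_env` are hypothetical helpers, `os.cpu_count()` stands in for `nproc` / `sysctl -n hw.ncpu`, and the fallback of 4 is taken from the `${NPROC:-4}` default mentioned above.

```python
import os
from typing import Callable, Dict, Optional


def parallel_level(detect: Callable[[], Optional[int]] = os.cpu_count,
                   fallback: int = 4) -> int:
    """Return the detected CPU count, or `fallback` if detection yields nothing usable."""
    count = detect()
    return count if count and count > 0 else fallback


def build_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and export CMAKE_BUILD_PARALLEL_LEVEL for child cmake runs."""
    env = dict(os.environ if base is None else base)
    env["CMAKE_BUILD_PARALLEL_LEVEL"] = str(parallel_level())
    return env


if __name__ == "__main__":
    print("parallel level:", build_env({})["CMAKE_BUILD_PARALLEL_LEVEL"])
```

Passing the detector in as a parameter keeps the fallback path easy to exercise without depending on the host's actual core count.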

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a63c22f and 4728615.

📒 Files selected for processing (3)
  • CMakeLists.txt (5 hunks)
  • libraries.cmake/arrow.cmake (1 hunks)
  • macros.cmake (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (ubuntu-22.04-arm)
  • GitHub Check: build (ubuntu-latest)
🔇 Additional comments (4)
macros.cmake (1)

114-120: LGTM!

The SHA256 migration in download_contrib_archive is correctly implemented. The empty file SHA256 hash (e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855) is correct, and variable naming is consistent.
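
The sentinel value can be cross-checked independently of CMake. The sketch below is a Python stand-in, not the logic in macros.cmake: `classify_archive` and its return labels are hypothetical, and only the empty-input digest is taken from the comment above.

```python
import hashlib

# SHA256 of zero bytes; a download that hashes to this value is an empty file.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()


def classify_archive(data: bytes, expected_sha256: str) -> str:
    """Hypothetical helper: return 'empty', 'mismatch', or 'ok' for downloaded bytes."""
    actual = hashlib.sha256(data).hexdigest()
    if actual == EMPTY_SHA256:
        return "empty"
    if actual != expected_sha256.lower():
        return "mismatch"
    return "ok"


if __name__ == "__main__":
    print(EMPTY_SHA256)
```

Checking for the empty digest before comparing against the expected value lets a failed download be reported differently from a corrupted or tampered archive.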

CMakeLists.txt (2)

624-631: LGTM!

The Arrow build section correctly follows the established pattern used by other libraries (COINOR, EIGEN, HDF5, etc.) with OPENMS_CLEAN_LIB, OPENMS_CONTRIB_BUILD_ARROW, and OPENMS_COPY_LIBS.


217-257: LGTM!

The SHA256 hash migration for all existing libraries is correctly implemented. The hash values are the expected 64-character hexadecimal strings consistent with SHA256 format.
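
A length check like the one described here can be sketched in a few lines of Python. The helper name `digest_kind` is hypothetical; the only facts relied on are that a SHA1 digest is 40 hexadecimal characters and a SHA256 digest is 64.

```python
import re

_SHA256_RE = re.compile(r"[0-9a-fA-F]{64}")
_SHA1_RE = re.compile(r"[0-9a-fA-F]{40}")


def digest_kind(value: str) -> str:
    """Classify a hex string by length: 'sha256', 'sha1', or 'unknown'."""
    if _SHA256_RE.fullmatch(value):
        return "sha256"
    if _SHA1_RE.fullmatch(value):
        return "sha1"
    return "unknown"
```

Such a check only catches a leftover SHA1 value or a truncated paste. It says nothing about whether the digest matches the archive, which still requires hashing the downloaded file.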

libraries.cmake/arrow.cmake (1)

18-31: ARROW_EXTRA_CMAKE_FLAG is intentionally undefined and follows the established pattern.

This variable is used at line 26 but left undefined in CMakeLists.txt. This is an intentional design pattern used consistently across multiple libraries in the codebase (zlib.cmake, xercesc.cmake), allowing users to pass optional CMake flags without modifying the source files. When undefined, it safely expands to an empty string with no functional impact.

Comment thread CMakeLists.txt Outdated
Comment thread libraries.cmake/arrow.cmake Outdated
Comment thread libraries.cmake/arrow.cmake
Comment thread macros.cmake
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22c2300 and 56fffd5.

📒 Files selected for processing (1)
  • CMakeLists.txt (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (ubuntu-22.04-arm)
  • GitHub Check: build (ubuntu-latest)
🔇 Additional comments (5)
CMakeLists.txt (5)

94-95: ADD Arrow to build type lists.

Arrow is correctly added to both BUILD_TYPE and VALID_BUILD_TYPES, making it selectable for compilation.


271-271: Verify INCLUDE_DIR_ARROW target location.

INCLUDE_DIR_ARROW is set to ${CONTRIB_BIN_INCLUDE_DIR}/arrow. Confirm that the Arrow build macro (OPENMS_CONTRIB_BUILD_ARROW in libraries.cmake/arrow.cmake) installs headers to this location during the build process.


485-485: Arrow CMake macros correctly included.

The Arrow build macro file is properly included alongside other library macros.


624-631: Arrow build section follows established pattern.

The Arrow build logic is consistent with other optional libraries (HDF5, KISSFFT, OPENMP): clean, build, and copy libraries. The conditional check (if (BUILD_TYPE STREQUAL "ALL" OR "ARROW" IN_LIST BUILD_TYPE)) properly integrates Arrow as a selectable build target.


199-199: Verify Arrow archive extracts to the expected directory structure.

ARROW_DIR references arrow-cpp-22.0.0/cpp, and the archive is arrow-cpp-22.0.0.tar.xz. While the version numbers now match (fixing the prior mismatch), you must confirm that extracting this .tar.xz produces a cpp/ subdirectory at the expected depth. If the archive root differs from arrow-cpp-22.0.0/, or if the C++ sources are not in a cpp/ subdirectory, the build will fail.

You can verify the archive structure by extracting it locally or by checking the Apache Arrow GitHub repository for version 22.0.0 and confirming the directory layout matches.

#!/bin/bash
# Verify the Arrow archive structure (requires the archive locally).
# Expected: an arrow-cpp-22.0.0/cpp/ directory exists after extraction.

archive_file="arrow-cpp-22.0.0.tar.xz"
if [ -f "$archive_file" ]; then
  # -J selects xz decompression; -z (gzip) would fail on a .tar.xz archive
  tar -tJf "$archive_file" | grep -E '^arrow-cpp-22\.0\.0/cpp/' | head -5
  echo "---"
  tar -tJf "$archive_file" | head -10
fi

Alternatively, check the upstream repository on GitHub to confirm the C++ source location within the archive.
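
For a platform-independent variant of the same check, a minimal Python sketch using the standard `tarfile` module could look like this. The function name `has_cpp_dir` is hypothetical, and the archive and root names in the usage are the ones discussed in this comment, not verified against the actual download.

```python
import os
import tarfile


def has_cpp_dir(archive_path: str, root: str) -> bool:
    """Return True if the archive contains at least one entry under '<root>/cpp/'."""
    prefix = f"{root}/cpp/"
    # Mode "r:*" lets tarfile detect the compression (xz, gzip, bzip2, or none).
    with tarfile.open(archive_path, mode="r:*") as tar:
        return any(name.startswith(prefix) for name in tar.getnames())


if __name__ == "__main__":
    archive = "arrow-cpp-22.0.0.tar.xz"
    if os.path.isfile(archive):
        print(has_cpp_dir(archive, "arrow-cpp-22.0.0"))
```

A `False` result would indicate that ARROW_DIR does not match the layout of the extracted archive.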

Also applies to: 259-261

Comment thread CMakeLists.txt Outdated
Comment thread libraries.cmake/arrow.cmake Outdated
-D CMAKE_INSTALL_PREFIX=${PROJECT_BINARY_DIR}
-D ARROW_CSV=ON
-D ARROW_PARQUET=ON
${ARROW_EXTRA_CMAKE_FLAG}
Contributor

One could try to make it recognize the compression libs that we are building in our contrib like we do for boost.

Contributor Author

Good idea.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
libraries.cmake/arrow.cmake (1)

89-90: Copy/paste error in comment.

The comment incorrectly references "libsvm compiler" but this is the Arrow build macro.

-  # CFLAGS for libsvm compiler (see libsvm Makefile)
+  # CFLAGS for Arrow build
   set(ARROW_CFLAGS "-Wall -O3 -fPIC")
🧹 Nitpick comments (1)
libraries.cmake/arrow.cmake (1)

98-98: Missing CMAKE_CXX_FLAGS for Arrow C++ sources.

Arrow is primarily a C++ library, but only CMAKE_C_FLAGS is being set. The -fPIC flag is essential for shared library compatibility and should also apply to C++ compilation.

                          -D CMAKE_C_FLAGS=${ARROW_CFLAGS}
+                         -D CMAKE_CXX_FLAGS=${ARROW_CFLAGS}
                          -D ARROW_CSV=ON
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a01a7aa and 08a054d.

📒 Files selected for processing (2)
  • CMakeLists.txt (6 hunks)
  • libraries.cmake/arrow.cmake (1 hunks)
🔇 Additional comments (7)
libraries.cmake/arrow.cmake (2)

16-76: LGTM for MSVC build path.

The MSVC build path has proper error handling with configure success checks, logs both stdout and stderr, and correctly builds both Debug and Release configurations.


111-118: LGTM for configure success check.

The configure success check was properly added per the previous review feedback, with detailed error output before the fatal error to aid debugging.

CMakeLists.txt (5)

94-95: LGTM for ARROW addition to build types.

ARROW is properly added to both the description string and the VALID_BUILD_TYPES list.


199-199: LGTM for ARROW_DIR definition.

The directory path arrow-apache-arrow-22.0.0/cpp correctly matches the archive name and points to the cpp subdirectory where Arrow's CMakeLists.txt resides.


260-262: LGTM for ARROW archive metadata.

Archive name, tar name, and SHA256 hash are properly defined and consistent with the ARROW_DIR path.


625-632: LGTM for ARROW build block.

The ARROW build block follows the established pattern: clean, build, and copy libs. Including ARROW in the "ALL" build type is appropriate for a core dependency.


218-258: SHA1 to SHA256 migration looks good.

All library archives now use SHA256 hashes (64-character hex strings) instead of SHA1. This improves integrity verification.

Comment thread CMakeLists.txt
Comment thread libraries.cmake/arrow.cmake Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cea7770 and 4900c9f.

📒 Files selected for processing (1)
  • libraries.cmake/arrow.cmake (1 hunks)
🔇 Additional comments (3)
libraries.cmake/arrow.cmake (3)

1-13: LGTM!

The macro setup and archive extraction follow the established patterns used by other library build scripts in this project.


16-44: LGTM!

The MSVC configure step properly captures output, logs it, and validates success before proceeding to the build phase.


46-79: LGTM!

Both Debug and Release build steps follow consistent error handling with proper logging and failure detection.

Comment thread libraries.cmake/arrow.cmake Outdated
Comment thread libraries.cmake/arrow.cmake
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
libraries.cmake/arrow.cmake (1)

92-112: CMAKE_CXX_FLAGS still missing for Arrow C++ compilation.

A previous review correctly identified that Arrow is a C++ library but only CMAKE_C_FLAGS is being set (line 102). Without CMAKE_CXX_FLAGS, the C++ compiler won't receive the -Wall -O3 -fPIC flags, which can affect build warnings, optimization, and shared library generation.

Apply this diff to add CMAKE_CXX_FLAGS:

                           -D CMAKE_INSTALL_PREFIX=${PROJECT_BINARY_DIR}
                           -D CMAKE_PREFIX_PATH=${PROJECT_BINARY_DIR}
                           -D CMAKE_C_FLAGS=${ARROW_CFLAGS}
+                          -D CMAKE_CXX_FLAGS=${ARROW_CFLAGS}
                           -D ARROW_CSV=ON

Additionally, consider quoting the flag values for robustness:

-                          -D CMAKE_C_FLAGS=${ARROW_CFLAGS}
-                          -D CMAKE_CXX_FLAGS=${ARROW_CFLAGS}
+                          -D CMAKE_C_FLAGS="${ARROW_CFLAGS}"
+                          -D CMAKE_CXX_FLAGS="${ARROW_CFLAGS}"
🧹 Nitpick comments (1)
libraries.cmake/arrow.cmake (1)

103-107: Consider linking Arrow to contrib-built compression libraries.

Arrow is configured to use zlib, bzip2, and zstd (lines 105-107), but there's no explicit linkage to the contrib-built versions of these libraries. As noted in previous discussion, other libraries like Boost explicitly configure paths to contrib-built dependencies.

To ensure Arrow uses the contrib-built compression libraries, consider adding:

                           -D CMAKE_PREFIX_PATH=${PROJECT_BINARY_DIR}
                           -D CMAKE_C_FLAGS="${ARROW_CFLAGS}"
                           -D CMAKE_CXX_FLAGS="${ARROW_CFLAGS}"
+                          -D ZLIB_ROOT=${PROJECT_BINARY_DIR}
+                          -D BZip2_ROOT=${PROJECT_BINARY_DIR}
+                          -D zstd_ROOT=${PROJECT_BINARY_DIR}
                           -D ARROW_CSV=ON

This approach mirrors the pattern used by other libraries in the contrib system and ensures consistent dependency management.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3df30b0 and e20c134.

📒 Files selected for processing (1)
  • libraries.cmake/arrow.cmake (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (ubuntu-24.04-arm)
  • GitHub Check: build (ubuntu-latest)
🔇 Additional comments (1)
libraries.cmake/arrow.cmake (1)

127-135: No action needed—target names are correct per platform-specific CMake conventions.

The lowercase install target on Linux/macOS (line 131) and uppercase INSTALL target on MSVC (lines 47, 65) are the correct, intentional conventions. CMake's Makefile generator (Unix) uses the lowercase install target, while the Visual Studio generator (MSVC) uses uppercase INSTALL. This is standard CMake behavior and requires no changes.

Likely an incorrect or invalid review comment.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
libraries.cmake/boost.cmake (1)

177-183: Env-based parallelism looks good; consider guarding unset CMAKE_BUILD_PARALLEL_LEVEL.

Switching Boost’s b2 parallelism to CMAKE_BUILD_PARALLEL_LEVEL is consistent with the rest of the contrib changes and should work fine when the env var is set. If it is missing, though, these lines will pass a bare -j to b2, relying on its default behavior.

If you want slightly more robust behavior for ad‑hoc/local builds, you could optionally gate the flag, e.g.:

  • Compute a helper var only when the env var is defined (e.g. BOOST_PARALLEL_ARGS with -j and the value), and
  • Use that in the execute_process(COMMAND ./b2 ...) call instead of always injecting -j.

This would avoid surprising behavior when CMAKE_BUILD_PARALLEL_LEVEL is unset, while preserving your intended behavior in CI.

libraries.cmake/arrow.cmake (1)

38-46: Align MSVC error reporting with Linux/macOS for easier debugging

On the MSVC path, failures in the configure and build steps only emit a generic FATAL_ERROR message, even though stdout/stderr are captured and appended to ${LOGFILE}. The non‑MSVC path additionally prints Output: / Error: and points to the log file, which is much more helpful when diagnosing CI or local build issues.

Consider mirroring that pattern in the MSVC branches, e.g.:

   if(NOT ARROW_CMAKE_SUCCESS EQUAL 0)
-    message(FATAL_ERROR "Generating arrow build system .. failed")
+    message(STATUS "Output: ${ARROW_CMAKE_OUT}")
+    message(STATUS "Error: ${ARROW_CMAKE_ERR}")
+    message(FATAL_ERROR "Generating arrow build system .. failed. Check the log file for details: ${LOGFILE}")
   else()

   ...

   if(NOT ARROW_BUILD_SUCCESS EQUAL 0)
-    message(FATAL_ERROR "Building arrow lib (Debug) .. failed")
+    message(STATUS "Output: ${ARROW_BUILD_OUT}")
+    message(STATUS "Error: ${ARROW_BUILD_ERR}")
+    message(FATAL_ERROR "Building arrow lib (Debug) .. failed. Check the log file for details: ${LOGFILE}")
   else()

   ...

   if(NOT ARROW_BUILD_SUCCESS EQUAL 0)
-    message(FATAL_ERROR "Building arrow lib (Release) .. failed")
+    message(STATUS "Output: ${ARROW_BUILD_OUT}")
+    message(STATUS "Error: ${ARROW_BUILD_ERR}")
+    message(FATAL_ERROR "Building arrow lib (Release) .. failed. Check the log file for details: ${LOGFILE}")
   else()

This keeps behavior consistent across platforms and surfaces the captured diagnostics when something goes wrong.

Also applies to: 55-63, 76-80
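
The reporting pattern suggested here (capture output, append it to the log, and surface it on failure) is generic. The following Python sketch shows the idea outside CMake; `run_logged` and its `step` label are hypothetical, and `subprocess.run` stands in for `execute_process` with its `RESULT_VARIABLE`, `OUTPUT_VARIABLE`, and `ERROR_VARIABLE` arguments.

```python
import subprocess
import sys


def run_logged(cmd, logfile, step):
    """Run `cmd`, append its stdout/stderr to `logfile`, and raise with details on failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    with open(logfile, "a", encoding="utf-8") as log:
        log.write(proc.stdout)
        log.write(proc.stderr)
    if proc.returncode != 0:
        # Surface the captured diagnostics instead of a bare failure message.
        raise RuntimeError(
            f"{step} .. failed (exit code {proc.returncode}).\n"
            f"Output: {proc.stdout}\n"
            f"Error: {proc.stderr}\n"
            f"Check the log file for details: {logfile}"
        )
    return proc.stdout


if __name__ == "__main__":
    print(run_logged([sys.executable, "-c", "print('configured')"], "build.log", "configure"))
```

Writing to the log before checking the exit code means the log is complete even when the step fails, which is the property the Linux/macOS path already has.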

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e20c134 and b382dbc.

📒 Files selected for processing (5)
  • dockerfiles/pyopenms/manylinux/ARM64_Dockerfile (1 hunks)
  • dockerfiles/pyopenms/manylinux/Dockerfile (1 hunks)
  • libraries.cmake/arrow.cmake (1 hunks)
  • libraries.cmake/boost.cmake (1 hunks)
  • libraries.cmake/coinor.cmake (0 hunks)
💤 Files with no reviewable changes (1)
  • libraries.cmake/coinor.cmake
🚧 Files skipped from review as they are similar to previous changes (2)
  • dockerfiles/pyopenms/manylinux/Dockerfile
  • dockerfiles/pyopenms/manylinux/ARM64_Dockerfile

Comment thread libraries.cmake/arrow.cmake Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
libraries.cmake/boost.cmake (1)

176-183: Consider adding validation and documentation for the parallel jobs configuration.

The environment variable handling logic is correct, but consider these optional improvements:

  1. Numeric validation: The code doesn't verify that CMAKE_BUILD_PARALLEL_LEVEL contains a valid integer. Invalid values could cause cryptic build failures.
  2. Bounds checking: Very large values could exhaust system resources.
  3. Documentation: Add a comment explaining the purpose of CMAKE_BUILD_PARALLEL_LEVEL.

Example enhancement:

+# Read parallel job count from CMAKE_BUILD_PARALLEL_LEVEL environment variable (used by CMake's --parallel flag)
+# Falls back to 2 if not set or invalid
 if(DEFINED ENV{CMAKE_BUILD_PARALLEL_LEVEL} AND NOT "$ENV{CMAKE_BUILD_PARALLEL_LEVEL}" STREQUAL "")
-  set(_BOOST_PARALLEL_JOBS $ENV{CMAKE_BUILD_PARALLEL_LEVEL})
+  # Validate that the value is a positive integer (rejects 0 and non-numeric input)
+  if("$ENV{CMAKE_BUILD_PARALLEL_LEVEL}" MATCHES "^[1-9][0-9]*$")
+    set(_BOOST_PARALLEL_JOBS $ENV{CMAKE_BUILD_PARALLEL_LEVEL})
+  else()
+    message(WARNING "CMAKE_BUILD_PARALLEL_LEVEL='$ENV{CMAKE_BUILD_PARALLEL_LEVEL}' is not a valid integer, using default of 2")
+    set(_BOOST_PARALLEL_JOBS 2)
+  endif()
 else()
   set(_BOOST_PARALLEL_JOBS 2)
 endif()
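
The three suggestions (numeric validation, an upper bound, and a documented default) can be sketched in Python to make the intended behavior concrete. This is an illustration rather than a proposed implementation: `boost_parallel_jobs` and its `max_jobs` cap of 64 are hypothetical, while the default of 2 follows the diff above.

```python
import re
from typing import Mapping


def boost_parallel_jobs(env: Mapping[str, str], default: int = 2, max_jobs: int = 64) -> int:
    """Derive a job count from CMAKE_BUILD_PARALLEL_LEVEL, falling back to `default`."""
    raw = env.get("CMAKE_BUILD_PARALLEL_LEVEL", "")
    # Accept positive integers only: unset, empty, "0", "-1", and "abc" all fall back.
    if re.fullmatch(r"[1-9][0-9]*", raw):
        # Hypothetical cap so an oversized value cannot exhaust the machine.
        return min(int(raw), max_jobs)
    return default
```

For example, `boost_parallel_jobs({"CMAKE_BUILD_PARALLEL_LEVEL": "8"})` yields 8, while an unset or malformed value yields the default.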
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b382dbc and c846eed.

📒 Files selected for processing (3)
  • .github/workflows/main.yml (4 hunks)
  • libraries.cmake/arrow.cmake (1 hunks)
  • libraries.cmake/boost.cmake (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/main.yml
  • libraries.cmake/arrow.cmake

Comment thread libraries.cmake/boost.cmake
poshul and others added 3 commits December 2, 2025 21:55
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 278a705 and b26db63.

📒 Files selected for processing (1)
  • .github/workflows/main.yml (4 hunks)
🔇 Additional comments (4)
.github/workflows/main.yml (4)

33-34: Approved: Ubuntu 24.04 ARM runner upgrade.

The upgrade from ubuntu-22.04-arm to ubuntu-24.04-arm keeps the ARM runner on a current image and fits the cross-platform build strategy.


47-47: Approved: Conditional CMake setup for ARM runners.

The logic correctly skips the setup action on ARM runners and installs CMake from apt instead. The endsWith('-arm') check is an appropriate platform discriminator.

Also applies to: 52-56


62-62: Approved: Thrift prerequisites for Arrow support.

The addition of libthrift-dev (Linux) and thrift (macOS) aligns with the Arrow integration mentioned in the PR objectives. These are appropriate prerequisite updates.

Also applies to: 67-67


90-99: Approved: Cross-platform path and parallelism handling.

The dynamic NPROC calculation and workspace path normalization (especially the Windows backslash-to-forward-slash conversion for CMake compatibility) are well-implemented. The approach correctly accounts for platform differences while maintaining consistency with CMake expectations.
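
The path normalization mentioned here amounts to replacing every backslash with a forward slash before the path reaches CMake. A minimal Python sketch of that step follows; `to_cmake_path` is a hypothetical name and the sample path is made up for illustration.

```python
def to_cmake_path(path: str) -> str:
    """Replace backslashes with forward slashes so CMake does not treat them as escapes."""
    return path.replace("\\", "/")


if __name__ == "__main__":
    # Illustrative Windows-style workspace path, not taken from the workflow.
    print(to_cmake_path(r"D:\a\contrib\contrib"))
```

Paths that already use forward slashes pass through unchanged, so the same call is safe on Linux and macOS runners.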

Comment thread .github/workflows/main.yml
Comment thread .github/workflows/main.yml Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
.github/workflows/main.yml (1)

90-100: ⚠️ Windows CPU detection still doesn't work in bash—regression from previous review.

Line 91 reads NPROC=$NUMBER_OF_PROCESSORS directly in bash, but NUMBER_OF_PROCESSORS is a Windows environment variable invisible to Git Bash/msys2. The fallback to 4 (line 100) masks the real CPU count. A prior review flagged this critical issue and noted it was "Addressed in commit 50580f3," but the code still contains the bug.

Query NUMBER_OF_PROCESSORS from the Windows environment before using it in bash:

  if [ "$RUNNER_OS" == "Windows" ]; then
-   NPROC=$NUMBER_OF_PROCESSORS
+   NPROC=$(cmd.exe /C "echo %NUMBER_OF_PROCESSORS%" 2>/dev/null)
+   NPROC=${NPROC:-4}
    WORKSPACE="${GITHUB_WORKSPACE//\\//}"

(Alternatively, use powershell -Command '$env:NUMBER_OF_PROCESSORS' or wmic os get processorcount | grep -oE '[0-9]+' | head -1 if cmd.exe is unavailable.)

🧹 Nitpick comments (1)
libraries.cmake/boost.cmake (1)

21-26: Env-driven _BOOST_PARALLEL_JOBS setup looks solid; optional: log fallback value

Reading CMAKE_BUILD_PARALLEL_LEVEL from the environment and falling back to 2 is simple and consistent with the PR’s goal of centralizing parallelism. If you expect the fallback path to be hit often, consider adding a short message(STATUS "Using default Boost parallel jobs: ${_BOOST_PARALLEL_JOBS}") when the env var is unset, purely to make debugging build performance easier.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b26db63 and 32e568a.

📒 Files selected for processing (2)
  • .github/workflows/main.yml (4 hunks)
  • libraries.cmake/boost.cmake (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (ubuntu-24.04-arm)
  • GitHub Check: build (ubuntu-latest)
🔇 Additional comments (7)
libraries.cmake/boost.cmake (2)

59-61: Good: Windows/MSVC b2 now respects _BOOST_PARALLEL_JOBS

Including "-j" "${_BOOST_PARALLEL_JOBS}" in BOOST_CMD_ARGS for the MSVC path brings Windows in line with Unix/macOS and addresses the previous inconsistency noted in earlier reviews. No functional issues spotted with the argument ordering.


185-191: Consistent parallelism on Unix/macOS via _BOOST_PARALLEL_JOBS

Using _BOOST_PARALLEL_JOBS both in the BOOST_CMD string and in the actual execute_process call keeps the logged command and the real invocation aligned and ties Boost’s parallelism cleanly to CMAKE_BUILD_PARALLEL_LEVEL. Argument placement around toolset, -j, and the remaining flags looks correct.

.github/workflows/main.yml (5)

33-33: Good: ARM platform detection improvements generalize cross-version support.

Switching to ubuntu-24.04-arm and using endsWith(matrix.os, '-arm') instead of hardcoding ubuntu-22.04-arm allows future arm-based runners to work without additional workflow changes.

Also applies to: 47-47, 53-53


62-62: Good: Adding thrift dependencies for Arrow/Thrift support.

The libthrift-dev (Linux) and thrift (macOS) additions align with the PR goal to add Arrow as a build target.

Also applies to: 67-67


93-100: Good: Platform-specific NPROC detection for macOS and Linux.

sysctl -n hw.ncpu and nproc are the correct standard tools for each platform, with a safe fallback to 4.


104-106: Good: Separate build directories eliminate CMakeCache collision.

Using contrib-build and contrib-build-openmp as distinct directories ensures the second cmake invocation with BUILD_TYPE=OPENMP has a clean CMakeCache, fixing the reconfiguration issue flagged in the previous review.


103-106: Builds are triggered during the cmake configure phase via execute_process()—no separate build step is missing.

The cmake invocations at lines 103-106 do trigger builds as intended. Each library's build macro (e.g., OPENMS_CONTRIB_BUILD_OPENMP(), OPENMS_CONTRIB_BUILD_BOOST()) explicitly calls execute_process() with cmake --build or equivalent bootstrap commands during the configure phase. The pattern is consistent across all libraries in libraries.cmake/. This is the intentional design, not a missing step.

Note: The OPENMP build (line 106) only executes on macOS; on other platforms it is skipped by design (see openmp.cmake line 65).

Likely an incorrect or invalid review comment.

@poshul poshul merged commit e54dffd into master Dec 3, 2025
6 checks passed
This was referenced Dec 3, 2025