Conversation
WalkthroughAdds Apache Arrow as a selectable contrib build target with a new OPENMS_CONTRIB_BUILD_ARROW macro, migrates contrib archive checksum verification from SHA1 to SHA256, updates CI ARM runner/prerequisites and parallel-build handling, and applies small comment and Docker/CMake path adjustments across library build scripts. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CI as CI runner
participant CMake as CMake orchestration
participant Remote as Archive server
participant Arrow as arrow.cmake (OPENMS_CONTRIB_BUILD_ARROW)
participant FS as Filesystem / Artifacts
CI->>CMake: invoke contrib build (BUILD_TYPE includes ARROW)
CMake->>Remote: validate/download archives (verify SHA256)
Remote-->>CMake: archive.tar.gz
CMake->>FS: extract archive -> ARROW_DIR
CMake->>Arrow: call OPENMS_CONTRIB_BUILD_ARROW
Arrow->>Arrow: configure (platform-specific CMake args, write log)
Arrow->>Arrow: build & install (log output, check result)
Arrow->>FS: copy built libs/includes to contrib targets
CMake->>CI: return success/failure (logs)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (5)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
CMakeLists.txt(5 hunks)libraries.cmake/arrow.cmake(1 hunks)macros.cmake(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build (windows-latest)
- GitHub Check: build (ubuntu-22.04-arm)
- GitHub Check: build (ubuntu-latest)
🔇 Additional comments (4)
macros.cmake (1)
114-120: LGTM!The SHA256 migration in
download_contrib_archiveis correctly implemented. The empty file SHA256 hash (e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855) is correct, and variable naming is consistent.CMakeLists.txt (2)
624-631: LGTM!The Arrow build section correctly follows the established pattern used by other libraries (COINOR, EIGEN, HDF5, etc.) with
OPENMS_CLEAN_LIB,OPENMS_CONTRIB_BUILD_ARROW, andOPENMS_COPY_LIBS.
217-257: LGTM!The SHA256 hash migration for all existing libraries is correctly implemented. The hash values are the expected 64-character hexadecimal strings consistent with SHA256 format.
libraries.cmake/arrow.cmake (1)
18-31:ARROW_EXTRA_CMAKE_FLAGis intentionally undefined and follows the established pattern.This variable is used at line 26 but left undefined in CMakeLists.txt. This is an intentional design pattern used consistently across multiple libraries in the codebase (zlib.cmake, xercesc.cmake), allowing users to pass optional CMake flags without modifying the source files. When undefined, it safely expands to an empty string with no functional impact.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CMakeLists.txt(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build (windows-latest)
- GitHub Check: build (ubuntu-22.04-arm)
- GitHub Check: build (ubuntu-latest)
🔇 Additional comments (5)
CMakeLists.txt (5)
94-95: ADD Arrow to build type lists.Arrow is correctly added to both
BUILD_TYPEandVALID_BUILD_TYPES, making it selectable for compilation.
271-271: Verify INCLUDE_DIR_ARROW target location.
INCLUDE_DIR_ARROWis set to${CONTRIB_BIN_INCLUDE_DIR}/arrow. Confirm that the Arrow build macro (OPENMS_CONTRIB_BUILD_ARROWin libraries.cmake/arrow.cmake) installs headers to this location during the build process.
485-485: Arrow CMake macros correctly included.The Arrow build macro file is properly included alongside other library macros.
624-631: Arrow build section follows established pattern.The Arrow build logic is consistent with other optional libraries (HDF5, KISSFFT, OPENMP): clean, build, and copy libraries. The conditional check (
if (BUILD_TYPE STREQUAL "ALL" OR "ARROW" IN_LIST BUILD_TYPE)) properly integrates Arrow as a selectable build target.
199-199: Verify Arrow archive extracts to the expected directory structure.
ARROW_DIRreferencesarrow-cpp-22.0.0/cpp, and the archive isarrow-cpp-22.0.0.tar.xz. While the version numbers now match (fixing the prior mismatch), you must confirm that extracting this.tar.xzproduces acpp/subdirectory at the expected depth. If the archive root differs fromarrow-cpp-22.0.0/, or if the C++ sources are not in acpp/subdirectory, the build will fail.You can verify the archive structure by extracting it locally or by checking the Apache Arrow GitHub repository for version 22.0.0 and confirming the directory layout matches.
#!/bin/bash # Verify Arrow archive structure (if you have access to the archive locally) # Expected: arrow-cpp-22.0.0/cpp directory should exist after extraction archive_file="arrow-cpp-22.0.0.tar.xz" if [ -f "$archive_file" ]; then tar -tzf "$archive_file" | grep -E '^arrow-cpp-22.0.0/cpp/' | head -5 echo "---" tar -tzf "$archive_file" | head -10 fiAlternatively, check the upstream repository on GitHub to confirm the C++ source location within the archive.
Also applies to: 259-261
| -D CMAKE_INSTALL_PREFIX=${PROJECT_BINARY_DIR} | ||
| -D ARROW_CSV=ON | ||
| -D ARROW_PARQUET=ON | ||
| ${ARROW_EXTRA_CMAKE_FLAG} |
There was a problem hiding this comment.
One could try to make it recognize the compression libs that we are building in our contrib like we do for boost.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
libraries.cmake/arrow.cmake (1)
89-90: Copy/paste error in comment.The comment incorrectly references "libsvm compiler" but this is the Arrow build macro.
- # CFLAGS for libsvm compiler (see libsvm Makefile) + # CFLAGS for Arrow build set(ARROW_CFLAGS "-Wall -O3 -fPIC")
🧹 Nitpick comments (1)
libraries.cmake/arrow.cmake (1)
98-98: MissingCMAKE_CXX_FLAGSfor Arrow C++ sources.Arrow is primarily a C++ library, but only
CMAKE_C_FLAGSis being set. The-fPICflag is essential for shared library compatibility and should also apply to C++ compilation.-D CMAKE_C_FLAGS=${ARROW_CFLAGS} + -D CMAKE_CXX_FLAGS=${ARROW_CFLAGS} -D ARROW_CSV=ON
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CMakeLists.txt(6 hunks)libraries.cmake/arrow.cmake(1 hunks)
🔇 Additional comments (7)
libraries.cmake/arrow.cmake (2)
16-76: LGTM for MSVC build path.The MSVC build path has proper error handling with configure success checks, logs both stdout and stderr, and correctly builds both Debug and Release configurations.
111-118: LGTM for configure success check.The configure success check was properly added per the previous review feedback, with detailed error output before the fatal error to aid debugging.
CMakeLists.txt (5)
94-95: LGTM for ARROW addition to build types.ARROW is properly added to both the description string and the
VALID_BUILD_TYPESlist.
199-199: LGTM for ARROW_DIR definition.The directory path
arrow-apache-arrow-22.0.0/cppcorrectly matches the archive name and points to the cpp subdirectory where Arrow's CMakeLists.txt resides.
260-262: LGTM for ARROW archive metadata.Archive name, tar name, and SHA256 hash are properly defined and consistent with the ARROW_DIR path.
625-632: LGTM for ARROW build block.The ARROW build block follows the established pattern: clean, build, and copy libs. Including ARROW in the "ALL" build type is appropriate for a core dependency.
218-258: SHA1 to SHA256 migration looks good.All library archives now use SHA256 hashes (64-character hex strings) instead of SHA1. This improves integrity verification.
Print build logs on error
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
libraries.cmake/arrow.cmake(1 hunks)
🔇 Additional comments (3)
libraries.cmake/arrow.cmake (3)
1-13: LGTM!The macro setup and archive extraction follow the established patterns used by other library build scripts in this project.
16-44: LGTM!The MSVC configure step properly captures output, logs it, and validates success before proceeding to the build phase.
46-79: LGTM!Both Debug and Release build steps follow consistent error handling with proper logging and failure detection.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
libraries.cmake/arrow.cmake (1)
92-112: CMAKE_CXX_FLAGS still missing for Arrow C++ compilation.A previous review correctly identified that Arrow is a C++ library but only
CMAKE_C_FLAGSis being set (line 102). WithoutCMAKE_CXX_FLAGS, the C++ compiler won't receive the-Wall -O3 -fPICflags, which can affect build warnings, optimization, and shared library generation.Apply this diff to add CMAKE_CXX_FLAGS:
-D CMAKE_INSTALL_PREFIX=${PROJECT_BINARY_DIR} -D CMAKE_PREFIX_PATH=${PROJECT_BINARY_DIR} -D CMAKE_C_FLAGS=${ARROW_CFLAGS} + -D CMAKE_CXX_FLAGS=${ARROW_CFLAGS} -D ARROW_CSV=ONAdditionally, consider quoting the flag values for robustness:
- -D CMAKE_C_FLAGS=${ARROW_CFLAGS} - -D CMAKE_CXX_FLAGS=${ARROW_CFLAGS} + -D CMAKE_C_FLAGS="${ARROW_CFLAGS}" + -D CMAKE_CXX_FLAGS="${ARROW_CFLAGS}"
🧹 Nitpick comments (1)
libraries.cmake/arrow.cmake (1)
103-107: Consider linking Arrow to contrib-built compression libraries.Arrow is configured to use zlib, bzip2, and zstd (lines 105-107), but there's no explicit linkage to the contrib-built versions of these libraries. As noted in previous discussion, other libraries like Boost explicitly configure paths to contrib-built dependencies.
To ensure Arrow uses the contrib-built compression libraries, consider adding:
-D CMAKE_PREFIX_PATH=${PROJECT_BINARY_DIR} -D CMAKE_C_FLAGS="${ARROW_CFLAGS}" -D CMAKE_CXX_FLAGS="${ARROW_CFLAGS}" + -D ZLIB_ROOT=${PROJECT_BINARY_DIR} + -D BZip2_ROOT=${PROJECT_BINARY_DIR} + -D zstd_ROOT=${PROJECT_BINARY_DIR} -D ARROW_CSV=ONThis approach mirrors the pattern used by other libraries in the contrib system and ensures consistent dependency management.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
libraries.cmake/arrow.cmake(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build (windows-latest)
- GitHub Check: build (ubuntu-24.04-arm)
- GitHub Check: build (ubuntu-latest)
🔇 Additional comments (1)
libraries.cmake/arrow.cmake (1)
127-135: No action needed—target names are correct per platform-specific CMake conventions.The lowercase
installtarget on Linux/macOS (line 131) and uppercaseINSTALLtarget on MSVC (lines 47, 65) are the correct, intentional conventions. CMake's Makefile generator (Unix) uses the lowercaseinstalltarget, while the Visual Studio generator (MSVC) uses uppercaseINSTALL. This is standard CMake behavior and requires no changes.Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
libraries.cmake/boost.cmake (1)
177-183: Env-based parallelism looks good; consider guarding unsetCMAKE_BUILD_PARALLEL_LEVEL.Switching Boost’s
b2parallelism toCMAKE_BUILD_PARALLEL_LEVELis consistent with the rest of the contrib changes and should work fine when the env var is set. If it is missing, though, these lines will pass a bare-jtob2, relying on its default behavior.If you want slightly more robust behavior for ad‑hoc/local builds, you could optionally gate the flag, e.g.:
- Compute a helper var only when the env var is defined (e.g.
BOOST_PARALLEL_ARGSwith-jand the value), and- Use that in the
execute_process(COMMAND ./b2 ...)call instead of always injecting-j.This would avoid surprising behavior when
CMAKE_BUILD_PARALLEL_LEVELis unset, while preserving your intended behavior in CI.libraries.cmake/arrow.cmake (1)
38-46: Align MSVC error reporting with Linux/macOS for easier debuggingOn the MSVC path, failures in the configure and build steps only emit a generic
FATAL_ERRORmessage, even though stdout/stderr are captured and appended to${LOGFILE}. The non‑MSVC path additionally printsOutput:/Error:and points to the log file, which is much more helpful when diagnosing CI or local build issues.Consider mirroring that pattern in the MSVC branches, e.g.:
if(NOT ARROW_CMAKE_SUCCESS EQUAL 0) - message(FATAL_ERROR "Generating arrow build system .. failed") + message(STATUS "Output: ${ARROW_CMAKE_OUT}") + message(STATUS "Error: ${ARROW_CMAKE_ERR}") + message(FATAL_ERROR "Generating arrow build system .. failed. Check the log file for details: ${LOGFILE}") else() ... if(NOT ARROW_BUILD_SUCCESS EQUAL 0) - message(FATAL_ERROR "Building arrow lib (Debug) .. failed") + message(STATUS "Output: ${ARROW_BUILD_OUT}") + message(STATUS "Error: ${ARROW_BUILD_ERR}") + message(FATAL_ERROR "Building arrow lib (Debug) .. failed. Check the log file for details: ${LOGFILE}") else() ... if(NOT ARROW_BUILD_SUCCESS EQUAL 0) - message(FATAL_ERROR "Building arrow lib (Release) .. failed") + message(STATUS "Output: ${ARROW_BUILD_OUT}") + message(STATUS "Error: ${ARROW_BUILD_ERR}") + message(FATAL_ERROR "Building arrow lib (Release) .. failed. Check the log file for details: ${LOGFILE}") else()This keeps behavior consistent across platforms and surfaces the captured diagnostics when something goes wrong.
Also applies to: 55-63, 76-80
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
dockerfiles/pyopenms/manylinux/ARM64_Dockerfile(1 hunks)dockerfiles/pyopenms/manylinux/Dockerfile(1 hunks)libraries.cmake/arrow.cmake(1 hunks)libraries.cmake/boost.cmake(1 hunks)libraries.cmake/coinor.cmake(0 hunks)
💤 Files with no reviewable changes (1)
- libraries.cmake/coinor.cmake
🚧 Files skipped from review as they are similar to previous changes (2)
- dockerfiles/pyopenms/manylinux/Dockerfile
- dockerfiles/pyopenms/manylinux/ARM64_Dockerfile
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
libraries.cmake/boost.cmake (1)
176-183: Consider adding validation and documentation for the parallel jobs configuration.The environment variable handling logic is correct, but consider these optional improvements:
- Numeric validation: The code doesn't verify that
CMAKE_BUILD_PARALLEL_LEVELcontains a valid integer. Invalid values could cause cryptic build failures.- Bounds checking: Very large values could exhaust system resources.
- Documentation: Add a comment explaining the purpose of
CMAKE_BUILD_PARALLEL_LEVEL.Example enhancement:
+# Read parallel job count from CMAKE_BUILD_PARALLEL_LEVEL environment variable (used by CMake's --parallel flag) +# Falls back to 2 if not set or invalid if(DEFINED ENV{CMAKE_BUILD_PARALLEL_LEVEL} AND NOT "$ENV{CMAKE_BUILD_PARALLEL_LEVEL}" STREQUAL "") - set(_BOOST_PARALLEL_JOBS $ENV{CMAKE_BUILD_PARALLEL_LEVEL}) + # Validate that the value is a positive integer + if("$ENV{CMAKE_BUILD_PARALLEL_LEVEL}" MATCHES "^[0-9]+$") + set(_BOOST_PARALLEL_JOBS $ENV{CMAKE_BUILD_PARALLEL_LEVEL}) + else() + message(WARNING "CMAKE_BUILD_PARALLEL_LEVEL='$ENV{CMAKE_BUILD_PARALLEL_LEVEL}' is not a valid integer, using default of 2") + set(_BOOST_PARALLEL_JOBS 2) + endif() else() set(_BOOST_PARALLEL_JOBS 2) endif()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/main.yml(4 hunks)libraries.cmake/arrow.cmake(1 hunks)libraries.cmake/boost.cmake(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/main.yml
- libraries.cmake/arrow.cmake
Removed NUMBER_OF_JOBS setting and related messages.
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/main.yml(4 hunks)
🔇 Additional comments (4)
.github/workflows/main.yml (4)
33-34: Approved: Ubuntu 24.04 ARM runner upgrade.The upgrade from ubuntu-latest to ubuntu-24.04-arm aligns with the cross-platform build strategy and provides better long-term stability for ARM builds.
47-47: Approved: Conditional CMake setup for ARM runners.The logic correctly skips the setup action on ARM runners and installs CMake from apt instead. The
endsWith('-arm')check is an appropriate platform discriminator.Also applies to: 52-56
62-62: Approved: Thrift prerequisites for Arrow support.The addition of libthrift-dev (Linux) and thrift (macOS) aligns with the Arrow integration mentioned in the PR objectives. These are appropriate prerequisite updates.
Also applies to: 67-67
90-99: Approved: Cross-platform path and parallelism handling.The dynamic NPROC calculation and workspace path normalization (especially the Windows backslash-to-forward-slash conversion for CMake compatibility) are well-implemented. The approach correctly accounts for platform differences while maintaining consistency with CMake expectations.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/workflows/main.yml (1)
90-100:⚠️ Windows CPU detection still doesn't work in bash—regression from previous review.Line 91 reads
NPROC=$NUMBER_OF_PROCESSORSdirectly in bash, butNUMBER_OF_PROCESSORSis a Windows environment variable invisible to Git Bash/msys2. The fallback to 4 (line 100) masks the real CPU count. A prior review flagged this critical issue and noted it was "Addressed in commit 50580f3," but the code still contains the bug.Query
NUMBER_OF_PROCESSORSfrom the Windows environment before using it in bash:if [ "$RUNNER_OS" == "Windows" ]; then - NPROC=$NUMBER_OF_PROCESSORS + NPROC=$(cmd.exe /C "echo %NUMBER_OF_PROCESSORS%" 2>/dev/null) + NPROC=${NPROC:-4} WORKSPACE="${GITHUB_WORKSPACE//\\//}"(Alternatively, use
powershell -Command '$env:NUMBER_OF_PROCESSORS'orwmic os get processorcount | grep -oE '[0-9]+' | head -1if cmd.exe is unavailable.)
🧹 Nitpick comments (1)
libraries.cmake/boost.cmake (1)
21-26: Env-driven_BOOST_PARALLEL_JOBSsetup looks solid; optional: log fallback valueReading
CMAKE_BUILD_PARALLEL_LEVELfrom the environment and falling back to2is simple and consistent with the PR’s goal of centralizing parallelism. If you expect the fallback path to be hit often, consider adding a shortmessage(STATUS "Using default Boost parallel jobs: ${_BOOST_PARALLEL_JOBS}")when the env var is unset, purely to make debugging build performance easier.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/main.yml(4 hunks)libraries.cmake/boost.cmake(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build (windows-latest)
- GitHub Check: build (ubuntu-24.04-arm)
- GitHub Check: build (ubuntu-latest)
🔇 Additional comments (7)
libraries.cmake/boost.cmake (2)
59-61: Good: Windows/MSVC b2 now respects_BOOST_PARALLEL_JOBSIncluding
"-j" "${_BOOST_PARALLEL_JOBS}"inBOOST_CMD_ARGSfor the MSVC path brings Windows in line with Unix/macOS and addresses the previous inconsistency noted in earlier reviews. No functional issues spotted with the argument ordering.
185-191: Consistent parallelism on Unix/macOS via_BOOST_PARALLEL_JOBSUsing
_BOOST_PARALLEL_JOBSboth in theBOOST_CMDstring and in the actualexecute_processcall keeps the logged command and the real invocation aligned and ties Boost’s parallelism cleanly toCMAKE_BUILD_PARALLEL_LEVEL. Argument placement aroundtoolset,-j, and the remaining flags looks correct..github/workflows/main.yml (5)
33-33: Good: ARM platform detection improvements generalize cross-version support.Switching to ubuntu-24.04-arm and using
endsWith(matrix.os, '-arm')instead of hardcoding ubuntu-22.04-arm allows future arm-based runners to work without additional workflow changes.Also applies to: 47-47, 53-53
62-62: Good: Adding thrift dependencies for Arrow/Thrift support.The libthrift-dev (Linux) and thrift (macOS) additions align with the PR goal to add Arrow as a build target.
Also applies to: 67-67
93-100: Good: Platform-specific NPROC detection for macOS and Linux.
sysctl -n hw.ncpuandnprocare the correct standard tools for each platform, with a safe fallback to 4.
104-106: Good: Separate build directories eliminate CMakeCache collision.Using
contrib-buildandcontrib-build-openmpas distinct directories ensures the second cmake invocation withBUILD_TYPE=OPENMPhas a clean CMakeCache, fixing the reconfiguration issue flagged in the previous review.
103-106: Builds are triggered during the cmake configure phase via execute_process()—no separate build step is missing.The cmake invocations at lines 103-106 do trigger builds as intended. Each library's build macro (e.g.,
OPENMS_CONTRIB_BUILD_OPENMP(),OPENMS_CONTRIB_BUILD_BOOST()) explicitly callsexecute_process()withcmake --buildor equivalent bootstrap commands during the configure phase. The pattern is consistent across all libraries inlibraries.cmake/. This is the intentional design, not a missing step.Note: The
OPENMPbuild (line 106) only executes on macOS; on other platforms it is skipped by design (seeopenmp.cmakeline 65).Likely an incorrect or invalid review comment.
Summary by CodeRabbit
New Features
Chores
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.