Integrate mem_profile utility into bench.sh #16938

@ding-young

Is your feature request related to a problem or challenge?

PR #16814 adds a new benchmark utility, mem_profile, that collects memory statistics and prints a summary table.

We can run the binary directly with:

    cargo run --profile release-nonlto --bin mem_profile -- --bench-profile release-nonlto tpch --path benchmarks/data/tpch_sf1 --partitions 4 --format parquet --query 1

However, bench.sh has no integration yet for running individual benchmarks through mem_profile, and there is no utility for comparing results across branches.
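
As a rough illustration of what the integration has to do, the invocation above differs from a plain dfbench run only in the binary name and the extra --bench-profile flag; the benchmark subcommand and its arguments stay the same. The sketch below builds both command lines from one argument list. The helper name build_command and its parameters are hypothetical and are not part of bench.sh or mem_profile.

```python
import shlex


def build_command(bench_args, mem_profile=False, cargo_profile="release-nonlto"):
    """Return the argv for one benchmark run (hypothetical helper).

    bench_args is the dfbench subcommand plus its arguments,
    e.g. ["tpch", "--path", "...", "--query", "1"].
    """
    cargo = ["cargo", "run", "--profile", cargo_profile, "--bin"]
    if mem_profile:
        # mem_profile forwards the subcommand and its args to dfbench,
        # so only the binary name and --bench-profile are added here.
        return cargo + ["mem_profile", "--", "--bench-profile", cargo_profile] + list(bench_args)
    return cargo + ["dfbench", "--"] + list(bench_args)


tpch_args = [
    "tpch", "--path", "benchmarks/data/tpch_sf1",
    "--partitions", "4", "--format", "parquet", "--query", "1",
]

# Print both variants as shell-ready strings.
print(shlex.join(build_command(tpch_args)))
print(shlex.join(build_command(tpch_args, mem_profile=True)))
```

With mem_profile=True the printed command matches the one quoted above, which suggests the switch in bench.sh could be a small change once every benchmark goes through the dfbench entrypoint.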

Describe the solution you'd like

Side Note

The way mem_profile collects metrics and prints them is quite different from the other existing benchmark utilities.
For memory profiling, mem_profile spawns a new subprocess for each query execution. As a result, it does not generate a single output.json file covering all benchmark queries as the other benchmarks do; instead it prints a summary table to stdout. To compare results across branches, we should either capture this stdout or modify mem_profile.rs to also write results to a JSON file or another structured format.
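
To make the "capture stdout" option concrete, here is a minimal sketch that turns a printed table into JSON records. The table layout, column names, and numbers in SAMPLE_STDOUT are invented for illustration and are an assumption; the real mem_profile output may look different, and the parser would need adjusting to match it.

```python
import json
import re

# Hypothetical summary table; NOT the actual mem_profile output format.
SAMPLE_STDOUT = """\
Query  Time (ms)  Peak RSS (MB)
1      812.4      350.2
2      120.9      141.0
"""


def parse_summary(stdout_text):
    """Parse a table whose columns are separated by two or more spaces."""
    lines = [ln for ln in stdout_text.splitlines() if ln.strip()]
    header = re.split(r"\s{2,}", lines[0].strip())
    rows = []
    for line in lines[1:]:
        cells = re.split(r"\s{2,}", line.strip())
        if len(cells) != len(header):
            continue  # skip lines that are not table rows, e.g. log output
        rows.append(dict(zip(header, cells)))
    return rows


rows = parse_summary(SAMPLE_STDOUT)
print(json.dumps(rows, indent=2))
```

In a real run the text would come from something like subprocess.run(cmd, capture_output=True, text=True).stdout. Parsing printed tables is fragile, which is one argument for having mem_profile.rs write structured output itself.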

Steps

  1. Go through bench.sh and update the places where it still uses an outdated entrypoint.
    e.g. replace --bin tpch with --bin dfbench -- tpch
    (mem_profile passes the subcommand and its args to dfbench, so using dfbench everywhere makes the integration easier)
  2. Add support for a memory profiling mode in bench.sh.
    Modify bench.sh so that setting MEM_PROFILE=true runs each benchmark through the mem_profile binary instead of dfbench directly.
  3. Extend compare.py and mem_profile.rs to allow side-by-side comparison of memory usage across branches or runs.
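
For step 3, a side-by-side comparison could look roughly like the sketch below. It assumes each run has already been reduced to a mapping from query id to peak memory in MB; that shape, the function name compare_runs, and the sample numbers are all hypothetical and do not reflect the existing compare.py or any agreed output format.

```python
def compare_runs(baseline, candidate):
    """Return (query, baseline_mb, candidate_mb, pct_change) for queries in both runs."""
    rows = []
    for query in sorted(baseline.keys() & candidate.keys()):
        base, cand = baseline[query], candidate[query]
        # Percentage change relative to the baseline; undefined if baseline is 0.
        pct = (cand - base) / base * 100.0 if base else float("nan")
        rows.append((query, base, cand, pct))
    return rows


# Made-up peak memory figures (MB) keyed by query id.
main_run = {"1": 350.0, "2": 140.0}
branch_run = {"1": 385.0, "2": 126.0}

print(f"{'query':<6}{'main (MB)':>12}{'branch (MB)':>14}{'change':>10}")
for query, base, cand, pct in compare_runs(main_run, branch_run):
    print(f"{query:<6}{base:>12.1f}{cand:>14.1f}{pct:>+9.1f}%")
```

Only queries present in both runs are compared, so a run that was restricted to a single query can still be checked against a full baseline.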

Describe alternatives you've considered

No response

Additional context

No response
