Collate test output to allow workers > 1 with verbose output by bryevdv · Pull Request #507 · nv-legate/cupynumeric

bryevdv · 2022-08-05T01:14:38Z

Still WIP but I want to see how CI performs.

bryevdv · 2022-08-05T17:51:17Z

@magnatelee This generally seems better than (or at least comparable to) the old CI runs in most cases. The exception is the GPU tests, which still seem to be slower. The CI machines have 2 GPUs, and the new algorithm still decides to use only 1 worker for both tests. That seems incorrect, unless the GPUs have a very low fbsize?

Edit: I guess so? nvidia-smi reports 16160MiB in the test, and so with the current fbmem and bloat factor defaults:

int(fbsize // (config.fbmem * BLOAT_FACTOR)) # ~16 / (6 * 1.5) = 1

I'm not sure why the GPU jobs would have run faster in the old arrangement tho (since they also had only 1 worker)
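The arithmetic above can be checked with a minimal sketch. It assumes all sizes are expressed in MiB and that the 6 GB default is 6 * 1024 MiB; the function name `gpu_workers` is hypothetical and only mirrors the one-line expression quoted in this comment, not the actual test script.

```python
# Sketch of the worker-count arithmetic quoted in the comment above.
# Assumptions: sizes are in MiB, and gpu_workers is a hypothetical helper
# that mirrors the quoted expression rather than the real test script.
BLOAT_FACTOR = 1.5


def gpu_workers(fbsize_mib: int, fbmem_mib: int, bloat_factor: float = BLOAT_FACTOR) -> int:
    """How many workers fit if each one needs fbmem * bloat_factor of framebuffer."""
    return int(fbsize_mib // (fbmem_mib * bloat_factor))


fbsize = 16160     # MiB, the figure nvidia-smi reported in the CI run
fbmem = 6 * 1024   # MiB, the 6 GB default budget discussed here

workers = gpu_workers(fbsize, fbmem)
print(workers)
```

With these inputs each worker is budgeted 9216 MiB, so only one fits in 16160 MiB, which matches the single worker observed in CI.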

bryevdv · 2022-08-05T19:46:11Z

Well, this was incorrect:

since they also had only 1 worker

The old test runs use 2 workers

### Entering stage: GPU (with 2 workers)

which I think accounts for the ~2x slow-down for GPU tests.

@magnatelee should we adjust the worker spec computation in the GPU stage?

bryevdv · 2022-08-05T20:40:00Z

The previous workers computation was more or less the current one, except without the bloat factor. I am reducing it from 1.5 to 1.25 just to see how CI performs. @magnatelee should we make the bloat factor configurable?

magnatelee · 2022-08-05T20:49:32Z

Ok. I think some clarification would help here. First of all, the legate launcher sets the framebuffer size to 4GB by default. Second, the 6GB budget in the test script already takes the bloating factor into account. That means DEFAULT_GPU_MEMORY_BUDGET used in the current test script is not the framebuffer size the launcher uses but the bloated one. I think two changes are needed:

Change DEFAULT_GPU_MEMORY_BUDGET to 4GB
Pass config.fbmem to the launcher when running GPU tests (i.e., add ["--fbmem", config.fbmem] to the launcher args for GPU tests)

Then, you can keep the bloating factor at 1.5 and everything works as it did before. I'm fine with making the bloating factor configurable as well, but I don't expect developers to configure it by themselves.
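A rough sketch of how the two suggested changes could fit together is shown here. It assumes the budget is stored in MB and that the launcher arguments are a list of strings; `gpu_stage_plan` is a hypothetical helper, and only `DEFAULT_GPU_MEMORY_BUDGET`, the 1.5 bloat factor, and the `--fbmem` flag come from this thread.

```python
# Sketch of the two suggested changes, under stated assumptions:
# the budget is held in MB, and gpu_stage_plan is a hypothetical helper.
BLOAT_FACTOR = 1.5
DEFAULT_GPU_MEMORY_BUDGET = 4096  # MB: the launcher's 4 GB default, before bloating


def gpu_stage_plan(fbsize_mb: int, fbmem_mb: int = DEFAULT_GPU_MEMORY_BUDGET):
    """Return (worker count, extra launcher args) for the GPU stage."""
    # Each worker is budgeted fbmem * bloat factor, i.e. 4 GB * 1.5 = 6 GB.
    workers = int(fbsize_mb // (fbmem_mb * BLOAT_FACTOR))
    # Pass the un-bloated framebuffer size to the launcher explicitly.
    launcher_args = ["--fbmem", str(fbmem_mb)]
    return workers, launcher_args


workers, args = gpu_stage_plan(16160)  # 16160 is the figure nvidia-smi reported
print(workers, args)
```

Under these assumptions the bloated per-worker budget is again 6 GB, but the 16160 MiB device now fits two workers, which is consistent with the 2 workers the old test runs used.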

bryevdv · 2022-08-05T21:36:40Z

@magnatelee I have made suggestion 1) in 6aa5188

When I try to add suggestion 2), I get program aborts on my local system:

+LEGATE_TEST=1 /home/bryan/work/legate.core/install38/bin/legate /home/bryan/work/cunumeric/examples/cholesky.py -cunumeric:test --gpus 2 --gpu-bind 0,1 --fbmem 4294967296
[FAIL] (GPU) examples/cholesky.py
   /home/bryan/work/legate.core/install38/bin/bind.sh: line 107: 82933 Aborted                 numactl "$@"

it looks like numactl does not know what to do with --fbmem (or further up, that bind.sh does not know how to pass it on to legate instead of numactl, possibly?)

bryevdv · 2022-08-05T21:45:37Z

Assuming the latest run is as good as the previous one:

I'd suggest merging this as-is, so that the team can benefit from faster CI jobs, and then making a follow-on issue about explicitly providing --fbmem

magnatelee · 2022-08-05T22:03:29Z

I think this part is problematic: --fbmem 4294967296. --fbmem takes a size in megabytes, so your command is asking for a 4PB space on framebuffer.
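The size of the mismatch can be verified with a short calculation. This sketch assumes binary units (1 MB treated as 2**20 bytes); `bytes_to_fbmem_mb` is a hypothetical helper for illustration, not part of the launcher.

```python
# Sketch of the unit mismatch, assuming binary units (1 MB = 2**20 bytes).
MIB = 1024 ** 2
PIB = 1024 ** 5

passed_value = 4294967296  # the value given to --fbmem: 4 GB expressed in bytes

# Read as megabytes, the same number describes this many bytes:
interpreted_bytes = passed_value * MIB
petabytes = interpreted_bytes // PIB
print(petabytes)


def bytes_to_fbmem_mb(size_in_bytes: int) -> int:
    """Hypothetical helper: convert a byte count to the MB value --fbmem expects."""
    return size_in_bytes // MIB


fbmem_arg = bytes_to_fbmem_mb(passed_value)
print(fbmem_arg)
```

The first result is 4 (petabytes), and the second is 4096, the value that corresponds to a 4 GB framebuffer when the flag is read in megabytes.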

bryevdv · 2022-08-05T22:10:54Z

asking for a 4PB space on framebuffer.

3af2be8 seems to fix locally :D


bryevdv added 4 commits August 4, 2022 13:43

report times in summary lines

f2ec215

fix typo

954a3cc

Add an overall test suite summary

134eec9

defer test output until test completion

0675ef9

bryevdv changed the title ~~[WIP] Collate test output to allow workers~~ [WIP] Collate test output to allow workers > 1 with verbose output Aug 5, 2022

remove -j 1 argument to test.sh

ffe35f9

bryevdv requested a review from magnatelee August 5, 2022 17:46

try bloat factor = 1.25

0232be6

fix default fbsize and bloat factor

6aa5188

specify fbmem in MB

3af2be8

magnatelee approved these changes Aug 5, 2022


bryevdv merged commit 0c7da23 into nv-legate:branch-22.10 Aug 5, 2022

bryevdv deleted the bv/test_collate branch August 5, 2022 23:18

bryevdv changed the title ~~[WIP] Collate test output to allow workers > 1 with verbose output~~ Collate test output to allow workers > 1 with verbose output Aug 8, 2022

bryevdv merged 8 commits into nv-legate:branch-22.10 from bryevdv:bv/test_collate
