Added array optimization fuse notebook #89
Conversation
alimanfoo
left a comment
Thanks a lot for writing this up, very useful!
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "inputs_rechunked.blocks[0, :2].visualize(optimize_graph=True)" |
Nice trick, didn't know about this :-)
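For context, a minimal sketch of the trick being referenced, with array and chunk sizes made up for illustration (rendering the graph requires graphviz):

```python
import dask.array as da

# Any chunked array will do; the shapes here are illustrative only.
x = da.random.random(size=(1_000, 1_000), chunks=(250, 250))
y = (x + 1).rechunk((500, 500))

# `.blocks` selects whole chunks, so visualizing just a couple of them with
# optimize_graph=True shows the post-optimization graph for a small, readable
# slice of the array instead of the full task graph.
y.blocks[0, :2].visualize(optimize_graph=True)
```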
Thanks @alimanfoo, I've applied your suggestions. @mrocklin do you have high-level thoughts on this? Does this feel like we're just documenting a workaround to a weakness of Dask that we should instead be fixing?
Yes, to me this notebook seems perhaps overly-specific to a single use case. I'm having trouble finding ways to generalize this notebook to other situations. I think that a general example of optimization would be useful. There are plenty of cases where this comes up, such as in ML workloads where you really want X and y to be co-allocated. That case might also be a bit simpler.
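A minimal sketch of that X / y case, with shapes and the `da.random` source as stand-ins for a real dataset; the point is only that both collections derive from the same source blocks, so you want the matching chunks kept together:

```python
import dask
import dask.array as da

# Stand-in for a loaded dataset; in a real ML workload this would come from
# storage rather than da.random.
data = da.random.random(size=(1_000_000, 11), chunks=(100_000, 11))

X = data[:, :-1]   # features
y = data[:, -1]    # target

# Persisting both in one call materializes each source block once, so the
# chunks of X and y that come from the same rows stay co-allocated.
X, y = dask.persist(X, y)
```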
Although, in general, I think we can improve many of these cases just by expanding Blockwise and HighLevelGraph operator fusion out to data access operations.
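As a point of reference (not from the comment above), the layers in question are visible on any Dask collection; this sketch just lists them for a small array so it's clear what Blockwise fusion operates on:

```python
import dask.array as da

x = da.ones((100, 100), chunks=(25, 25))
y = (x + 1) * 2  # each elementwise step adds a Blockwise layer

# __dask_graph__() returns the HighLevelGraph; the Blockwise layers listed
# here are the ones high-level fusion can combine, and the suggestion above
# is to extend that fusion out to the data-access layers as well.
hlg = y.__dask_graph__()
print(list(hlg.layers))
```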
@TomAugspurger, did you have plans to try to make the story here more general?
Not at the moment.
@mrocklin question on the HLG fusion: would you expect adding additional […]? I ask because when I look at just the creation / stacking / rechunking, we don't […]:

```python
import dask.array as da

inputs = [da.random.random(size=500_000, chunks=90_000)
          for _ in range(5)]

inputs_stacked = da.vstack(inputs)
inputs_rechunked = inputs_stacked.rechunk((50, 90_000))
inputs_rechunked.visualize(optimize_graph=True)
```

So unless adding a […]
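A hedged follow-up sketch (not from the thread): one way to check the same question numerically rather than visually is to compare task counts before and after graph optimization; a large drop would indicate fusion is happening.

```python
import dask
import dask.array as da

inputs = [da.random.random(size=500_000, chunks=90_000) for _ in range(5)]
inputs_rechunked = da.vstack(inputs).rechunk((50, 90_000))

# Task counts with and without graph optimization; if the creation / stacking /
# rechunking tasks were being fused, the optimized graph would be much smaller.
(optimized,) = dask.optimize(inputs_rechunked)
print(len(dict(inputs_rechunked.__dask_graph__())),
      len(dict(optimized.__dask_graph__())))
```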

From dask/dask#5105.
https://mybinder.org/v2/gh/TomAugspurger/dask-examples/array-fuse (building an image now)