Skip to content

[SPARK-21491][GraphX] Enhance GraphX performance: breakOut instead of .toMap#18693

Closed
SereneAnt wants to merge 1 commit into
apache:masterfrom
SereneAnt:performance_toMap-breakOut
Closed

[SPARK-21491][GraphX] Enhance GraphX performance: breakOut instead of .toMap#18693
SereneAnt wants to merge 1 commit into
apache:masterfrom
SereneAnt:performance_toMap-breakOut

Conversation

@SereneAnt
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Traversable.toMap changed to 'collections.breakOut', that eliminates intermediate tuple collection creation, see Stack Overflow article.

How was this patch tested?

Unit tests run.
No performance tests performed yet.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK that's what it is. Why is this different? the article you cited isn't comparing to toMap, and toMap doesn't make an intermediate collection.

Copy link
Copy Markdown
Contributor Author

@SereneAnt SereneAnt Jul 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The principles are the same,
sources.zipWithIndex.map {...}' allocates a collection of tuples, .toMapthen iterates and converts them into the map (hehe). breakOut is the implementation of CanBuildFrom, the implicit parameter passed toTraversable.map` method. Once used, it allows populating newborn map directly, without intermediate collection of tuples.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimization nerds have already used it in the spark code:

@SereneAnt SereneAnt force-pushed the performance_toMap-breakOut branch from d7b73b5 to 5abe060 Compare July 24, 2017 03:32
@SereneAnt SereneAnt force-pushed the performance_toMap-breakOut branch from 5abe060 to 0ae9cc5 Compare July 25, 2017 02:01
@SparkQA
Copy link
Copy Markdown

SparkQA commented Jul 25, 2017

Test build #3850 has finished for PR 18693 at commit 0ae9cc5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Copy Markdown
Member

srowen commented Jul 25, 2017

Merged to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants