[SPARK-21491][GraphX] Enhance GraphX performance: breakOut instead of .toMap #18693
SereneAnt wants to merge 1 commit into
Ah OK, that's what it is. Why is this different? The article you cited isn't comparing to toMap, and toMap doesn't make an intermediate collection.
The principles are the same:
`sources.zipWithIndex.map {...}` allocates a collection of tuples, and `.toMap` then iterates over them and converts them into the map (hehe). `breakOut` provides the `CanBuildFrom` instance, the implicit parameter passed to the `Traversable.map` method. Once used, it allows populating the newborn map directly, without an intermediate collection of tuples.
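To make the difference concrete, here is a minimal sketch of the two forms side by side. It assumes Scala 2.12 or earlier (`breakOut` was removed in 2.13), and the `sources` values and the `BreakOutSketch` object are hypothetical stand-ins, not the actual GraphX code touched by this PR.

```scala
import scala.collection.breakOut

object BreakOutSketch {
  // Hypothetical input standing in for the `sources` collection discussed above.
  val sources: Seq[Long] = Seq(10L, 20L, 30L)

  // Two passes: `map` builds a Seq[(Long, Int)], then `toMap` copies it into a Map.
  val viaToMap: Map[Long, Int] =
    sources.zipWithIndex.map { case (id, i) => (id, i) }.toMap

  // One pass: `breakOut` supplies a CanBuildFrom for the expected result type,
  // so `map` appends each tuple straight into a Map builder.
  // The explicit type annotation is what lets `breakOut` pick the target collection.
  val viaBreakOut: Map[Long, Int] =
    sources.zipWithIndex.map { case (id, i) => (id, i) }(breakOut)

  def main(args: Array[String]): Unit = {
    // Both forms produce the same map; only the allocation pattern differs.
    assert(viaToMap == viaBreakOut)
    println(viaBreakOut)
  }
}
```

Note that `zipWithIndex` still allocates its own sequence in both versions; the saving here is only the tuple collection that `map` would otherwise build before `toMap`.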
Optimization nerds have already used it in the Spark code:
d7b73b5 to 5abe060
…ate collections creation with breakOut
5abe060 to 0ae9cc5
Test build #3850 has finished for PR 18693 at commit
Merged to master
What changes were proposed in this pull request?
`Traversable.toMap` changed to `collection.breakOut`, which eliminates creation of the intermediate tuple collection; see the Stack Overflow article.
How was this patch tested?
Unit tests run.
No performance tests performed yet.