[SPARK-4672][GraphX]Perform checkpoint() on PartitionsRDD to shorten the lineage#3549

Closed
JerryLead wants to merge 4 commits into apache:master from JerryLead:my_graphX_checkpoint

@JerryLead
Contributor

The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672

Iterative GraphX applications always have a long lineage, and calling checkpoint() on EdgeRDD and VertexRDD themselves cannot shorten it. In contrast, calling checkpoint() on their PartitionsRDD cuts the long lineage off. Moreover, existing operations such as cache() are already performed on the PartitionsRDD in this code, so checkpoint() should do the same. More details and explanation can be found in the JIRA.
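The reasoning above can be illustrated with a small toy model in plain Scala. This is only a sketch, not Spark code: `Node`, `Wrapper`, `run`, and the `delegate` flag are hypothetical names invented for the illustration. The model assumes, as the description suggests, that each iteration derives its new data from the wrapped partitions rather than from the wrapper itself, so a checkpoint that only touches the wrapper leaves the chain intact.

```scala
// Toy model only: these classes are hypothetical and are not Spark's API.
object LineageSketch {
  // A node in a lineage chain; checkpointing a node drops its parent link.
  final class Node(val name: String, private var parent: Option[Node]) {
    def checkpoint(): Unit = { parent = None }
    def depth: Int = parent.map(_.depth + 1).getOrElse(0)
  }

  // Wrapper in the style of VertexRDD/EdgeRDD. Assumption of this model:
  // derived data is built from `partitions`, not from the wrapper node.
  final class Wrapper(val partitions: Node, delegate: Boolean) {
    private val self = new Node("wrapper", Some(partitions))

    // One iteration step: the new partitions depend on the old partitions.
    def map(step: Int): Wrapper =
      new Wrapper(new Node(s"partitions-$step", Some(partitions)), delegate)

    // delegate = false: checkpoint only the wrapper node.
    // delegate = true:  checkpoint the wrapped partitions instead.
    def checkpoint(): Unit =
      if (delegate) partitions.checkpoint() else self.checkpoint()

    def lineageDepth: Int = partitions.depth
  }

  // Runs `iterations` steps, checkpointing every `every` steps, and
  // returns the lineage depth of the final partitions.
  def run(delegate: Boolean, iterations: Int, every: Int): Int = {
    var w = new Wrapper(new Node("partitions-0", None), delegate)
    for (i <- 1 to iterations) {
      w = w.map(i)
      if (i % every == 0) w.checkpoint()
    }
    w.lineageDepth
  }

  def main(args: Array[String]): Unit = {
    println(run(delegate = false, iterations = 25, every = 10)) // prints 25
    println(run(delegate = true, iterations = 25, every = 10))  // prints 5
  }
}
```

In this model, checkpointing the wrapper never shortens the chain (depth 25 after 25 steps), while delegating to the wrapped partitions resets it at every checkpoint (depth 5, the number of steps since the last checkpoint at step 20).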

@AmplabJenkins

Can one of the admins verify this patch?

@ankurdave
Contributor

ok to test

@SparkQA

SparkQA commented Dec 2, 2014

Test build #24056 has started for PR 3549 at commit d1aa8d8.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 2, 2014

Test build #24056 has finished for PR 3549 at commit d1aa8d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24056/

@asfgit closed this in fc0a147 Dec 3, 2014
asfgit pushed a commit that referenced this pull request Dec 3, 2014
…the lineage

Author: JerryLead <JerryLead@163.com>
Author: Lijie Xu <csxulijie@gmail.com>

Closes #3549 from JerryLead/my_graphX_checkpoint and squashes the following commits:

d1aa8d8 [JerryLead] Perform checkpoint() on PartitionsRDD not VertexRDD and EdgeRDD themselves
ff08ed4 [JerryLead] Merge branch 'master' of https://github.com/apache/spark
c0169da [JerryLead] Merge branch 'master' of https://github.com/apache/spark
52799e3 [Lijie Xu] Merge pull request #1 from apache/master

(cherry picked from commit fc0a147)
Signed-off-by: Ankur Dave <ankurdave@gmail.com>

@ankurdave
Contributor

Thanks, merged into master and branch-1.2.
