[SPARK-19110][ML][MLLIB] DistributedLDAModel returns different logPrior for original and loaded model #16491
wangmiao1981 wants to merge 2 commits into
Conversation
Jenkins, retest this please.
Test build #70992 has finished for PR 16491 at commit
val trainingLogLikelihood2 =
  model2.asInstanceOf[DistributedLDAModel].trainingLogLikelihood
assert(logPrior ~== logPrior2 absTol 1e-6)
assert(trainingLogLikelihood ~== trainingLogLikelihood2 absTol 1e-6)
Should we check that trainingLogLikelihood and logPrior do not change for LocalLDAModel?
logLikelihood and logPrior are only for distributed model.
Right. I mean checking that they are not persisted and loaded as an unexpected but valid value (!= Double.NaN).
LocalLDAModel doesn't extend DistributedLDAModel and vice versa. It is not clear to me how to check trainingLogLikelihood and logPrior in LocalLDAModel.
OK, I guess I misremembered this because of the other PR.
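The assertions under review compare the two values with an absolute tolerance rather than exact equality. A minimal sketch of why that matters, assuming only that the values are floating-point sums whose evaluation order may differ after a reload: `approxEqual` below is a hypothetical helper written for illustration, not the `~==`/`absTol` implementation from Spark's test utilities.

```scala
object ToleranceSketch {
  // Hypothetical helper for illustration only; Spark's own test utilities
  // provide the `~==` / `absTol` syntax used in the diff.
  def approxEqual(a: Double, b: Double, absTol: Double): Boolean =
    math.abs(a - b) <= absTol

  def main(args: Array[String]): Unit = {
    val terms = Seq(0.1, 0.2, 0.3)

    // Same terms, summed in two different orders.
    val forward = terms.foldLeft(0.0)(_ + _)
    val backward = terms.reverse.foldLeft(0.0)(_ + _)

    // Floating-point addition is not associative, so exact equality
    // between the two sums is not guaranteed.
    println(s"exact equality: ${forward == backward}")

    // A tolerance-based comparison is robust to that rounding noise.
    println(s"within 1e-6:    ${approxEqual(forward, backward, 1e-6)}")
  }
}
```

A tolerance of 1e-6 absorbs rounding noise from reordering, but it would still catch a discrepancy of the size reported in this PR (roughly 0.1).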
@jkbradley @yanboliang please have a look
Yikes, thanks for fixing this!
…or for original and loaded model
## What changes were proposed in this pull request?
While adding the DistributedLDAModel training summary for SparkR, I found that logPrior differs between the original and the loaded model.
For example, in test("read/write DistributedLDAModel"), I added the following check:
val logPrior = model.asInstanceOf[DistributedLDAModel].logPrior
val logPrior2 = model2.asInstanceOf[DistributedLDAModel].logPrior
assert(logPrior === logPrior2)
The test fails:
-4.394180878889078 did not equal -4.294290536919573
The reason is that `graph.vertices.aggregate(0.0)(seqOp, _ + _)` only returns the value of a single vertex instead of the aggregation of all vertices. Therefore, when the loaded model performs the aggregation in a different order, it returns a different `logPrior`.
Please refer to #16464 for details.
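The failure mode described above can be sketched without Spark. This is an illustration under stated assumptions, not the actual LDA code: it assumes the cause is a `seqOp` that returns the current element's contribution without adding it to the running sum, and it mimics the partition-wise fold-then-combine shape of `aggregate` with plain Scala collections. The names `contribution`, `aggregateLike`, `buggySeqOp`, and `fixedSeqOp` are hypothetical.

```scala
object AggregateSketch {
  // Hypothetical stand-in for a per-vertex term of the log prior.
  def contribution(v: Double): Double = math.log(v)

  // Assumed bug: the accumulator is ignored, so folding a partition
  // yields only the contribution of the last element visited.
  val buggySeqOp: (Double, Double) => Double =
    (_, v) => contribution(v)

  // Assumed fix: add each contribution to the running sum.
  val fixedSeqOp: (Double, Double) => Double =
    (acc, v) => acc + contribution(v)

  // Mimics aggregate(0.0)(seqOp, _ + _): fold each partition with seqOp,
  // then combine the per-partition results with addition.
  def aggregateLike(partitions: Seq[Seq[Double]])(
      seqOp: (Double, Double) => Double): Double =
    partitions.map(_.foldLeft(0.0)(seqOp)).foldLeft(0.0)(_ + _)

  def main(args: Array[String]): Unit = {
    // The same four values, laid out differently, as might happen
    // when a model is saved and loaded again.
    val original = Seq(Seq(1.0, 2.0), Seq(3.0, 4.0))
    val reloaded = Seq(Seq(4.0, 1.0), Seq(2.0, 3.0))

    println(s"buggy original: ${aggregateLike(original)(buggySeqOp)}")
    println(s"buggy reloaded: ${aggregateLike(reloaded)(buggySeqOp)}")
    println(s"fixed original: ${aggregateLike(original)(fixedSeqOp)}")
    println(s"fixed reloaded: ${aggregateLike(reloaded)(fixedSeqOp)}")
  }
}
```

With the accumulator ignored, the result depends on which element each partition happens to visit last, so the two layouts disagree. With the accumulating version, both layouts agree up to floating-point rounding.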
## How was this patch tested?
Added a new unit test for logPrior.
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes #16491 from wangmiao1981/ldabug.
(cherry picked from commit 036b503)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…nd logLikelihood of DistributedLDAModel in MLLIB
## What changes were proposed in this pull request?
apache#16491 added the fix to mllib and a unit test to ml. This follow-up PR adds unit tests to the mllib suite.
## How was this patch tested?
Unit tests.
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes apache#16524 from wangmiao1981/ldabug.