From b3e00edc3db4f05c9cb4eabfb21cbb36b86d32b0 Mon Sep 17 00:00:00 2001
From: Zhang Le
Date: Mon, 13 Aug 2018 16:27:24 +0800
Subject: [PATCH 1/3] [DOCS] Fixed NDCG formula issues

When j is 0, log(j+1) will be 0, and this leads to division by 0 issue.
---
 docs/mllib-evaluation-metrics.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index d9dbbab4840a3..374580468e4e4 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -461,11 +461,11 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
 Normalized Discounted Cumulative Gain
- $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
+ $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=1}^{n}
 \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+1)}} \\
 \text{Where} \\
 \hspace{5 mm} n = \text{min}\left(\text{max}\left(|R_i|,|D_i|\right),k\right) \\
- \hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+1)}$
+ \hspace{5 mm} IDCG(D, k) = \sum_{j=1}^{\text{min}(\left|D\right|, k)} \frac{1}{\text{ln}(j+1)}$
 NDCG at k is a

From d0f814f378d77575c825a907b25e6e8a6714cae1 Mon Sep 17 00:00:00 2001
From: Zhang Le
Date: Mon, 20 Aug 2018 11:09:55 +0800
Subject: [PATCH 2/3] [DOC] Update NDCG formula and wiki link

---
 docs/mllib-evaluation-metrics.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index 374580468e4e4..4f0bf615ef2cc 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -461,14 +461,14 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
 Normalized Discounted Cumulative Gain
- $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=1}^{n}
- \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+1)}} \\
+ $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
+ \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+2)}} \\
 \text{Where} \\
 \hspace{5 mm} n = \text{min}\left(\text{max}\left(|R_i|,|D_i|\right),k\right) \\
- \hspace{5 mm} IDCG(D, k) = \sum_{j=1}^{\text{min}(\left|D\right|, k)} \frac{1}{\text{ln}(j+1)}$
+ \hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k)} \frac{1}{\text{ln}(j+2)}$
- NDCG at k is a
+ NDCG at k is a
 measure of how many of the first k recommended documents are in the set of true relevant documents averaged across all users. In contrast to precision at k, this metric takes into account the order of the recommendations (documents are assumed to be in order of decreasing relevance).

From f87cf61c3a4ef41aeab4c9368b7fc9aa4983ab3e Mon Sep 17 00:00:00 2001
From: Zhang Le
Date: Mon, 20 Aug 2018 11:39:59 +0800
Subject: [PATCH 3/3] [DOCS] Restore "-1" used in the upper bound

---
 docs/mllib-evaluation-metrics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index 4f0bf615ef2cc..c65ecdcb67ee4 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -465,7 +465,7 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
 \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+2)}} \\
 \text{Where} \\
 \hspace{5 mm} n = \text{min}\left(\text{max}\left(|R_i|,|D_i|\right),k\right) \\
- \hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k)} \frac{1}{\text{ln}(j+2)}$
+ \hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+2)}$
 NDCG at k is a
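
Reviewer note: as a quick sanity check of the formula in its final form (0-based `j`, discount `ln(j+2)`, upper bound `min(|D|, k) - 1`), the following is a minimal Python sketch, not the Spark implementation. The function names `idcg` and `ndcg_at_k` are hypothetical, and treating a user with an empty relevant set as contributing 0 is an assumption made here for illustration.

```python
import math


def idcg(num_relevant, k):
    """Ideal DCG: sum over j = 0 .. min(|D|, k) - 1 of 1 / ln(j + 2)."""
    return sum(1.0 / math.log(j + 2) for j in range(min(num_relevant, k)))


def ndcg_at_k(recommended, relevant, k):
    """Mean NDCG at k over M users.

    recommended: list of ranked lists R_i (best first)
    relevant:    list of sets D_i of truly relevant items
    """
    total = 0.0
    for r_i, d_i in zip(recommended, relevant):
        if not d_i:
            # Assumption: a user with no relevant items contributes 0.
            continue
        n = min(max(len(r_i), len(d_i)), k)
        # rel_{D_i}(R_i(j)) is 1 when the j-th recommendation is relevant;
        # positions past the end of R_i contribute nothing.
        dcg = sum(
            1.0 / math.log(j + 2)
            for j in range(n)
            if j < len(r_i) and r_i[j] in d_i
        )
        total += dcg / idcg(len(d_i), k)
    return total / len(recommended)
```

With `j` starting at 0, the first discount is `1 / ln(2)`, so the division by zero described in patch 1/3 cannot occur, and a perfect ranking scores exactly 1 because the DCG and IDCG sums run over the same indices.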