Support NaN values in UpliftTree and UpliftRandomForest by aman-coder03 · Pull Request #860 · uber/causalml

aman-coder03 · 2025-12-29T07:28:55Z

This PR adds native support for missing values (NaNs) in UpliftTree and
UpliftRandomForest.

Each candidate split evaluates both possible NaN routing directions
(left/right) and learns the optimal routing per node, similar to
scikit-learn’s decision tree behavior.

The learned NaN routing is stored in each tree node and applied
consistently during training, pruning, filling, and prediction.
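
For illustration only (this is not the PR's code): a minimal sketch of per-node routing, assuming a numeric feature and a made-up split score. `toy_gain`, `uplift`, and `best_nan_routing` are hypothetical names, and the score here is a stand-in for causalml's actual evaluation criteria. The left side is taken to be `x >= threshold`, and NaN rows are tried on each side in turn.

```python
import numpy as np

def uplift(y, treatment):
    """Treated mean minus control mean; 0.0 if either group is empty."""
    treated, control = y[treatment == 1], y[treatment == 0]
    if treated.size == 0 or control.size == 0:
        return 0.0
    return float(treated.mean() - control.mean())

def toy_gain(y, treatment, left):
    """Hypothetical split score: squared uplift gap between the two children."""
    if left.all() or not left.any():
        return 0.0
    gap = uplift(y[left], treatment[left]) - uplift(y[~left], treatment[~left])
    return gap ** 2

def best_nan_routing(x, y, treatment, threshold):
    """Score the split with NaNs sent left, then right; keep the better one."""
    nan_mask = np.isnan(x)
    # Compare only the observed values so NaN never takes part in the comparison.
    filt = np.zeros(x.shape, dtype=bool)
    filt[~nan_mask] = x[~nan_mask] >= threshold
    gains = {
        True: toy_gain(y, treatment, filt | nan_mask),  # NaNs join the left child
        False: toy_gain(y, treatment, filt),            # NaNs stay in the right child
    }
    missing_go_to_left = gains[True] >= gains[False]
    return missing_go_to_left, gains[missing_go_to_left]

x = np.array([1.0, 2.0, 3.0, 4.0, np.nan, np.nan])
treatment = np.array([1, 0, 1, 0, 1, 0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 0.0])
missing_go_to_left, gain = best_nan_routing(x, y, treatment, threshold=3.0)
```

In this toy data the NaN rows respond to treatment like the `x >= 3` rows, so routing them left gives the larger score. The chosen flag is what a node would store and reuse at prediction time.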

This resolves #802.

CLAassistant · 2025-12-29T07:29:02Z

All committers have signed the CLA.

aman-coder03 · 2025-12-29T10:07:24Z

The Read the Docs build failure appears to be due to Cython extensions requiring a compiler in the RTD environment.
No documentation-related files were modified in this PR.
Happy to adjust the RTD config if you’d like me to handle this.

jeongyoonlee

Thanks for your contribution, @aman-coder03. I left a few comments. Can you address them? Also, please add test code to tests/test_uplift_trees.py accordingly. Thanks!

jeongyoonlee · 2026-01-29T23:59:02Z

causalml/inference/tree/uplift.pyx

                is_split_by_gt = False

            for value in lsUnique:
-                len_X_l = group_counts_by_divide(columnValues, value, is_split_by_gt, treatment_idx, y, left_count_arr)


We need this code block. Is there any reason you removed it?

jeongyoonlee · 2026-01-30T00:00:06Z

causalml/inference/tree/uplift.pyx

+                        n_reg
                    )

-                    early_stopping_flag = False


We need this code block for early stopping.

jeongyoonlee · 2026-01-30T00:02:25Z

causalml/inference/tree/uplift.pyx

NaN handling is needed here in the percentile calculation.
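
A minimal sketch of what NaN-aware percentile handling could look like, assuming the split candidates come from percentiles of a numeric column. `candidate_thresholds` and the percentile list are hypothetical and not taken from uplift.pyx. `np.percentile` returns NaN as soon as the input contains a NaN, while `np.nanpercentile` ignores missing values.

```python
import numpy as np

def candidate_thresholds(column_values, percentiles=(10, 50, 90)):
    """Unique split candidates computed from the non-missing values only."""
    values = np.asarray(column_values, dtype=float)
    if np.isnan(values).all():
        # Nothing observed in this column, so there is nothing to split on.
        return np.array([])
    return np.unique(np.nanpercentile(values, percentiles))

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, np.nan])
thresholds = candidate_thresholds(values)
```

The all-NaN guard avoids the warning and NaN result that `np.nanpercentile` produces when a column has no observed values.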

aman-coder03 · 2026-02-03T08:18:23Z

@jeongyoonlee I will look into this!

aman-coder03 · 2026-02-06T10:23:10Z

Hey @jeongyoonlee, I have implemented the changes. Please have a look!

jeongyoonlee · 2026-02-08T01:03:40Z

@aman-coder03, I see that some necessary code blocks (e.g., the code for the CTS evaluation criterion) were removed without any explanation. Also, some tests are failing.

aman-coder03 · 2026-02-08T13:43:47Z

Thanks for pointing that out, @jeongyoonlee. I will restore the original CTS and evaluation blocks and limit the changes strictly to the NaN routing logic to preserve the original behavior.

aman-coder03 · 2026-02-11T18:58:42Z

@jeongyoonlee, can you please have a look at the code changes now?

jeongyoonlee · 2026-02-20T17:53:39Z

Code review

Found 1 issue:

np.isnan() called unconditionally on non-numeric columns in divideSet, causing a TypeError for any dataset with string/categorical features.

The new NaN routing code in divideSet applies np.isnan(X[:, column]) without checking whether the column is numeric:

causalml/causalml/inference/tree/uplift.pyx

Lines 916 to 921 in 4e84b3f

    
        filt = X[:, column] == value
    # Handle NaNs
    nan_mask = np.isnan(X[:, column])
    if missing_go_to_left:

divideSet handles both numeric splits (isinstance(value, numbers.Number)) and string splits (the else branch). When X[:, column] is an object array of strings, np.isnan raises:

TypeError: ufunc 'isnan' not supported for the input types

The companion function divideSet_len in this same PR correctly guards the NaN logic with a numeric dtype check, np.issubdtype(X[:, column].dtype, np.number):

causalml/causalml/inference/tree/uplift.pyx

Lines 960 to 967 in 4e84b3f

    
    # Handle NaNs only for numeric columns
    if np.issubdtype(X[:, column].dtype, np.number):
        nan_mask = np.isnan(X[:, column])
        if missing_go_to_left:
            filt = filt | nan_mask
        else:

The same unguarded np.isnan pattern also appears in group_counts_by_divide:

causalml/causalml/inference/tree/uplift.pyx

Lines 316 to 321 in 4e84b3f

    
    # Handle NaNs
    cdef np.ndarray[np.uint8_t, ndim=1, cast=True] nan_mask = np.isnan(col_vals)
    if missing_go_to_left:
        filt = filt | nan_mask

The fix is to wrap both np.isnan calls with the same numeric dtype guard used in divideSet_len. The test added in this PR only injects NaNs into a numeric feature column, so it would not catch this regression.
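
A minimal sketch of such a guard in plain Python, shown only to illustrate the pattern. `safe_nan_mask` is a hypothetical helper and not a function in uplift.pyx. It reproduces the TypeError on a string column, then returns an all-False mask for non-numeric columns.

```python
import numpy as np

def safe_nan_mask(col_vals):
    """Boolean NaN mask; all False for non-numeric (e.g. string) columns."""
    col_vals = np.asarray(col_vals)
    if np.issubdtype(col_vals.dtype, np.number):
        return np.isnan(col_vals)
    # Non-numeric dtype: np.isnan would raise, so report no missing values.
    return np.zeros(col_vals.shape, dtype=bool)

string_col = np.array(["a", "b", "a"], dtype=object)
numeric_col = np.array([1.0, np.nan, 3.0])

# The unguarded call fails on the string column.
try:
    np.isnan(string_col)
    unguarded_raised = False
except TypeError:
    unguarded_raised = True

string_mask = safe_nan_mask(string_col)    # all False
numeric_mask = safe_nan_mask(numeric_col)  # True only at the NaN position
```

One caveat of a dtype-based guard: a numeric column stored inside an object-dtype array is treated as non-numeric, so any NaNs in it are not routed. A regression test that fits a tree on data with a string feature alongside a NaN-bearing numeric feature would exercise both paths.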

🤖 Generated with Claude Code

If this code review was useful, please react with 👍. Otherwise, react with 👎.

aman-coder03 mentioned this pull request Dec 29, 2025

Support for Inputs with NaN in Uplift Trees/Uplift RandomForest #802

Open

jeongyoonlee requested changes Jan 30, 2026

aman-coder03 force-pushed the feature/handle-nan-uplift-trees branch from e13b20b to bd23f63 Compare February 3, 2026 15:03

aman-coder03 requested a review from jeongyoonlee February 4, 2026 15:57

aman-coder03 added 5 commits February 12, 2026 01:15

Add NaN support for UpliftTree and UpliftRandomForest

17ecd78

update NaN support for UpliftTree and UpliftRandomForest

a9789b1

Fix Cython gain declaration and restore early stopping; add NaN tests

af1ca0a

Restore upstream uplift.pyx before minimal NaN changes

8543800

review comments

4163bf5

aman-coder03 force-pushed the feature/handle-nan-uplift-trees branch from 97c291e to 4163bf5 Compare February 11, 2026 19:46

updating with ensure_all_finite

4e84b3f

aman-coder03 wants to merge 6 commits into uber:master from aman-coder03:feature/handle-nan-uplift-trees

3 participants