DOC Update coef_ type for sparse linear models #34118

Open
praful-srinivasan-027 wants to merge 7 commits into scikit-learn:main from praful-srinivasan-027:doc-sparse-coef

Conversation

@praful-srinivasan-027

Reference Issues/PRs

Fixes #34117

What does this implement/fix? Explain your changes.

Updates the coef_ attribute documentation for linear models supporting
sparse coefficients through sparsify().

The documentation now mentions that coef_ can be either an ndarray
or a sparse matrix for affected estimators such as SGD-based models,
Perceptron, PassiveAggressiveClassifier, and LogisticRegressionCV.
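
For reviewers who want to check the behavior being documented, here is a minimal sketch. The dataset and the choice of `SGDClassifier` and `Perceptron` are illustrative assumptions, not taken from the diff. It fits each estimator, confirms that `coef_` starts as an ndarray, and then checks that it is sparse after `sparsify()`.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron, SGDClassifier

# Illustrative binary problem; sizes are arbitrary assumptions.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

coef_types = {}
for Estimator in (SGDClassifier, Perceptron):
    model = Estimator(random_state=0).fit(X, y)
    # After fit, coef_ is a dense ndarray.
    dense_before = isinstance(model.coef_, np.ndarray)
    # sparsify() converts coef_ in place to a SciPy sparse container.
    model.sparsify()
    sparse_after = sparse.issparse(model.coef_)
    coef_types[Estimator.__name__] = (dense_before, sparse_after, model.coef_.shape)

for name, result in coef_types.items():
    print(name, result)
```

`scipy.sparse.issparse` is used rather than a check against one concrete class, so the sketch does not depend on whether the result is a sparse matrix or a sparse array.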

AI usage disclosure

I used AI assistance for:

  • Research and understanding

Any other comments?

@github-actions (bot) added the "CI:Linter failure" label (The linter CI is failing on this PR) on May 25, 2026
@github-actions (bot) removed the "CI:Linter failure" label (The linter CI is failing on this PR) on May 25, 2026
@praful-srinivasan-027

Author

Hi everyone! Just a gentle bump on this PR whenever someone has a chance to take a look. I'd really appreciate any feedback. Thanks!

@praful-srinivasan-027

Author

Hi @GaelVaroquaux, @glemaitre,

Just checking in on this small documentation fix since it's been 2 weeks. The CI is completely green and passing (30 tests passed, 8 skipped).

If you have a quick moment to take a look or pass it along to whoever is currently tracking the doc queue, I'd really appreciate it. Thanks!

@StefanieSenger StefanieSenger left a comment

Member

Hi @praful-srinivasan-027,

thanks a lot for your contribution. I've left you two little comments.
Otherwise looks fine.

Comment thread sklearn/linear_model/_stochastic_gradient.py Outdated
Comment on lines +1906 to +1910
By default, it will be created as a dense array, but can be turned to
sparse (CSR format) through :meth:`sparsify` (which can be beneficial
under L1 regularization when many coefficients are zero), and back to
dense through :meth:`densify`.

Member

Why do you mention this here, but not in the other cases?

@praful-srinivasan-027 praful-srinivasan-027 Jun 6, 2026

Author

An earlier PR, #34093, added the same documentation for logistic regression. I made these changes to keep this PR consistent with it.

Member

Sure, I know. What I meant is: why did you not add a similar message in the other places to keep them consistent with #34093, too?

Author

My bad. I've added the same explanatory text to the other affected estimators as well to keep the documentation consistent with #34093.

@StefanieSenger StefanieSenger left a comment

Member

LGTM, thanks @praful-srinivasan-027.

Comment thread sklearn/linear_model/_stochastic_gradient.py Outdated
@StefanieSenger added the "Waiting for Second Reviewer" label (First reviewer is done, need a second one!) on Jun 7, 2026
A list of class labels known to the classifier.

coef_ : ndarray or CSR matrix of shape (1, n_features) or (n_classes, n_features)
coef_ : ndarray or CSR array/matrix of shape (1, n_features) or (n_classes, \

Contributor

@jeremiedbb actually removed the array from here in #34093 (review), so this shouldn't be put back in here (and should probably also be removed from the other changes?).

@StefanieSenger StefanieSenger Jun 7, 2026

Member

Oh, why was scipy.sparse.array as a possible type removed?
Since #31177 users can specify which type sparsify() returns with set_config.

@StefanieSenger

Member

Your last commit makes it worse, @praful-srinivasan-027. These don't all offer regularisation (which is mentioned in the explanation). Please remove the new additions.

@praful-srinivasan-027

Author

Thanks for the clarification. I've removed the L1-regularization-specific wording from the affected estimators and kept a generic explanation of the sparsify/densify behavior where applicable. The documentation now reflects the sparse coefficient support without making assumptions about regularization.
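
As a rough illustration of the generic sparsify/densify behavior described above, the following sketch makes a round trip on a multiclass `SGDClassifier`. The three-class dataset and its sizes are assumptions chosen only to show the `(n_classes, n_features)` shape. No regularization-specific settings are involved.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Illustrative three-class problem; all sizes are arbitrary assumptions.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=4, n_classes=3, random_state=0
)

clf = SGDClassifier(random_state=0).fit(X, y)
dense_coef = clf.coef_.copy()  # keep the fitted dense coefficients for comparison

# Dense -> sparse: coef_ becomes a SciPy sparse container in CSR format.
clf.sparsify()
sparse_format = clf.coef_.format
sparse_shape = clf.coef_.shape

# Sparse -> dense: coef_ is an ndarray again with the same values.
clf.densify()
round_trip_ok = isinstance(clf.coef_, np.ndarray) and np.array_equal(
    clf.coef_, dense_coef
)

print(sparse_format, sparse_shape, round_trip_ok)
```

Both methods modify the estimator in place, so the shape of `coef_` is unchanged by the conversion and only its container type differs.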

Development

Successfully merging this pull request may close these issues.

Sparse coefficients in linear models are not documented

3 participants