Defaults id_column to None for PIGs & tests #141
class TestPigTablesGeneration:

    @pytest.mark.parametrize("id_col_name", [None, "col_id"])  # test None, as this is the default value in generate_pig_tables
    def test_col_id(self, id_col_name):
Suggested change:
- def test_col_id(self, id_col_name):
+ def test_col_id(self, id_col_name: str | None):
This syntax only works on Python 3.10 and above. Should I use it, or the older form (importing typing and writing typing.Union[str, None]) for backwards compatibility? How does that actually work?
You could use Optional[str] instead (imported from typing). It means exactly the same as Union[str, None], but it is more readable.
Suggested change:
- def test_col_id(self, id_col_name):
+ def test_col_id(self, id_col_name: Optional[str]):
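For reference, a minimal sketch comparing the spellings discussed above; the function names are hypothetical placeholders. `Optional[str]` is defined as `Union[str, None]`, so the two are interchangeable on any Python 3 version with the typing module. The PEP 604 form `str | None` only evaluates at runtime on Python 3.10+, although putting `from __future__ import annotations` at the very top of a module lets it be used inside annotations on 3.7+, because annotations are then stored as strings instead of being evaluated.

```python
from typing import Optional, Union, get_type_hints


def with_optional(id_col_name: Optional[str] = None) -> Optional[str]:
    # Optional[str] works on every supported Python 3 version
    return id_col_name


def with_union(id_col_name: Union[str, None] = None) -> Union[str, None]:
    # Same meaning as Optional[str], just more verbose
    return id_col_name


# Optional[X] is defined as Union[X, None], so the hints compare equal
assert Optional[str] == Union[str, None]
assert get_type_hints(with_optional) == get_type_hints(with_union)
```

Either spelling gives type checkers the same information, so choosing Optional is purely a readability decision.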
        'avg_target': [0.0, 0.5, 1.0, 0.0, 0.6666666666666666, 0.0, 1.0, 0.0]
    })

    pd.testing.assert_frame_equal(out, expected)
(No newline at end of file)
Suggested change (adds the missing newline at the end of the file):
    pd.testing.assert_frame_equal(out, expected)
Story title
Remove mandatory id column #135
Changes made
Altered the calculate_pig_table and generate_pig_tables methods so that column_id_name is no longer required; it was not actually needed for the underlying calculations.
Added a test to check that the results are still the same as before.
How does the solution address the problem
This PR gives column_id_name a default value of None. That way, data scientists can simply leave out the ID column, and the code remains backwards compatible.
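A minimal sketch of why a None default keeps existing call sites working. `calculate_pig_table_sketch`, its column names and the `pop_size`/`avg_target` statistics are hypothetical simplifications, not the library's actual calculate_pig_table; the point is only that a keyword argument defaulting to None accepts both the old call (ID column passed) and the new call (ID column omitted), and both produce the same table.

```python
from typing import Optional

import pandas as pd


def calculate_pig_table_sketch(
    data: pd.DataFrame,
    predictor_column_name: str,
    target_column_name: str,
    id_column_name: Optional[str] = None,
) -> pd.DataFrame:
    """Hypothetical stand-in for calculate_pig_table.

    id_column_name defaults to None and is ignored, since the
    per-bin statistics never depend on the ID column.
    """
    grouped = data.groupby(predictor_column_name)[target_column_name]
    # Named aggregation: population size and mean target per bin
    return grouped.agg(pop_size="size", avg_target="mean").reset_index()


df = pd.DataFrame({
    "col_id": [1, 2, 3, 4],
    "var": ["a", "a", "b", "b"],
    "target": [0, 1, 1, 1],
})

# Old call style (ID passed explicitly) and new call style (ID omitted)
old = calculate_pig_table_sketch(df, "var", "target", id_column_name="col_id")
new = calculate_pig_table_sketch(df, "var", "target")
pd.testing.assert_frame_equal(old, new)
```

Keeping the parameter (rather than deleting it) is what preserves backwards compatibility: callers that still pass the ID column do not get a TypeError.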
Linked issues
Resolves #135