
Improve speed of train/selection/validation split function #53

@JanBenisek

Description


Currently, the function calls train_test_split() from sklearn twice. With large datasets, this becomes slow and memory-demanding because we create multiple dataframes.
The solution would be to simply append a list of split labels [train, selection, train, validation, ...] instead.

def train_selection_validation_split(data: pd.DataFrame,
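A minimal sketch of the label-list idea, not the actual implementation: the helper name `make_split_labels`, its parameters, and the default proportions are all hypothetical. It builds one shuffled list of labels instead of splitting the dataframe twice.

```python
import random


def make_split_labels(n_rows, train_prop=0.6, selection_prop=0.2,
                      validation_prop=0.2, seed=None):
    """Return a shuffled list of 'train'/'selection'/'validation' labels.

    Hypothetical helper: name, signature, and defaults are assumptions.
    """
    if abs(train_prop + selection_prop + validation_prop - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")

    n_selection = int(selection_prop * n_rows)
    n_validation = int(validation_prop * n_rows)
    # Rows lost to rounding go to the training set.
    n_train = n_rows - n_selection - n_validation

    labels = (["train"] * n_train
              + ["selection"] * n_selection
              + ["validation"] * n_validation)
    # Shuffle in place so the assignment of rows to splits is random.
    random.Random(seed).shuffle(labels)
    return labels
```

The list could then be attached as a single column, e.g. `data["split"] = make_split_labels(len(data), seed=42)`, and a subset selected only when needed with `data[data["split"] == "train"]`. No stratification is done here, unlike what `train_test_split()` offers through its `stratify` argument.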
