Label balancing and symmetric behaviors for training classifiers #23

Merged: SkepticRaven merged 49 commits into main from add-balanced-training on Dec 1, 2023

@SkepticRaven (Contributor)

Adds:

  • label downsampling routine for training classifiers with balanced labels
  • GUI checkbox for optionally conducting the balanced training
  • balanced training value stored in both trained classifier + project
  • random subsetting is seeded by random_seed
  • bugfix on storing selected gui option for social features
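
The downsampling idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the routine added in this PR: the function name `downsample_balanced` and its signature are hypothetical, and it assumes every class is randomly subset to the size of the smallest class, with the subset controlled by a seed.

```python
import random
from collections import defaultdict


def downsample_balanced(labels, random_seed=None):
    """Return sorted indices that keep an equal number of samples per label.

    Hypothetical helper: each class is randomly subset to the size of the
    smallest class. Empty input is not handled.
    """
    rng = random.Random(random_seed)  # local RNG, so the seed fully determines the subset
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    n_keep = min(len(idxs) for idxs in by_label.values())
    kept = []
    for label in sorted(by_label):  # fixed iteration order keeps runs reproducible
        kept.extend(rng.sample(by_label[label], n_keep))
    return sorted(kept)


labels = [1, 0, 0, 0, 1, 0, 0, 0]
keep = downsample_balanced(labels, random_seed=0)
balanced_labels = [labels[i] for i in keep]
```

With the same seed the same frames are kept on every run, which is the property the random_seed bullet describes.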

…train a label-balanced classifier

Also bugfixing saving social feature checkbox into projects
@SkepticRaven SkepticRaven requested review from anshu957 and dahhei July 11, 2023 21:16
@SkepticRaven (Contributor, Author)

Current TODO:

  • Update all calls of classifier.train to now include new uses_balance parameter

Right now, this option only really works within the gui. classify.py will fail.

Also forcing more project metadata to be saved when the train button is clicked
   Fixes a bug where, after switching behaviors, settings stay in the UI but are not passed to the classifier (which remains at defaults)
@SkepticRaven SkepticRaven marked this pull request as ready for review July 12, 2023 15:32
@SkepticRaven SkepticRaven added the enhancement New feature or request label Jul 13, 2023
@SkepticRaven SkepticRaven changed the title Label balancing for training classifiers Label balancing and symmetric behaviors for training classifiers Jul 18, 2023
SkepticRaven (Contributor, Author) commented Jul 18, 2023

Adding in another new feature: symmetric behaviors (where left-right features can be swapped)

  • Checkbox for enabling this feature in the classifier
  • Symmetric behaviors will essentially double the training dataset size and force the classifier to rely on asymmetric features less (due to L-R swapping for the doubling)
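
A rough sketch of the doubling described above, under stated assumptions: feature names are assumed to carry `LEFT`/`RIGHT` tokens, and `swap_left_right` and `augment_symmetric` are hypothetical names, not functions from this PR. Real mirroring may need more than a column swap (for example, sign changes on signed angles), which this sketch ignores.

```python
def swap_left_right(name):
    """Swap assumed 'LEFT'/'RIGHT' tokens in a feature name."""
    if "LEFT" in name:
        return name.replace("LEFT", "RIGHT")
    if "RIGHT" in name:
        return name.replace("RIGHT", "LEFT")
    return name


def augment_symmetric(feature_names, rows, labels):
    """Append a left-right mirrored copy of every row, doubling the data."""
    index_of = {name: i for i, name in enumerate(feature_names)}
    # For each output column, read from the column holding the mirrored feature;
    # features without a left/right counterpart map to themselves.
    source = [index_of.get(swap_left_right(n), i) for i, n in enumerate(feature_names)]
    mirrored = [[row[j] for j in source] for row in rows]
    return rows + mirrored, labels + labels


names = ["LEFT_EAR speed", "RIGHT_EAR speed", "NOSE speed"]
rows, labels = augment_symmetric(names, [[1.0, 2.0, 3.0]], [1])
```

Because each mirrored row keeps its original label, a classifier trained on the doubled set sees both orientations of every example and has less reason to lean on one-sided features.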

In addition to that, a massive rework was done to enable this feature to be added

  • Lots of new class functions to get feature names without an instantiated class object
  • Storing a new value into saved training data
    • min_pose_version: minimum required pose version for the classifier (currently not used in checks, but might be useful)
  • New all-k-fold button for running the maximum number of k-fold cross validations in the project
    • Helps with reporting better averages between multiple trainings when a video may be an outlier in performance (but useful for training)
  • Added std reporting for k-fold cross validation
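
One way to picture the all-k-fold run and the std reporting is leave-one-group-out splitting followed by a mean and standard deviation over the per-fold scores. This is only a sketch: the grouping unit (for example, one group per video) and the names `leave_one_group_out` and `summarize` are assumptions, not taken from this PR.

```python
import statistics


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) once per distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def summarize(scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    mean = statistics.mean(scores)
    # stdev needs at least two values; report 0.0 for a single fold
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


groups = [0, 0, 1, 1, 2]  # assumed: one group id per labeled sample
splits = list(leave_one_group_out(groups))
mean_score, std_score = summarize([0.9, 0.8, 1.0])
```

Running every possible held-out group gives the maximum number of folds, and the standard deviation shows how much a single outlier group moves the average.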

writes exported data to the project directory
:param project: Project from which to export training data
:param behavior: Behavior to export
:param pose_version: Minimum required pose version for this classifier

Are all pose versions currently backwards-compatible?

@SkepticRaven (Contributor, Author)


Features are backwards compatible, but projects/classifiers can only operate on the lowest common feature set. This is the current way projects calculate this number.

Currently no logic is present for decreasing the number passed here... but it could be added (rather than just relying on param1's value).
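
As a sketch of the "lowest common feature set" idea and of the decreasing logic mentioned as a possible addition: the helper below caps a requested version at the lowest pose version found in the project. The name `effective_pose_version` and the dict of per-video versions are hypothetical, not part of the codebase.

```python
def effective_pose_version(requested, video_pose_versions):
    """Cap a requested pose version at the lowest version present in the project.

    video_pose_versions maps a video name to the pose version of its pose file
    (a hypothetical structure used only for this illustration).
    """
    if not video_pose_versions:
        raise ValueError("project has no pose files")
    lowest_common = min(video_pose_versions.values())
    return min(requested, lowest_common)


versions = {"a.avi": 5, "b.avi": 3, "c.avi": 4}
capped = effective_pose_version(5, versions)
unchanged = effective_pose_version(2, versions)
```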

dahhei commented Jul 25, 2023

I believe that I can't test these changes on my local install of JABS unless the update-reqs is merged first. There is a version from commit-head 49c9b10 on the HPC that I will use to test these updates. 😃

dahhei commented Jul 27, 2023

Cayson initially raised this issue so I tested out the all k-fold cross-validation and it appears as though the top 10 feature labels are incorrect. I trained a Drinking classifier which typically has the distance to Lixit features present. This one does not. The all k-fold does seem to work otherwise though.

dahhei commented Jul 27, 2023

It would be nice if the cross validation iteration value included a "cross validation iteration 14/n", with n being the total number that it will test. 🤠
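
A minimal sketch of the requested message, assuming the total number of folds is known before the loop starts. `fold_progress` is a hypothetical helper, not an existing function in the project.

```python
def fold_progress(current, total):
    """Format a 1-based progress message for cross validation."""
    return f"cross validation iteration {current}/{total}"


total_folds = 20  # assumed to be computed before training starts
messages = [fold_progress(i, total_folds) for i in range(1, total_folds + 1)]
```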

@SkepticRaven (Contributor, Author)

Lower pose version projects aren't getting their feature set correctly propagated to the new functions. On a pose_v2 project:

Traceback (most recent call last):
  File "/behavior-classifier/src/ui/training_thread.py", line 56, in run
    features, group_mapping = self._project.get_labeled_features(
  File "/behavior-classifier/src/project/project.py", line 663, in get_labeled_features
    column_names = features.get_feature_column_names(
  File "/behavior-classifier/src/feature_extraction/features.py", line 539, in get_feature_column_names
    column_names = self.get_feature_name_vector(pose_version=self._pose_version, use_social=use_social, extended_features=self._extended_features)
  File "/behavior-classifier/src/feature_extraction/features.py", line 494, in get_feature_name_vector
    assert pose_version >= 5
AssertionError

1. removed reliance on local paths
2. added option to use pose files or videos
3. removed forcing xgboost (in case the user specifies a different classifier)
4. adjusted pose file searching to be more easily updated for new pose versions
@SkepticRaven (Contributor, Author)

The commits above are a rebase merge with main (based on the other recent merge into main: #27)

@SkepticRaven (Contributor, Author)

The feature rewrite to propagate names alongside values fixed the pose_v2 bug.
This branch should be ready to merge (after updating the appropriate version numbers)!

Main always gets bumped
Features because internal structure + storage method changed
Classifier because feature names are now included for sorting inputs
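
To illustrate why storing feature names with the classifier helps, the sketch below reorders one sample's features to match a stored name list and fails loudly when a name is missing. `align_features` and the name-to-value dict are assumptions made for illustration, not the storage format used in this PR.

```python
def align_features(classifier_feature_names, feature_values):
    """Order one sample's features to match the names stored with a classifier.

    feature_values maps feature name -> value for a single sample; extra
    features not known to the classifier are dropped.
    """
    missing = [n for n in classifier_feature_names if n not in feature_values]
    if missing:
        raise KeyError(f"features missing from input: {missing}")
    return [feature_values[n] for n in classifier_feature_names]


stored_names = ["b", "a"]  # order the classifier was trained with
sample = {"a": 1.0, "b": 2.0, "c": 3.0}
ordered = align_features(stored_names, sample)
```

Matching by name rather than by position means a project with a different feature set (such as a lower pose version) either lines up correctly or raises a clear error instead of silently feeding columns in the wrong order.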
@SkepticRaven SkepticRaven merged commit 1f5a045 into main Dec 1, 2023
@SkepticRaven SkepticRaven deleted the add-balanced-training branch December 1, 2023 18:19