@Abdul-Mukit Abdul-Mukit commented Nov 13, 2025

Description

Closes: #449
Add args.validation_interval to TrainConfig.
During training, run evaluate() and log stats every args.validation_interval epochs.

Setting args.validation_interval to a value larger than args.epochs (e.g. args.epochs + 1) effectively disables evaluate() for the entire training run.
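
For clarity, here is a minimal, self-contained sketch of the gating behaviour described above; the loop structure and the train_one_epoch/evaluate placeholders are illustrative assumptions, not the actual rf-detr internals:

from dataclasses import dataclass

@dataclass
class TrainConfig:
    epochs: int = 5
    validation_interval: int = 1  # new field added by this PR

def train_one_epoch(epoch: int) -> None:
    print(f"training epoch {epoch}")

def evaluate(epoch: int) -> dict:
    print(f"validating after epoch {epoch}")
    return {"epoch": epoch, "mAP": 0.0}

def train(args: TrainConfig) -> None:
    for epoch in range(1, args.epochs + 1):
        train_one_epoch(epoch)
        # Validation (and the stats logging tied to it) only runs on epochs
        # that are multiples of validation_interval.
        if epoch % args.validation_interval == 0:
            print(evaluate(epoch))

train(TrainConfig(epochs=5, validation_interval=2))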

Related:
#416

Type of change

  • New feature (non-breaking change which adds functionality)

How has this change been tested? Please provide a test case or example of how you tested the change.

Trained locally on the basketball-player-detection-2.v13i.coco.zip dataset using the scripts below.

Test 1:

Train for 5 epochs with validation_interval=2.

from rfdetr import RFDETRNano

model = RFDETRNano()

model.train(
    dataset_dir="/home/abdul/projects/rf-detr/datasets_downloads/basketball-player-detection-2",
    epochs=5,
    batch_size=2,
    grad_accum_steps=1,
    lr=1e-4,
    num_workers=2,
    device='cuda',
    checkpoint_interval=1,
    validation_interval=2,
)

Expected behaviour:

  • Validation runs every 2 epochs.
  • All weight checkpoints are still generated according to args.checkpoint_interval.
  • log.txt contains an entry every 2 epochs (see the check snippet after this list).
  • metrics_plot.png shows data points every 2 epochs.
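
One way to sanity-check the logging cadence for Test 1 is a small script like the one below; it assumes log.txt is written as one JSON object per logged epoch with an "epoch" field, and that the run's output directory is ./output (both are assumptions that may need adjusting):

import json
from pathlib import Path

# Hypothetical check: with epochs=5 and validation_interval=2, only epochs
# 2 and 4 should appear in log.txt (exact epoch indexing may differ).
log_path = Path("output/log.txt")  # adjust to the run's output_dir
epochs_logged = [
    json.loads(line)["epoch"]
    for line in log_path.read_text().splitlines()
    if line.strip()
]
print("epochs with validation/logging:", epochs_logged)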

Test 2:

Train for 5 epochs without any validation. Set validation_interval=6.

from rfdetr import RFDETRNano

model = RFDETRNano()

model.train(
    dataset_dir="/home/abdul/projects/rf-detr/datasets_downloads/basketball-player-detection-2",
    epochs=5,
    batch_size=2,
    grad_accum_steps=1,
    lr=1e-4,
    num_workers=2,
    device='cuda',
    checkpoint_interval=1,
    validation_interval=6,
)

Expected behaviour:

  • No validation is run throughout the training process.
  • All weight checkpoints are still generated according to args.checkpoint_interval.
  • No log.txt, metrics_plot.png, checkpoint_best_regular.pth, checkpoint_best_ema.pth, checkpoint_best_total.pth, or results.json files are generated. These are generated only once at least one validation call finishes (see the check snippet after this list).
  • No test is run even if args.run_test == True, as checkpoint_best_total.pth doesn't exist.
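
Similarly, the absence of validation artifacts in Test 2 can be verified with something like the following; the output directory name is an assumption and should be adjusted to the run's actual output_dir:

from pathlib import Path

# Hypothetical check: none of the validation-dependent artifacts should exist.
output_dir = Path("output")  # assumption: adjust to the actual output_dir
for name in [
    "log.txt",
    "metrics_plot.png",
    "checkpoint_best_regular.pth",
    "checkpoint_best_ema.pth",
    "checkpoint_best_total.pth",
    "results.json",
]:
    assert not (output_dir / name).exists(), f"{name} should not be generated"
print("OK: no validation artifacts were produced")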

Any specific deployment considerations

None

Docs

None
