
Rework configuration classes #1

Merged

jlamypoirier merged 7 commits into main from rework_configs on Oct 16, 2024
Conversation

jlamypoirier (Collaborator) commented on Oct 11, 2024

A first pass at reworking the configuration classes to avoid redundancies, inconsistencies, confusing names, etc.
Also chopped off much of the obsolete RunConfig.

RENAME,DEFAULT training.validation_interval = 1000 -> training.validation.interval = None
RENAME,DEFAULT training.validation_iters = 0 -> training.validation.iterations = None
ADD training.validation.offset = 0
OTHER: Must set both interval > 0 and iterations > 0 to enable validation.
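
A minimal sketch of the new validation sub-config, assuming a dataclass layout (field names come from this PR; the class structure, types, and the `enabled` helper are illustrative assumptions):

```python
from dataclasses import dataclass


@dataclass
class ValidationConfig:
    interval: int | None = None    # was training.validation_interval (default 1000)
    iterations: int | None = None  # was training.validation_iters (default 0)
    offset: int = 0                # new in this PR

    @property
    def enabled(self) -> bool:
        # Per the note above: validation only runs if both values are set > 0.
        return bool(self.interval and self.iterations)
```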

RENAME,DEFAULT run.log_interval = 100 -> training.logs.interval = None
RENAME,DEFAULT run.log_offset = 1 -> training.logs.offset = 0

RENAME run.checkpoint_interval -> training.checkpoint.interval
RENAME run.checkpoint_offset -> training.checkpoint.offset
RENAME,DEFAULT run.max_checkpoints = None -> training.checkpoint.keep = 5
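
Logs (above) and checkpoints follow the same interval/offset pattern; a hedged sketch of a shared base, assuming such a base class exists (the firing rule shown is an assumption):

```python
from dataclasses import dataclass


@dataclass
class IntervalConfig:
    interval: int | None = None  # disabled while unset
    offset: int = 0

    def fires_at(self, iteration: int) -> bool:
        # Assumed semantics: the action fires on iterations matching the
        # offset, once an interval is configured.
        return bool(self.interval) and iteration % self.interval == self.offset


@dataclass
class CheckpointConfig(IntervalConfig):
    keep: int | None = 5  # was run.max_checkpoints (default None)
```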

RENAME run.export_interval -> training.export.interval
ADD training.export.offset = 0
OTHER: Export must be a sub-interval of checkpoint; see the sub-interval sketch after the shutdown block below.

RENAME,FORMAT run.export_callback_script:str -> training.export.callback.script:list[str]|None
RENAME,FORMAT run.export_callback_env:str -> training.export.callback.environment:dict
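
The callback is now an argv-style command plus a structured environment. A hedged sketch of how such a callback could be launched (the `run` method and exact types are assumptions, not the actual implementation):

```python
import os
import subprocess
from dataclasses import dataclass, field


@dataclass
class CallbackConfig:
    # Argv-style command, e.g. ["python", "upload.py"]; was a single string.
    script: list[str] | None = None
    # Extra variables merged over the current environment; was a single string.
    environment: dict[str, str] = field(default_factory=dict)

    def run(self) -> None:
        if self.script:
            subprocess.Popen(self.script, env={**os.environ, **self.environment})
```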

RENAME run.stop_interval -> training.shutdown.interval
RENAME run.stop_offset -> training.shutdown.offset
OTHER: Shutdown must be a sub-interval of checkpoint.
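
Export and shutdown may only fire on iterations that are also checkpoint iterations. A hedged sketch of what such a sub-interval check could look like (the helper name is hypothetical):

```python
def check_sub_interval(interval: int, parent_interval: int, name: str) -> None:
    # Assumed meaning of "sub-interval": a multiple of the parent interval,
    # so every export/shutdown step coincides with a checkpoint step.
    if interval % parent_interval != 0:
        raise ValueError(
            f"{name}.interval ({interval}) must be a multiple of "
            f"training.checkpoint.interval ({parent_interval})"
        )
```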

RENAME run.wandb_group_name -> training.wandb.group_name
RENAME run.wandb_project_name -> training.wandb.project_name
RENAME run.wandb_entity_name -> training.wandb.entity_name
RENAME run.wandb_status_interval -> training.wandb.alert.interval
ADD training.wandb.alert.offset = 0
RENAME run.wandb_post_alerts -> training.wandb.alert.status_updates
OTHER: Alerts must be a sub-interval of logs.
OTHER: Extract wandb into a separate class.
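
A hedged sketch of the extracted wandb class, grouping the renames above (structure, types, and defaults are assumptions):

```python
from dataclasses import dataclass, field


@dataclass
class WandbAlertConfig:
    interval: int | None = None   # was run.wandb_status_interval
    offset: int = 0               # new in this PR
    status_updates: bool = False  # was run.wandb_post_alerts; default assumed


@dataclass
class WandbConfig:
    group_name: str | None = None    # was run.wandb_group_name
    project_name: str | None = None  # was run.wandb_project_name
    entity_name: str | None = None   # was run.wandb_entity_name
    alert: WandbAlertConfig = field(default_factory=WandbAlertConfig)
```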

RENAME run.save_tensor_logs -> training.tensor_logs.save
RENAME run.show_tensor_logs -> training.tensor_logs.show
RENAME run.tensor_logs_show_elements -> training.tensor_logs.max_elements

RENAME pretrained.pretrained_checkpoint_path -> pretrained.path
RENAME pretrained.pretrained_checkpoint_type -> pretrained.format
RENAME pretrained.imported_model_type -> pretrained.imported_type
REMOVE pretrained.use_pretrained_config
RENAME pretrained.ignore_pretrained_config -> pretrained.override_architecture
RENAME pretrained.load_pretrained_weights -> pretrained.load_weights
RENAME pretrained.load_pretrained_optimizer -> pretrained.load_optimizer
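
A hedged sketch of the resulting pretrained config (field names from this PR; types and defaults are assumptions):

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PretrainedConfig:
    path: Path | None = None             # was pretrained_checkpoint_path
    format: str | None = None            # was pretrained_checkpoint_type
    imported_type: str | None = None     # was imported_model_type
    override_architecture: bool = False  # was ignore_pretrained_config
    load_weights: bool = True            # was load_pretrained_weights
    load_optimizer: bool = False         # was load_pretrained_optimizer
```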

RENAME data.tokenizer.tokenizer_type -> data.tokenizer.format
RENAME data.tokenizer.tokenizer_file -> data.tokenizer.path

RENAME data.fim.fim_[...] -> data.fim.[...]

REMOVE data.dataset_type
RENAME data.dataset_source -> data.format
RENAME data.data_path -> data.path
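
A hedged sketch of the data config after these renames (structure, types, and defaults are assumptions):

```python
from dataclasses import dataclass, field


@dataclass
class TokenizerConfig:
    format: str | None = None  # was tokenizer_type
    path: str | None = None    # was tokenizer_file


@dataclass
class DataConfig:
    format: str | None = None  # was dataset_source (dataset_type removed)
    path: str | None = None    # was data_path
    tokenizer: TokenizerConfig = field(default_factory=TokenizerConfig)
```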

RENAME profile.profile_column_width -> profile.table_width
RENAME profile.profile_[...] -> profile.[...]

RENAME optimizer.lr_schedule.lr -> optimizer.learning_rate.base
RENAME optimizer.lr_schedule.lr_decay_style -> optimizer.learning_rate.decay_style
RENAME optimizer.lr_schedule.lr_decay_iters -> optimizer.learning_rate.decay_iterations
RENAME optimizer.lr_schedule.lr_decay_power -> optimizer.learning_rate.decay_power
RENAME optimizer.lr_schedule.lr_warmup_iters -> optimizer.learning_rate.warmup_iterations
RENAME optimizer.lr_schedule.min_lr -> optimizer.learning_rate.minimum
RENAME optimizer.lr_schedule.lr_schedule -> optimizer.learning_rate.schedule

RENAME optimizer.adam_beta1 -> optimizer.beta_1
RENAME optimizer.adam_beta2 -> optimizer.beta_2
RENAME optimizer.adam_eps -> optimizer.epsilon

RENAME optimizer.clip_grad -> optimizer.gradient_norm_clipping
RENAME optimizer.loss_scale -> optimizer.gradient_scaler.constant
RENAME optimizer.initial_loss_scale -> optimizer.gradient_scaler.initial
RENAME optimizer.min_loss_scale -> optimizer.gradient_scaler.minimum
RENAME optimizer.loss_scale_window -> optimizer.gradient_scaler.window
RENAME optimizer.hysteresis -> optimizer.gradient_scaler.hysteresis
RENAME optimizer.default_lr_scale -> optimizer.default_learning_rate_scale
REMOVE optimizer.lr_schedule_offset
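
A hedged sketch of the nested optimizer config after these renames (types and defaults are assumptions; some fields omitted for brevity):

```python
from dataclasses import dataclass, field


@dataclass
class LearningRateConfig:
    base: float = 1e-4                   # was lr_schedule.lr; default assumed
    decay_style: str = "cosine"          # was lr_decay_style; default assumed
    decay_iterations: int | None = None  # was lr_decay_iters
    warmup_iterations: int = 0           # was lr_warmup_iters
    minimum: float = 0.0                 # was min_lr; default assumed


@dataclass
class GradientScalerConfig:
    constant: float | None = None  # was loss_scale
    initial: float | None = None   # was initial_loss_scale
    minimum: float | None = None   # was min_loss_scale
    window: int | None = None      # was loss_scale_window
    hysteresis: int | None = None  # same name, new location


@dataclass
class OptimizerConfig:
    learning_rate: LearningRateConfig = field(default_factory=LearningRateConfig)
    gradient_scaler: GradientScalerConfig = field(default_factory=GradientScalerConfig)
    beta_1: float = 0.9                  # was adam_beta1
    beta_2: float = 0.999                # was adam_beta2
    epsilon: float = 1e-8                # was adam_eps
    gradient_norm_clipping: float = 1.0  # was clip_grad; default assumed
```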

!!!RENAME model.base_model.transformer.normalization.normalization_type -> model.base_model.transformer.normalization.type
!!!RENAME model.base_model.transformer.normalization.layer_norm_eps -> model.base_model.transformer.normalization.epsilon
!!!RENAME model.base_model.transformer.normalization.zero_centered_normalization -> model.base_model.transformer.normalization.zero_centered
RENAME model.base_model.transformer.normalization.normalization_implementation -> model.base_model.transformer.normalization.implementation
RENAME model.base_model.transformer.normalization.layer_norm_init_range -> model.base_model.transformer.normalization.initialization_range
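
A hedged sketch of the normalization sub-config after the renames (types and defaults are assumptions):

```python
from dataclasses import dataclass


@dataclass
class NormalizationConfig:
    type: str = "layer_norm"           # was normalization_type; default assumed
    epsilon: float = 1e-5              # was layer_norm_eps; default assumed
    zero_centered: bool = False        # was zero_centered_normalization
    implementation: str = "auto"       # was normalization_implementation; default assumed
    initialization_range: float = 0.0  # was layer_norm_init_range; default assumed
```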

jlamypoirier requested a review from tscholak on October 15, 2024
jlamypoirier marked this pull request as ready for review on October 15, 2024
jlamypoirier merged commit 6c5dee4 into main on October 16, 2024
jlamypoirier deleted the rework_configs branch on October 16, 2024
jlamypoirier mentioned this pull request on October 25, 2024
tscholak added this to the 0.2.0 milestone on October 25, 2024