A first pass at reworking the configuration classes to avoid redundancies, inconsistencies, confusing names, etc.
Also removed much of the obsolete RunConfig.
RENAME,DEFAULT training.validation_interval = 1000 -> training.validation.interval = None
RENAME,DEFAULT training.validation_iters = 0 -> training.validation.iterations = None
ADD training.validation.offset = 0
OTHER: Must set both interval > 0 and iterations > 0 to enable validation.
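To illustrate, the regrouped validation settings would nest roughly as below. This is a hypothetical sketch using a plain dict; the field names come from the renames above, but the values are placeholders and the dict layout is not the actual config class.

```python
# Sketch of the new nested validation config (names from the renames above;
# values are placeholders).
training = {
    "validation": {
        "interval": 1000,  # was training.validation_interval (new default: None)
        "iterations": 10,  # was training.validation_iters (new default: None)
        "offset": 0,       # newly added field
    },
}

# Validation only runs when both interval and iterations are set and positive.
validation = training["validation"]
validation_enabled = bool(
    validation["interval"] and validation["interval"] > 0
    and validation["iterations"] and validation["iterations"] > 0
)
```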
RENAME,DEFAULT run.log_interval = 100 -> training.logs.interval = None
RENAME,DEFAULT run.log_offset = 1 -> training.logs.offset = 0
RENAME run.checkpoint_interval -> training.checkpoint.interval
RENAME run.checkpoint_offset -> training.checkpoint.offset
RENAME,DEFAULT run.max_checkpoints = None -> training.checkpoint.keep = 5
RENAME run.export_interval -> training.export.interval
ADD training.export.offset = 0
OTHER: Export must be sub-interval of checkpoint
RENAME,FORMAT run.export_callback_script:str -> training.export.callback.script:list[str]|None
RENAME,FORMAT run.export_callback_env:str -> training.export.callback.environment:dict
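The callback fields change type as well as name. A sketch of the new shape (the command and environment values below are made-up placeholders, not defaults from the PR):

```python
# Old: run.export_callback_script was a single string command,
#      run.export_callback_env was a string.
# New (per the renames above): script is a list of command parts (or None),
#      environment is a dict of extra environment variables.
export = {
    "interval": 2000,
    "offset": 0,  # newly added field
    "callback": {
        "script": ["python", "upload.py"],  # placeholder command, list[str] | None
        "environment": {"UPLOAD_TOKEN": "dummy"},  # placeholder env vars, dict
    },
}
```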
RENAME run.stop_interval -> training.shutdown.interval
RENAME run.stop_offset -> training.shutdown.offset
OTHER: Shutdown must be sub-interval of checkpoint
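Assuming "sub-interval of checkpoint" means the dependent interval must be a positive multiple of the checkpoint interval, so that it only ever fires on steps where a checkpoint also fires (my reading of the constraint, with matching offsets; the exact rule in the PR may differ), the check could be sketched as:

```python
def is_sub_interval(interval: int, parent_interval: int) -> bool:
    """Check that every event of the child interval coincides with a parent
    event, i.e. the child fires on a subset of the parent's steps.
    Assumes matching offsets; the exact rule in the PR may differ."""
    return interval > 0 and parent_interval > 0 and interval % parent_interval == 0

# With these placeholder values, export (every 2000 steps) only fires on
# steps where a checkpoint (every 500 steps) also fires:
assert is_sub_interval(2000, 500)      # accepted: 2000 is a multiple of 500
assert not is_sub_interval(750, 500)   # rejected: 750 is not a multiple of 500
```

The same check would apply to the shutdown-vs-checkpoint and alert-vs-logs constraints mentioned below.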
RENAME run.wandb_group_name -> training.wandb.group_name
RENAME run.wandb_project_name -> training.wandb.project_name
RENAME run.wandb_entity_name -> training.wandb.entity_name
RENAME run.wandb_status_interval -> training.wandb.alert.interval
ADD training.wandb.alert.offset = 0
RENAME run.wandb_post_alerts -> training.wandb.alert.status_updates
OTHER: Alerts must be sub-interval of logs
OTHER: Extract wandb into separate class
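Extracting the W&B settings into their own class might look roughly like this. This is a hypothetical dataclass sketch based only on the renames above; the real config classes and defaults in the repo likely differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WandbAlertConfig:
    interval: int | None = None    # was run.wandb_status_interval
    offset: int = 0                # newly added
    status_updates: bool = False   # was run.wandb_post_alerts (default is a guess)


@dataclass
class WandbConfig:
    group_name: str | None = None    # was run.wandb_group_name
    project_name: str | None = None  # was run.wandb_project_name
    entity_name: str | None = None   # was run.wandb_entity_name
    alert: WandbAlertConfig = field(default_factory=WandbAlertConfig)
```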
RENAME run.save_tensor_logs -> training.tensor_logs.save
RENAME run.show_tensor_logs -> training.tensor_logs.show
RENAME run.tensor_logs_show_elements -> training.tensor_logs.max_elements
RENAME pretrained.pretrained_checkpoint_path -> pretrained.path
RENAME pretrained.pretrained_checkpoint_type -> pretrained.format
RENAME pretrained.imported_model_type -> pretrained.imported_type
REMOVE pretrained.use_pretrained_config
RENAME pretrained.ignore_pretrained_config -> pretrained.override_architecture
RENAME pretrained.load_pretrained_weights -> pretrained.load_weights
RENAME pretrained.load_pretrained_optimizer -> pretrained.load_optimizer
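Dropping the redundant `pretrained_` prefixes leaves a much flatter section. A sketch of the result (field names from the renames above; all values are placeholder guesses, not defaults):

```python
# Sketch of the renamed pretrained section (placeholder values).
pretrained = {
    "path": "/checkpoints/base-model",  # was pretrained_checkpoint_path
    "format": "huggingface",            # was pretrained_checkpoint_type (value is a guess)
    "imported_type": "gpt2",            # was imported_model_type (value is a guess)
    "override_architecture": False,     # was ignore_pretrained_config
    "load_weights": True,               # was load_pretrained_weights
    "load_optimizer": False,            # was load_pretrained_optimizer
}
```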
RENAME data.tokenizer.tokenizer_type -> data.tokenizer.format
RENAME data.tokenizer.tokenizer_file -> data.tokenizer.path
RENAME data.fim.fim_[...] -> data.fim.[...]
REMOVE data.dataset_type
RENAME data.dataset_source -> data.format
RENAME data.data_path -> data.path
RENAME profile.profile_column_width -> profile.table_width
RENAME profile.profile_[...] -> profile.[...]
RENAME optimizer.lr_schedule.lr -> optimizer.learning_rate.base
RENAME optimizer.lr_schedule.lr_decay_style -> optimizer.learning_rate.decay_style
RENAME optimizer.lr_schedule.lr_decay_iters -> optimizer.learning_rate.decay_iterations
RENAME optimizer.lr_schedule.lr_decay_power -> optimizer.learning_rate.decay_power
RENAME optimizer.lr_schedule.lr_warmup_iters -> optimizer.learning_rate.warmup_iterations
RENAME optimizer.lr_schedule.min_lr -> optimizer.learning_rate.minimum
RENAME optimizer.lr_schedule.lr_schedule -> optimizer.learning_rate.schedule
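The `lr_schedule` group becomes `learning_rate`, with the `lr_` prefixes dropped from its fields. A sketch of the new nesting (values are illustrative placeholders, not the PR's defaults):

```python
# Sketch of the regrouped learning-rate settings (placeholder values).
learning_rate = {
    "base": 3e-4,               # was optimizer.lr_schedule.lr
    "decay_style": "cosine",    # was lr_decay_style (value is a placeholder)
    "decay_iterations": 100000, # was lr_decay_iters
    "decay_power": 1.0,         # was lr_decay_power
    "warmup_iterations": 1000,  # was lr_warmup_iters
    "minimum": 3e-5,            # was min_lr
}
```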
RENAME optimizer.adam_beta1 -> optimizer.beta_1
RENAME optimizer.adam_beta2 -> optimizer.beta_2
RENAME optimizer.adam_eps -> optimizer.epsilon
RENAME optimizer.clip_grad -> optimizer.gradient_norm_clipping
RENAME optimizer.loss_scale -> optimizer.gradient_scaler.constant
RENAME optimizer.initial_loss_scale -> optimizer.gradient_scaler.initial
RENAME optimizer.min_loss_scale -> optimizer.gradient_scaler.minimum
RENAME optimizer.loss_scale_window -> optimizer.gradient_scaler.window
RENAME optimizer.hysteresis -> optimizer.gradient_scaler.hysteresis
RENAME optimizer.default_lr_scale -> optimizer.default_learning_rate_scale
REMOVE optimizer.lr_schedule_offset
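Similarly, the loss-scale fields regroup under `gradient_scaler`. A sketch with placeholder values (the comment on `constant` is my assumption about its semantics, not stated in the PR):

```python
# Sketch of the regrouped optimizer settings (placeholder values).
optimizer = {
    "beta_1": 0.9,                  # was adam_beta1
    "beta_2": 0.95,                 # was adam_beta2
    "epsilon": 1e-8,                # was adam_eps
    "gradient_norm_clipping": 1.0,  # was clip_grad
    "gradient_scaler": {
        "constant": None,  # was loss_scale; None presumably means dynamic scaling
        "initial": 2**16,  # was initial_loss_scale
        "minimum": 1.0,    # was min_loss_scale
        "window": 1000,    # was loss_scale_window
        "hysteresis": 2,   # was hysteresis (unchanged name, new location)
    },
}
```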
!!!RENAME model.base_model.transformer.normalization.normalization_type -> model.base_model.transformer.normalization.type
!!!RENAME model.base_model.transformer.normalization.layer_norm_eps -> model.base_model.transformer.normalization.epsilon
!!!RENAME model.base_model.transformer.normalization.zero_centered_normalization -> model.base_model.transformer.normalization.zero_centered
RENAME model.base_model.transformer.normalization.normalization_implementation -> model.base_model.transformer.normalization.implementation
RENAME model.base_model.transformer.normalization.layer_norm_init_range -> model.base_model.transformer.normalization.initialization_range