Configuration options
General configuration
The code snippet above calls generate_config without any specific parameters. In practice, the function accepts many parameters that define the architecture of the two-branch neural network as well as aspects of training, validation, and testing. The following section serves as a cheat sheet that explains the meaning and rationale of every parameter.
| Parameter name | Description |
|---|---|
| Training | |
| num_epochs | The maximum number of epochs allowed for training |
| learning_rate | The learning rate, which determines the step size at each iteration of the optimization process |
| decay | The weight decay (L2 penalty) used by the Adam optimizer |
| compute_mode | The device used to train the neural network. The valid options are cpu, which trains more slowly, or cuda:id to train on the GPU with index id |
| num_workers | The number of sub-processes used for data loading. Larger values usually improve performance, but beyond a certain point training speed degrades |
| train_batchsize | The number of samples in a batch from the training set |
| val_batchsize | The number of samples in a batch from the validation and test sets |
| patience | The number of epochs that the network is allowed to continue training while observing worse overall performance |
| delta | The minimum change in the monitored quantity that qualifies as an improvement |
| return_results_per_target | Whether or not to return the performance for every target separately |
| evaluate_train | Whether or not to calculate performance metrics over the training set |
| evaluate_val | Whether or not to calculate performance metrics over the validation set |
| eval_every_n_epochs | The interval, in epochs, at which the performance metrics are computed |
| use_early_stopping | Whether or not to use early stopping while training |
| Metrics | |
| metrics | The performance metrics that will be calculated. For classification tasks the available metrics are ['hamming_loss', 'auroc', 'f1_score', 'aupr', 'accuracy', 'recall', 'precision'], while for regression tasks the available metrics are ['RMSE', 'MSE', 'MAE', 'R2', 'RRMSE'] |
| metrics_average | The averaging strategy used to calculate each metric. The available options are ['macro', 'micro', 'instance'] |
| metric_to_optimize_early_stopping | The metric tracked by the early stopping routine. The value can be the loss or one of the available performance metrics |
| metric_to_optimize_best_epoch_selection | The validation metric used to determine the best configuration. The value can be the loss or one of the available performance metrics |
| Printing - Saving - Logging | |
| verbose | Whether or not to print useful information in the terminal |
| use_tensorboard_logger | Whether or not to log results in files that TensorBoard can read and visualize |
| wandb_project_name | The name of the wandb project to which the results of an experiment will be logged |
| wandb_project_entity | The user name of the wandb account |
| results_path | The path where all relevant information will be saved |
| experiment_name | The name of the current experiment. This name is used for both the local save and the wandb save |
| save_model | Whether or not to save the model of the epoch with the best validation performance |
| General architecture | |
| general_architecture_version | Selects a specific version of the general neural network architecture. Available options are mlp for the MLP version, dot_product for the dot product version, and kronecker for the Kronecker product version. The default value is dot_product |
| batch_norm | Whether or not to use batch normalization between the fully connected layers in the two branches |
| dropout_rate | The dropout rate used in the layers of the two branches |
| Instance branch architecture | |
| instance_branch_architecture | The type of architecture used in the instance branch. Currently, there are two available options: MLP, a basic fully connected feed-forward neural network, and CONV, a convolutional neural network |
| instance_branch_input_dim | The input dimension of the instance branch |
| instance_train_transforms | The PyTorch-compatible transforms applied to the training samples. Useful when using images with convolutional architectures |
| instance_inference_transforms | The PyTorch-compatible transforms applied to the validation and test samples. Useful when using images with convolutional architectures |
| instance_branch_params | A dictionary that holds all the hyperparameters needed to configure the architecture in the instance branch. The supported keys are listed under "Instance and target branch hyperparameters" |
| Target branch architecture | |
| target_branch_architecture | The type of architecture used in the target branch. Currently, there are two available options: MLP, a basic fully connected feed-forward neural network, and CONV, a convolutional neural network |
| target_branch_input_dim | The input dimension of the target branch |
| target_train_transforms | The PyTorch-compatible transforms applied to the training samples. Useful when using images with convolutional architectures |
| target_inference_transforms | The PyTorch-compatible transforms applied to the validation and test samples. Useful when using images with convolutional architectures |
| target_branch_params | A dictionary that holds all the hyperparameters needed to configure the architecture in the target branch. The supported keys are listed under "Instance and target branch hyperparameters" |
| Combination branch architecture | |
| comb_mlp_nodes_per_layer | Defines the number of nodes in the combination branch. If list, each element defines the number of nodes in the corresponding layer. If int, the same number of nodes is used comb_mlp_layers times. (Only used if general_architecture_version == mlp) |
| comb_mlp_layers | The number of layers in the combination branch. (Only used if general_architecture_version == mlp) |
| embedding_size | The size of the embeddings output by the two branches. (Only used if general_architecture_version == dot_product) |
| Pretrained models | |
| load_pretrained_model | Whether or not a pretrained model will be loaded |
| pretrained_model_path | The path to the .pt file with the pretrained model. (Only used if load_pretrained_model == True) |
| Other | |
| additional_info | A dictionary that holds any other relevant information. Can be used to log additional info for an experiment in wandb |
| validation_setting | The validation setting of the specific example |
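
The interaction between patience and delta can be easier to follow in code. The following is a minimal sketch, assuming that an evaluation counts as an improvement only when the monitored value beats the best value seen so far by more than delta, and that training stops after patience consecutive evaluations without such an improvement. EarlyStopper and its arguments are hypothetical names used for illustration; this is not the library's own implementation.

```python
class EarlyStopper:
    """Hypothetical helper that mimics the documented patience/delta behaviour."""

    def __init__(self, patience=10, delta=0.01, higher_is_better=False):
        self.patience = patience            # evaluations tolerated without improvement
        self.delta = delta                  # minimum change that counts as improvement
        self.higher_is_better = higher_is_better  # False for losses, True for e.g. accuracy
        self.best = None
        self.bad_evaluations = 0

    def step(self, value):
        """Record a new monitored value and return True when training should stop."""
        if self.best is None:
            self.best = value
            return False
        if self.higher_is_better:
            change = value - self.best
        else:
            change = self.best - value
        if change > self.delta:
            # Large enough improvement: remember it and reset the counter.
            self.best = value
            self.bad_evaluations = 0
        else:
            self.bad_evaluations += 1
        return self.bad_evaluations >= self.patience


# Usage with a made-up sequence of validation losses.
stopper = EarlyStopper(patience=2, delta=0.01)
losses = [1.00, 0.80, 0.795, 0.792, 0.70]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

With these made-up losses, the two small decreases after 0.80 do not exceed delta, so the loop stops at the fourth evaluation even though a larger drop would have followed. A larger patience trades longer training for tolerance of such plateaus.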
Instance and target branch hyperparameters
As mentioned before, all hyperparameters needed to define the architecture of the instance or target branch are passed as key-value pairs in the instance_branch_params and target_branch_params dictionaries.
| Key | Description |
|---|---|
| Possible key names currently supported in the instance_branch_params dictionary | |
| instance_branch_nodes_per_layer | Defines the number of nodes in the MLP version of the instance branch. If list, each element defines the number of nodes in the corresponding layer. If int, the same number of nodes is used instance_branch_layers times |
| instance_branch_layers | The number of layers in the MLP version of the instance branch. (Only used if instance_branch_nodes_per_layer is int) |
| instance_branch_conv_architecture | The type of convolutional architecture used in the instance branch |
| instance_branch_conv_architecture_version | The version of the specific type of convolutional architecture used in the instance branch |
| instance_branch_conv_architecture_dense_layers | The number of dense layers used at the end of the convolutional architecture of the instance branch |
| instance_branch_conv_architecture_last_layer_trained | When using pre-trained architectures, the user can define the last layer that will be frozen during training |
| Possible key names currently supported in the target_branch_params dictionary | |
| target_branch_nodes_per_layer | Defines the number of nodes in the MLP version of the target branch. If list, each element defines the number of nodes in the corresponding layer. If int, the same number of nodes is used target_branch_layers times |
| target_branch_layers | The number of layers in the MLP version of the target branch. (Only used if target_branch_nodes_per_layer is int) |
| target_branch_conv_architecture | The type of convolutional architecture used in the target branch |
| target_branch_conv_architecture_version | The version of the specific type of convolutional architecture used in the target branch |
| target_branch_conv_architecture_dense_layers | The number of dense layers used at the end of the convolutional architecture of the target branch |
| target_branch_conv_architecture_last_layer_trained | When using pre-trained architectures, the user can define the last layer that will be frozen during training |
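
The list-or-int convention used by the nodes_per_layer keys can be illustrated with a short sketch. It assumes only what the descriptions state: a list gives one width per layer, while an int is repeated for the given number of layers. expand_layer_sizes is a hypothetical helper written for this illustration and is not a function of the library.

```python
def expand_layer_sizes(nodes_per_layer, layers=None):
    """Return one layer width per layer, following the documented convention.

    nodes_per_layer: a list of widths, or a single int width.
    layers: number of layers; only consulted when nodes_per_layer is an int.
    """
    if isinstance(nodes_per_layer, int):
        if layers is None:
            raise ValueError("layers must be given when nodes_per_layer is an int")
        # The same width is repeated for every layer.
        return [nodes_per_layer] * layers
    # A list already names the width of each layer; layers is ignored.
    return list(nodes_per_layer)


# Mirrors 'instance_branch_nodes_per_layer': [100, 100] with 'instance_branch_layers': None.
sizes_from_list = expand_layer_sizes([100, 100])
# An int width combined with an explicit layer count.
sizes_from_int = expand_layer_sizes(64, layers=3)
```

Under these assumptions, passing a list makes the layers value redundant, which matches the note that the layers key is only used when nodes_per_layer is an int.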
Example of generating a configuration:
config = generate_config(
    instance_branch_input_dim = data_info['instance_branch_input_dim'],
    target_branch_input_dim = data_info['target_branch_input_dim'],
    validation_setting = data_info['detected_validation_setting'],
    general_architecture_version = 'dot_product',
    problem_mode = data_info['detected_problem_mode'],
    learning_rate = 0.001,
    decay = 0,
    batch_norm = False,
    dropout_rate = 0,
    momentum = 0.9,
    weighted_loss = False,
    compute_mode = 'cuda:0',
    train_batchsize = 1024,
    val_batchsize = 1024,
    num_epochs = 200,
    num_workers = 8,
    metrics = ['RMSE', 'MSE'],
    metrics_average = ['macro', 'micro'],
    patience = 10,
    evaluate_train = True,
    evaluate_val = True,
    verbose = False,
    results_verbose = False,
    use_early_stopping = True,
    use_tensorboard_logger = True,
    wandb_project_name = 'Dummy_project_1',
    wandb_project_entity = None,
    metric_to_optimize_early_stopping = 'loss',
    delta = 0.01,
    metric_to_optimize_best_epoch_selection = 'loss',
    instance_branch_architecture = 'MLP',
    use_instance_features = True,
    instance_branch_params = {
        'instance_branch_nodes_reducing_factor': 2,
        'instance_branch_nodes_per_layer': [100, 100],
        'instance_branch_layers': None,
        # 'instance_branch_conv_architecture': 'resnet',
        # 'instance_branch_conv_architecture_version': 'resnet101',
        # 'instance_branch_conv_architecture_dense_layers': 1,
        # 'instance_branch_conv_architecture_last_layer_trained': 'last',
    },
    target_branch_architecture = 'MLP',
    use_target_features = True,
    target_branch_params = {
        'target_branch_nodes_reducing_factor': 2,
        'target_branch_nodes_per_layer': [100, 100],
        'target_branch_layers': None,
        # 'target_branch_conv_architecture': 'resnet',
        # 'target_branch_conv_architecture_version': 'resnet101',
        # 'target_branch_conv_architecture_dense_layers': 1,
        # 'target_branch_conv_architecture_last_layer_trained': 'last',
    },
    embedding_size = 100,
    comb_mlp_nodes_reducing_factor = 2,
    comb_mlp_nodes_per_layer = [2048, 2048, 2048],
    comb_mlp_layers = None,
    save_model = True,
    eval_every_n_epochs = 1,
    additional_info = {},
)
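
Because the valid metric names depend on the task type, it can help to check the metrics and metrics_average lists before building a configuration. The sketch below is an illustration only: the metric and averaging names are copied from the parameter table, while check_metric_settings and the task labels "classification" and "regression" are assumptions made for this example and may not match the values the library uses for problem_mode.

```python
# Names copied from the parameter table in this document.
CLASSIFICATION_METRICS = {
    "hamming_loss", "auroc", "f1_score", "aupr", "accuracy", "recall", "precision",
}
REGRESSION_METRICS = {"RMSE", "MSE", "MAE", "R2", "RRMSE"}
AVERAGING_OPTIONS = {"macro", "micro", "instance"}


def check_metric_settings(problem_mode, metrics, metrics_average):
    """Return a list of problems; an empty list means the settings look consistent.

    problem_mode is assumed to be "classification" or "regression" (hypothetical labels).
    """
    if problem_mode == "classification":
        allowed = CLASSIFICATION_METRICS
    elif problem_mode == "regression":
        allowed = REGRESSION_METRICS
    else:
        return [f"unknown problem_mode: {problem_mode!r}"]

    problems = []
    for name in metrics:
        if name not in allowed:
            problems.append(f"metric {name!r} is not listed for {problem_mode}")
    for name in metrics_average:
        if name not in AVERAGING_OPTIONS:
            problems.append(f"averaging option {name!r} is not listed")
    return problems


# The settings from the example configuration pass the check for a regression task.
example_problems = check_metric_settings("regression", ["RMSE", "MSE"], ["macro", "micro"])
# Mixing a classification metric into a regression task is reported.
mixed_problems = check_metric_settings("regression", ["auroc"], ["macro"])
```

Returning a list of messages rather than raising on the first mismatch lets a caller report every inconsistent setting at once.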