Configuration options

General configuration

In the code snippet above, the function generate_config is shown without any specific parameters. In practice, the function offers many parameters that define the architecture of the two-branch neural network as well as aspects of training, validation, and testing. The following section can be used as a cheat sheet that explains the meaning and rationale of every parameter.

Parameter name	Description
Training
num_epochs	The max number of epochs allowed for training
learning_rate	The learning rate used to determine the step size at each iteration of the optimization process
decay	The weight decay (L2 penalty) used by the Adam optimizer
compute_mode	The device used to train the neural network. The valid options are cpu (slower) or cuda:id, where id is the index of the GPU to use (for example, cuda:0)
num_workers	The number of sub-processes to use for data loading. Larger values usually speed up data loading, but beyond a certain point training becomes slower
train_batchsize	The number of samples that comprise a batch from the training set
val_batchsize	The number of samples that comprise a batch from the validation and test sets
patience	The number of epochs the network is allowed to keep training without an improvement in the monitored quantity before early stopping ends the run
delta	Minimum change in the monitored quantity to qualify as an improvement
return_results_per_target	Whether or not to return the performance for every target separately
evaluate_train	Whether or not to calculate performance metrics over the training set
evaluate_val	Whether or not to calculate performance metrics over the validation set
eval_every_n_epochs	The interval, in epochs, at which the performance metrics are computed
use_early_stopping	Whether or not to use early stopping while training
Metrics
metrics	The performance metrics that will be calculated. For classification tasks the available metrics are [‘hamming_loss’, ‘auroc’, ‘f1_score’, ‘aupr’, ‘accuracy’, ‘recall’, ‘precision’] while for regression tasks the available metrics are [‘RMSE’, ‘MSE’, ‘MAE’, ‘R2’, ‘RRMSE’]
metrics_average	The averaging strategy that will be used to calculate the metric. The available options are [‘macro’, ‘micro’, ‘instance’]
metric_to_optimize_early_stopping	The metric that will be used for tracking by the early stopping routine. The value can be the loss or one of the available performance metrics.
metric_to_optimize_best_epoch_selection	The validation metric that will be used to determine the best configuration. The value can be the loss or one of the available performance metrics.
Printing - Saving - Logging
verbose	Whether or not to print useful information in the terminal
use_tensorboard_logger	Whether or not to log results in files that TensorBoard can read and visualize
wandb_project_name	The name of the wandb project to which the results of an experiment will be logged
wandb_project_entity	Defines the user name of the wandb account
results_path	The path where all relevant information will be saved
experiment_name	The name of the current experiment. This name is used for both the local save and the wandb save
save_model	Whether or not to save the model of the epoch with the best validation performance
General architecture
general_architecture_version	Enables a specific version of the general neural network architecture. Available options are mlp for the MLP version, dot_product for the dot product version, and kronecker for the Kronecker product version. Default value is dot_product
batch_norm	The option to use batch normalization between the fully connected layers in the two branches
dropout_rate	The amount of dropout used in the layers of the two branches
Instance branch architecture
instance_branch_architecture	The type of architecture that will be used in the instance branch. Currently, there are two available options: MLP (a basic fully connected feed-forward neural network) and CONV (a convolutional neural network)
instance_branch_input_dim	The input dimension of the instance branch
instance_train_transforms	The PyTorch-compatible transforms that can be used on the training samples. Useful when using images with convolutional architectures
instance_inference_transforms	The PyTorch-compatible transforms that can be used on the validation and test samples. Useful when using images with convolutional architectures
instance_branch_params	A dictionary that holds all the hyperparameters needed to configure the architecture present in the instance branch. The supported key-value pairs are listed in the section on instance and target branch hyperparameters.
Target branch architecture
target_branch_architecture	The type of architecture that will be used in the target branch. Currently, there are two available options: MLP (a basic fully connected feed-forward neural network) and CONV (a convolutional neural network)
target_branch_input_dim	The input dimension of the target branch
target_train_transforms	The PyTorch-compatible transforms that can be used on the training samples. Useful when using images with convolutional architectures
target_inference_transforms	The PyTorch-compatible transforms that can be used on the validation and test samples. Useful when using images with convolutional architectures
target_branch_params	A dictionary that holds all the hyperparameters needed to configure the architecture present in the target branch.
Combination branch architecture
comb_mlp_nodes_per_layer	Defines the number of nodes in the combination branch. If list, each element defines the number of nodes in the corresponding layer. If int, the same number of nodes is used comb_mlp_layers times. (Only used if general_architecture_version == mlp)
comb_mlp_layers	The number of layers in the combination branch. (Only used if general_architecture_version == mlp)
embedding_size	The size of the embeddings outputted by the two branches. (Only used if general_architecture_version == dot_product)
Pretrained models
load_pretrained_model	Whether or not a pretrained model will be loaded
pretrained_model_path	The path to the .pt file with the pretrained model (Only used if load_pretrained_model == True)
Other
additional_info	A dictionary that holds all other relevant info. Can be used to log additional info for an experiment in wandb
validation_setting	The validation setting of the specific example
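
The interaction between patience and delta can be easier to see in code. The following is a minimal sketch of a typical early stopping rule, not the library's own implementation: the class name EarlyStoppingSketch is hypothetical, and it assumes the monitored quantity is one where lower is better (for example the loss).

```python
class EarlyStoppingSketch:
    """Illustrative early stopping rule; hypothetical, not the library's code."""

    def __init__(self, patience=10, delta=0.01):
        self.patience = patience  # epochs tolerated without improvement
        self.delta = delta        # minimum change that counts as improvement
        self.best = None
        self.bad_epochs = 0

    def step(self, value):
        """Record one epoch's value; return True when training should stop.

        Assumes lower values are better (e.g. validation loss).
        """
        if self.best is None or value < self.best - self.delta:
            # Improved by more than delta: remember it and reset the counter.
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStoppingSketch(patience=2, delta=0.01)
losses = [1.00, 0.80, 0.795, 0.792, 0.60]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

In this run the two small decreases after 0.80 are below delta, so they count as epochs without improvement and training stops before the last value is seen.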

Instance and target branch hyperparameters

As mentioned before, all hyperparameters needed to define the architecture of the instance or target branch are passed as key-value pairs in the instance_branch_params and target_branch_params.

Key	Description
Possible key names currently supported in the instance_branch_params dictionary
instance_branch_nodes_per_layer	Defines the number of nodes in the MLP version of the instance branch. If list, each element defines the number of nodes in the corresponding layer. If int, the same number of nodes is used instance_branch_layers times
instance_branch_layers	The number of layers in the MLP version of the instance branch. (Only used if instance_branch_nodes_per_layer is int)
instance_branch_conv_architecture	The type of the convolutional architecture that is used in the instance branch.
instance_branch_conv_architecture_version	The version of the specific type of convolutional architecture that is used in the instance branch.
instance_branch_conv_architecture_dense_layers	The number of dense layers that are used at the end of the convolutional architecture of the instance branch
instance_branch_conv_architecture_last_layer_trained	When using pre-trained architectures, the user can define the last layer that will be frozen during training
Possible key names currently supported in the target_branch_params dictionary
target_branch_nodes_per_layer	Defines the number of nodes in the MLP version of the target branch. If list, each element defines the number of nodes in the corresponding layer. If int, the same number of nodes is used target_branch_layers times
target_branch_layers	The number of layers in the MLP version of the target branch. (Only used if target_branch_nodes_per_layer is int)
target_branch_conv_architecture	The type of the convolutional architecture that is used in the target branch.
target_branch_conv_architecture_version	The version of the specific type of convolutional architecture that is used in the target branch.
target_branch_conv_architecture_dense_layers	The number of dense layers that are used at the end of the convolutional architecture of the target branch
target_branch_conv_architecture_last_layer_trained	When using pre-trained architectures, the user can define the last layer that will be frozen during training
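
The list-or-int convention used by the nodes_per_layer keys can be sketched as follows. The helper expand_nodes_per_layer is hypothetical and only illustrates the rule described in the table; the library may resolve these values differently internally.

```python
def expand_nodes_per_layer(nodes_per_layer, layers=None):
    """Return one node count per layer (hypothetical helper).

    - list/tuple: each element is the width of the corresponding layer,
      and `layers` is ignored.
    - int: the same width is repeated `layers` times.
    """
    if isinstance(nodes_per_layer, int):
        if layers is None:
            raise ValueError("layers must be set when nodes_per_layer is an int")
        return [nodes_per_layer] * layers
    return list(nodes_per_layer)


# An int width combined with a layer count.
uniform = expand_nodes_per_layer(100, layers=3)

# An explicit list, as in 'instance_branch_nodes_per_layer': [100, 100].
explicit = expand_nodes_per_layer([100, 100], layers=None)
```

This is why the layers key can be left as None whenever nodes_per_layer is given as a list.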

Example of generating a configuration:

config = generate_config(
    instance_branch_input_dim = data_info['instance_branch_input_dim'],
    target_branch_input_dim = data_info['target_branch_input_dim'],
    validation_setting = data_info['detected_validation_setting'],
    general_architecture_version = 'dot_product',
    problem_mode = data_info['detected_problem_mode'],
    learning_rate = 0.001,
    decay = 0,
    batch_norm = False,
    dropout_rate = 0,
    momentum = 0.9,
    weighted_loss = False,
    compute_mode = 'cuda:0',
    train_batchsize = 1024,
    val_batchsize = 1024,
    num_epochs = 200,
    num_workers = 8,
    metrics = ['RMSE', 'MSE'],
    metrics_average = ['macro', 'micro'],
    patience = 10,

    evaluate_train = True,
    evaluate_val = True,

    verbose = False,
    results_verbose = False,
    use_early_stopping = True,
    use_tensorboard_logger = True,
    wandb_project_name = 'Dummy_project_1',
    wandb_project_entity = None,
    metric_to_optimize_early_stopping = 'loss',
    delta=0.01,
    metric_to_optimize_best_epoch_selection = 'loss',

    instance_branch_architecture = 'MLP',
    use_instance_features = True,
    instance_branch_params = {
        'instance_branch_nodes_reducing_factor': 2,
        'instance_branch_nodes_per_layer': [100, 100],
        'instance_branch_layers': None,
        # 'instance_branch_conv_architecture': 'resnet',
        # 'instance_branch_conv_architecture_version': 'resnet101',
        # 'instance_branch_conv_architecture_dense_layers': 1,
        # 'instance_branch_conv_architecture_last_layer_trained': 'last',
    },


    target_branch_architecture = 'MLP',
    use_target_features = True,
    target_branch_params = {
        'target_branch_nodes_reducing_factor': 2,
        'target_branch_nodes_per_layer': [100, 100],
        'target_branch_layers': None,
        # 'target_branch_conv_architecture': 'resnet',
        # 'target_branch_conv_architecture_version': 'resnet101',
        # 'target_branch_conv_architecture_dense_layers': 1,
        # 'target_branch_conv_architecture_last_layer_trained': 'last',
    },

    embedding_size = 100,
    comb_mlp_nodes_reducing_factor = 2,
    comb_mlp_nodes_per_layer = [2048, 2048, 2048],
    comb_mlp_layers = None,

    save_model = True,

    eval_every_n_epochs = 1,

    additional_info = {})
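
Because the valid metric names depend on the task, it can help to check them before building a configuration. The sketch below uses the metric and averaging names listed in the parameter table; the function check_metrics and the problem mode strings 'classification' and 'regression' are assumptions made for illustration and are not part of the library.

```python
# Names taken from the parameter table above.
CLASSIFICATION_METRICS = {
    'hamming_loss', 'auroc', 'f1_score', 'aupr',
    'accuracy', 'recall', 'precision',
}
REGRESSION_METRICS = {'RMSE', 'MSE', 'MAE', 'R2', 'RRMSE'}
AVERAGING_OPTIONS = {'macro', 'micro', 'instance'}


def check_metrics(problem_mode, metrics, metrics_average):
    """Return a list of problems found (hypothetical helper).

    `problem_mode` is assumed to be 'classification' or 'regression'.
    """
    if problem_mode == 'classification':
        allowed = CLASSIFICATION_METRICS
    else:
        allowed = REGRESSION_METRICS
    problems = []
    for name in metrics:
        if name not in allowed:
            problems.append(f"metric '{name}' is not valid for {problem_mode}")
    for name in metrics_average:
        if name not in AVERAGING_OPTIONS:
            problems.append(f"unknown averaging option '{name}'")
    return problems


# The values used in the example configuration above.
ok = check_metrics('regression', ['RMSE', 'MSE'], ['macro', 'micro'])

# A classification metric requested for a regression task.
bad = check_metrics('regression', ['auroc'], ['macro'])
```

An empty list means the requested metrics and averaging options match the names documented for that task.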