antipasti.utils package

Submodules

antipasti.utils.biology_utils module

antipasti.utils.biology_utils.antibody_sequence_identity(seq1, seq2)

Computes the percentage of sequence identity between two antibody sequences.

Parameters:
  • seq1 (str) – First sequence.

  • seq2 (str) – Second sequence.
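
As an illustration only, the percentage of identity between two pre-aligned sequences of equal length could be computed as in the sketch below. The helper name percent_identity, the use of '-' as the gap character and the choice to ignore columns gapped in both sequences are assumptions made for this example; they are not taken from the ANTIPASTI source.

```python
def percent_identity(seq1: str, seq2: str, gap: str = "-") -> float:
    """Percentage of matching positions between two aligned sequences.

    Hypothetical helper: assumes both sequences are already aligned and
    have the same length. Columns gapped in both sequences are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq1, seq2):
        if a == gap and b == gap:
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

With this sketch, two sequences that differ in one of four compared positions yield 75.0.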

antipasti.utils.biology_utils.antigen_identity(seq1, seq2)

Tests whether two antibodies are bound to the same antigen.

Parameters:
  • seq1 (str) – First sequence.

  • seq2 (str) – Second sequence.

antipasti.utils.biology_utils.check_train_test_identity(training_set_ids, test_set_ids, max_res_list_h=None, max_res_list_l=None, threshold=0.9, residues_path='../data/lists_of_residues/', verbose=False)

Tests the sequence identity of the training and test sets.

Parameters:
  • training_set_ids (list) – Contains the PDB identifiers of the training set elements.

  • test_set_ids (list) – Contains the PDB identifiers of the test set elements.

  • max_res_list_h (list) – Heavy chain residues of all data.

  • max_res_list_l (list) – Light chain residues of all data.

  • threshold (float) – Highest accepted sequence identity value.

  • residues_path (str) – Path to the folder containing the list of residues per entry.
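
A minimal sketch of how such a threshold might be applied is shown below. It assumes the sequences have already been built and are available as dictionaries mapping PDB identifiers to equal-length strings; the helper names fraction_identity and too_similar_test_ids are hypothetical, and identity is expressed as a fraction in [0, 1] to match the scale of the default threshold of 0.9.

```python
def fraction_identity(seq1, seq2):
    """Fraction of identical positions between two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    if not seq1:
        return 0.0
    return sum(a == b for a, b in zip(seq1, seq2)) / len(seq1)


def too_similar_test_ids(train_seqs, test_seqs, threshold=0.9):
    """Return test identifiers whose identity to any training sequence
    exceeds the threshold.

    Hypothetical helper: both arguments map PDB identifiers to sequences.
    """
    flagged = []
    for test_id, test_seq in test_seqs.items():
        if any(fraction_identity(test_seq, s) > threshold
               for s in train_seqs.values()):
            flagged.append(test_id)
    return flagged
```

An empty result would indicate that no test entry is more similar to the training set than the accepted threshold.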

antipasti.utils.biology_utils.extract_mean_region_lengths(pdb_codes, data_path='../data/')

Retrieves the mean FR and CDR lengths of a set of antibodies.

Parameters:
  • pdb_codes (list) – The PDB codes of the antibodies.

  • data_path (str) – Path to the data folder.

antipasti.utils.biology_utils.get_sequence(list_of_residues, max_res_list_h=None, max_res_list_l=None)

Returns an amino acid sequence from an ANTIPASTI list of residues. It contains gaps for the antibody.

Parameters:
  • list_of_residues (list) – Residues numbered according to the Chothia scheme with presence of ‘START-Ab’ and ‘END-Ab’ labels.

  • max_res_list_h (list) – Heavy chain residues of all data.

  • max_res_list_l (list) – Light chain residues of all data.

antipasti.utils.biology_utils.remove_nanobodies(pdb_codes, representations, embedding=None, labels=[], numerical_values=None)

Returns PDB codes and embeddings without the presence of nanobodies.

Parameters:
  • pdb_codes (list) – The PDB codes of the antibodies.

  • representations (numpy.ndarray) – Normal mode correlation maps (or transformed maps) from which it can be inferred whether a given antibody is a nanobody.

  • embedding (numpy.ndarray) – Low-dimensional version of representations.

  • labels (list) – Data point labels.

  • numerical_values (list) – If data is numerical (e.g., affinity values), it is necessary to include a list here. In this way, values associated to nanobodies can be removed.

antipasti.utils.explaining_utils module

antipasti.utils.explaining_utils.add_region_based_on_range(list_residues)

Given a list of residues in Chothia numbering, this function adds the corresponding regions in brackets for each of them.
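
The idea can be sketched for heavy-chain positions as follows. Everything in this example is an assumption made for illustration: the helper names, the integer input format, the output format with the region in parentheses, and the CDR boundaries (H1 26-32, H2 52-56, H3 95-102, one common statement of the Chothia definition), which may differ from those used by ANTIPASTI.

```python
# Assumed heavy-chain CDR ranges (inclusive) under the Chothia definition.
HEAVY_CDRS = [(26, 32, "CDR-H1"), (52, 56, "CDR-H2"), (95, 102, "CDR-H3")]


def heavy_region(position: int) -> str:
    """Return the region label for a heavy-chain Chothia position."""
    framework_index = 1
    for start, end, name in HEAVY_CDRS:
        if position < start:
            # Position lies in the framework preceding this CDR.
            return f"FR-H{framework_index}"
        if position <= end:
            return name
        framework_index += 1
    # Past the last CDR: final framework region.
    return f"FR-H{framework_index}"


def add_regions(positions):
    """Append the region in brackets to each position, e.g. '30 (CDR-H1)'."""
    return [f"{p} ({heavy_region(p)})" for p in positions]
```

A light-chain version would follow the same pattern with its own boundary table.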

antipasti.utils.explaining_utils.compute_region_importance(preprocessed_data, model, type_of_antigen, nanobodies, mode='region', interactive=False)

Computes the importance factors (0-100) of all the Fv antibody regions. Returns the importance for each region.

Parameters:
  • preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.

  • model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.

  • type_of_antigen (int) – Choose between: proteins (0), haptens (1), peptides (2) or carbohydrates (3).

  • nanobodies (list) – PDB codes of nanobodies in the dataset.

  • mode (str) – Set to region to explicitly calculate which correlations are inter- or intra-region, or to chain to do the same for chains.

  • interactive (bool) – Set to True when running a script or pytest.

antipasti.utils.explaining_utils.compute_residue_importance(preprocessed_data, model, type_of_antigen, nanobodies, interactive=False)

Computes the importance factors (0-100) of all the amino acids of the antibody variable region.

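Parameters:
  • preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.

  • model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.

  • type_of_antigen (int) – Choose between: proteins (0), haptens (1), peptides (2) or carbohydrates (3).

  • nanobodies (list) – PDB codes of nanobodies in the dataset.

  • interactive (bool) – Set to True when running a script or pytest.

antipasti.utils.explaining_utils.compute_umap(preprocessed_data, model, scheme='heavy_species', categorical=True, include_ellipses=False, numerical_values=None, external_cdict=None, interactive=False, exclude_nanobodies=False)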

Performs UMAP dimensionality reduction calculations.

Parameters:
  • preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.

  • model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.

  • scheme (str) – Category of the labels or values appearing in the UMAP representation.

  • categorical (bool) – True if scheme is categorical.

  • include_ellipses (bool) – True if ellipses comprising three quarters of the points of a given class are included.

  • numerical_values (list) – A list of values or entries should be provided if data external to SAbDab is used.

  • external_cdict (dictionary) – Option to provide an external dictionary of the UMAP labels.

  • interactive (bool) – Set to True when running a script or pytest.

  • exclude_nanobodies (bool) – Set to True to exclude nanobodies from the UMAP plot.

antipasti.utils.explaining_utils.get_colours_ag_type(preprocessed_data)

Returns a different colour according to the antigen type.

Parameters:
  • preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.

antipasti.utils.explaining_utils.get_maps_of_interest(preprocessed_data, learnt_filter, affinity_thr=-8)

Post-processes both raw data and results to obtain maps of interest.

Parameters:
  • preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.

  • learnt_filter (numpy.ndarray) – Filters that express the learnt features during training.

  • affinity_thr (float) – Affinity value separating antibodies considered to have high affinity from those considered to have low affinity.

Returns:

  • mean_learnt (numpy.ndarray) – A resized version of learnt_filter to match the shape of the input normal mode correlation maps.

  • mean_image (numpy.ndarray) – The mean of all the input normal mode correlation maps.

  • mean_diff_image (numpy.ndarray) – Map resulting from the subtraction of the mean of the high affinity correlation maps and the mean of the low affinity correlation maps.
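
The subtraction behind mean_diff_image can be sketched in plain Python as below. This is not the library's code: the helper names are hypothetical, maps are represented as nested lists rather than NumPy arrays, and the sketch assumes the affinity labels are log10(Kd) values, so that values below the threshold count as high affinity.

```python
def mean_map(maps):
    """Element-wise mean of a non-empty list of equally sized 2D maps."""
    n_rows, n_cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[i][j] for m in maps) / len(maps) for j in range(n_cols)]
        for i in range(n_rows)
    ]


def mean_difference_map(maps, affinities, affinity_thr=-8):
    """Mean of high-affinity maps minus mean of low-affinity maps.

    Assumption: affinities are log10(Kd) values, so values below the
    threshold count as high affinity.
    """
    high = [m for m, a in zip(maps, affinities) if a < affinity_thr]
    low = [m for m, a in zip(maps, affinities) if a >= affinity_thr]
    if not high or not low:
        raise ValueError("both affinity groups must be non-empty")
    mean_high, mean_low = mean_map(high), mean_map(low)
    return [
        [h - l for h, l in zip(row_h, row_l)]
        for row_h, row_l in zip(mean_high, mean_low)
    ]
```

Positive entries in the result mark correlations that are, on average, stronger in the high-affinity group under these assumptions.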

antipasti.utils.explaining_utils.get_output_representations(preprocessed_data, model)

Returns maps that reveal the important residue interactions for the binding affinity. We call them ‘output layer representations’.

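Parameters:
  • preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.

  • model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.

antipasti.utils.explaining_utils.get_test_contribution(preprocessed_data, model)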

Returns a map that reveals the important residue interactions for the binding affinity.

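Parameters:
  • preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.

  • model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.

antipasti.utils.explaining_utils.plot_map_with_regions(preprocessed_data, map, title='Normal mode correlation map', interactive=False)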

Maps the residues to the antibody regions and plots the normal mode correlation map.

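Parameters:
  • preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.

  • map (numpy.ndarray) – The normal mode correlation map to plot.

  • title (str) – Title of the plot.

  • interactive (bool) – Set to True when running a script or pytest.

antipasti.utils.explaining_utils.plot_region_importance(importance_factor, importance_factor_ob, antigen_type, mode='region', interactive=False)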

Plots ranking of important regions.

Parameters:
  • importance_factor (list) – Measure of importance (0-100) for each antibody region.

  • importance_factor_ob (list) – Measure of importance (0-100) for each antibody region attributable to off-block correlations. This can be inter-region or inter-chain depending on the selected mode.

  • antigen_type (int) – Plot corresponding to antigens of a given type. These can be proteins (0), haptens (1), peptides (2) or carbohydrates (3).

  • mode (str) – Set to region to explicitly show which correlations are inter- or intra-region, or to chain to do the same for chains.

  • interactive (bool) – Set to True when running a script or pytest.

antipasti.utils.explaining_utils.plot_residue_importance(preprocessed_data, importance_factor, antigen_type, interactive=False)

Plots ranking of important residues.

Parameters:
  • preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.

  • importance_factor (list) – Measure of importance (0-100) for each antibody residue.

  • antigen_type (int) – Plot corresponding to antigens of a given type. These can be proteins (0), haptens (1), peptides (2) or carbohydrates (3).

  • interactive (bool) – Set to True when running a script or pytest.

antipasti.utils.explaining_utils.plot_umap(embedding, colours, scheme, pdb_codes, categorical=True, include_ellipses=False, cdict=None, interactive=False)

Plots UMAP maps.

Parameters:
  • embedding (numpy.ndarray) – The output layer representations after dimensionality reduction.

  • colours (list) – The data points labels or values.

  • scheme (str) – Category of the labels or values appearing in the UMAP representation.

  • pdb_codes (list) – The PDB codes of the antibodies.

  • categorical (bool) – True if scheme is categorical.

  • include_ellipses (bool) – True to include ellipses comprising 85% of the points of a given class.

  • cdict (dictionary) – External dictionary of the UMAP labels.

  • interactive (bool) – Set to True when running a script or pytest.

antipasti.utils.generic_utils module

antipasti.utils.generic_utils.remove_abc(residue)

Returns the residue names without the final letter that indicates extension positions.
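
For instance, a residue written as 100A would become 100. The sketch below shows one way such a helper might behave; the name strip_insertion_letter and the rule of only stripping a letter that directly follows a digit are assumptions for this example, not the behaviour of remove_abc itself.

```python
def strip_insertion_letter(residue: str) -> str:
    """Drop a trailing insertion letter, e.g. '100A' -> '100'.

    Hypothetical helper: only strips when the name ends with a letter that
    directly follows a digit, so plain labels are left untouched.
    """
    if len(residue) >= 2 and residue[-1].isalpha() and residue[-2].isdigit():
        return residue[:-1]
    return residue
```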

antipasti.utils.torch_utils module

antipasti.utils.torch_utils.create_test_set(preprocessed_data, test_size=None, random_state=0, residues_path='../data/lists_of_residues/')

Creates the test set given a set of input images and their corresponding labels.

Parameters:
  • preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – An instance of the Preprocessing class.

  • test_size (float) – Fraction of original samples to be included in the test set.

  • random_state (int) – Seed of the random number generator, which makes the split reproducible.

  • residues_path (str) – Path to the folder containing the list of residues per entry.

Returns:

  • train_x (torch.Tensor) – Training inputs.

  • test_x (torch.Tensor) – Test inputs.

  • train_y (torch.Tensor) – Training labels.

  • test_y (torch.Tensor) – Test labels.
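
The role of test_size and random_state in a split of this kind can be illustrated with a short, library-free sketch. It works on plain lists rather than tensors, the helper name split_train_test and its default test_size of 0.2 are hypothetical, and it does not reproduce any sequence-identity checks the real function may perform.

```python
import random


def split_train_test(samples, labels, test_size=0.2, random_state=0):
    """Shuffle indices with a fixed seed and split them into train and test."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    # A dedicated generator keeps the split reproducible for a given seed.
    random.Random(random_state).shuffle(indices)
    n_test = int(round(test_size * len(samples)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_x = [samples[i] for i in train_idx]
    test_x = [samples[i] for i in test_idx]
    train_y = [labels[i] for i in train_idx]
    test_y = [labels[i] for i in test_idx]
    return train_x, test_x, train_y, test_y
```

Calling the sketch twice with the same random_state returns the same partition, which is the property the seed is meant to provide.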

antipasti.utils.torch_utils.load_checkpoint(path, input_shape, n_filters=None, pooling_size=None, filter_size=None)

Loads a checkpoint from the checkpoints folder.

Parameters:
  • path (str) – Checkpoint path.

  • input_shape (int) – Shape of the normal mode correlation maps.

  • n_filters (int) – Number of filters in the convolutional layer.

  • pooling_size (int) – Size of the max pooling operation.

  • filter_size (int) – Size of filters in the convolutional layer.

Returns:

  • model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.

  • optimiser (adabelief_pytorch.AdaBelief.AdaBelief) – Method that implements an optimisation algorithm.

  • n_epochs (int) – Number of times the whole dataset went through the model.

  • train_losses (list) – The history of training losses after the training routine.

  • test_losses (list) – The history of test losses after the training routine.

antipasti.utils.torch_utils.save_checkpoint(path, model, optimiser, train_losses, test_losses)

Saves a checkpoint in the checkpoints folder.

Parameters:
  • path (str) – Checkpoint path.

  • model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.

  • optimiser (adabelief_pytorch.AdaBelief.AdaBelief) – Method that implements an optimisation algorithm.

  • train_losses (list) – The history of training losses after the training routine.

  • test_losses (list) – The history of test losses after the training routine.

antipasti.utils.torch_utils.training_routine(model, criterion, optimiser, train_x, test_x, train_y, test_y, n_max_epochs=120, max_corr=0.87, batch_size=32, verbose=True)

Performs a chosen number of training steps.

Parameters:
  • model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.

  • criterion (torch.nn.modules.loss.MSELoss) – The selected loss function, i.e., MSELoss, from which the gradients are computed.

  • optimiser (adabelief_pytorch.AdaBelief.AdaBelief) – Method that implements an optimisation algorithm.

  • train_x (torch.Tensor) – Training normal mode correlation maps.

  • test_x (torch.Tensor) – Test normal mode correlation maps.

  • train_y (torch.Tensor) – Training labels.

  • test_y (torch.Tensor) – Test labels.

  • n_max_epochs (int) – Number of times the whole dataset goes through the model.

  • max_corr (float) – If the correlation coefficient exceeds this value, the training routine is terminated.

  • batch_size (int) – Number of samples that pass through the model before its parameters are updated.

  • verbose (bool) – True to print the losses in each epoch.

Returns:

  • train_losses (list) – The history of training losses after the training routine.

  • test_losses (list) – The history of test losses after the training routine.

  • inter_filter (torch.Tensor) – Filters before the fully-connected layer.

  • y_test (torch.Tensor) – Ground truth test labels.

  • output_test (torch.Tensor) – The predicted test labels.
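
The interplay between n_max_epochs and max_corr can be sketched as an early-stopping loop. The code below is a simplified stand-in: pearson and run_epochs are hypothetical helpers, step is a hypothetical callable that returns the test predictions after one epoch, and the choice to measure the correlation between test predictions and test labels is an assumption, since the text above does not say which quantities are correlated.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def run_epochs(step, y_test, n_max_epochs=120, max_corr=0.87):
    """Call step(epoch) until the epoch budget or the correlation cap is hit.

    `step` is a hypothetical callable returning the test predictions after
    one epoch of training.
    """
    epochs_run = 0
    predictions = None
    for epoch in range(n_max_epochs):
        predictions = step(epoch)
        epochs_run += 1
        if pearson(predictions, y_test) > max_corr:
            break  # stop early once predictions correlate strongly enough
    return epochs_run, predictions
```

Capping the correlation in this way is one form of early stopping; the epoch budget still bounds the run when the cap is never reached.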

antipasti.utils.torch_utils.training_step(model, criterion, optimiser, train_x, test_x, train_y, test_y, train_losses, test_losses, epoch, batch_size, verbose)

Performs a training step.

Parameters:
  • model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.

  • criterion (torch.nn.modules.loss.MSELoss) – The selected loss function, i.e., MSELoss, from which the gradients are computed.

  • optimiser (adabelief_pytorch.AdaBelief.AdaBelief) – Method that implements an optimisation algorithm.

  • train_x (torch.Tensor) – Training normal mode correlation maps.

  • test_x (torch.Tensor) – Test normal mode correlation maps.

  • train_y (torch.Tensor) – Training labels.

  • test_y (torch.Tensor) – Test labels.

  • train_losses (list) – The current history of training losses.

  • test_losses (list) – The current history of test losses.

  • epoch (int) – Index of the current epoch, i.e., the number of times the dataset has already gone through the model.

  • batch_size (int) – Number of samples that pass through the model before its parameters are updated.

  • verbose (bool) – True to print the losses in each epoch.

Returns:

  • train_losses (list) – The history of training losses after the training step.

  • test_losses (list) – The history of test losses after the training step.

  • inter_filter (torch.Tensor) – Filters before the fully-connected layer.

  • y_test (torch.Tensor) – Ground truth test labels.

  • output_test (torch.Tensor) – The predicted test labels.
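
To make the meaning of batch_size concrete, the sketch below shows how one pass over the training data might be divided into mini-batches, with one parameter update per yielded batch. The generator name iterate_batches is hypothetical and the example uses plain lists; it says nothing about how ANTIPASTI orders or shuffles its samples.

```python
def iterate_batches(inputs, targets, batch_size):
    """Yield consecutive (inputs, targets) slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(inputs), batch_size):
        end = start + batch_size
        yield inputs[start:end], targets[start:end]
```

The last batch is simply shorter when the number of samples is not a multiple of batch_size.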

Module contents

This subpackage contains utility functions.