antipasti.utils package

Submodules

antipasti.utils.biology_utils module

antipasti.utils.biology_utils.antibody_sequence_identity(seq1, seq2)[source]

Computes the percentage sequence identity between two antibody sequences.

Parameters:

seq1 (str) – First sequence.
seq2 (str) – Second sequence.
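
To illustrate the quantity described above, here is a minimal sketch that assumes the two sequences are already aligned and of equal length. The name `percent_identity` and its gap-handling rule are assumptions made for this example; they are not taken from the library.

```python
def percent_identity(seq1: str, seq2: str, gap: str = "-") -> float:
    """Percentage of aligned positions where both sequences carry the same residue.

    Hypothetical helper: columns where both sequences have a gap are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where at least one sequence has a residue.
    columns = [(a, b) for a, b in zip(seq1, seq2) if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


print(percent_identity("ACDE", "ACDF"))  # 75.0
```

Ignoring double-gap columns is one possible convention; counting them as matches would inflate the identity of heavily gapped alignments.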

antipasti.utils.biology_utils.antigen_identity(seq1, seq2)[source]

Tests whether two antibodies are bound to the same antigen.

Parameters:

seq1 (str) – First sequence.
seq2 (str) – Second sequence.

antipasti.utils.biology_utils.check_train_test_identity(training_set_ids, test_set_ids, max_res_list_h=None, max_res_list_l=None, threshold=0.9, residues_path='../data/lists_of_residues/', verbose=False)[source]

Tests the sequence identity of the training and test sets.

Parameters:

training_set_ids (list) – Contains the PDB identifiers of the training set elements.
test_set_ids (list) – Contains the PDB identifiers of the test set elements.
max_res_list_h (list) – Heavy chain residues of all data.
max_res_list_l (list) – Light chain residues of all data.
threshold (float) – Highest accepted sequence identity value.
residues_path (str) – Path to the folder containing the list of residues per entry.
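
The role of `threshold` can be sketched as follows: every test sequence is compared with every training sequence, and pairs whose identity exceeds the threshold are reported. The function names, the dictionary input format and the PDB identifiers below are placeholders chosen for illustration, and the sequences are assumed to be aligned, non-empty and of equal length.

```python
def fraction_identity(seq1, seq2):
    """Fraction (0-1) of matching positions between two equal-length sequences."""
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return matches / len(seq1)


def pairs_above_threshold(train_seqs, test_seqs, threshold=0.9):
    """Return (test_id, train_id, identity) for every pair above the threshold."""
    flagged = []
    for test_id, test_seq in test_seqs.items():
        for train_id, train_seq in train_seqs.items():
            identity = fraction_identity(test_seq, train_seq)
            if identity > threshold:
                flagged.append((test_id, train_id, identity))
    return flagged


# Placeholder identifiers and toy sequences, not real entries.
train = {"1abc": "QVQLVQSGAE", "2def": "EVQLLESGGG"}
test = {"3ghi": "QVQLVQSGAE", "4jkl": "DIQMTQSPSS"}
print(pairs_above_threshold(train, test))  # [('3ghi', '1abc', 1.0)]
```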

antipasti.utils.biology_utils.extract_mean_region_lengths(pdb_codes, data_path='../data/')[source]

Retrieves the mean FR and CDR lengths of a set of antibodies.

Parameters:

pdb_codes (list) – The PDB codes of the antibodies.
data_path (str) – Path to the data folder.
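
Going by the function name, the result is an average over several antibodies. A minimal sketch of such an average is given below; the input format (a dictionary mapping each PDB code to its region lengths), the region names and the helper name are hypothetical.

```python
from collections import defaultdict


def mean_region_lengths(region_lengths_per_antibody):
    """Average the length of each region over a collection of antibodies.

    `region_lengths_per_antibody` maps a PDB code to a dict such as
    {"CDR-H1": 7, "CDR-H3": 10}; this input format is hypothetical.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for lengths in region_lengths_per_antibody.values():
        for region, length in lengths.items():
            totals[region] += length
            counts[region] += 1
    return {region: totals[region] / counts[region] for region in totals}


# Placeholder codes and lengths.
example = {
    "xxxx": {"CDR-H1": 7, "CDR-H3": 10},
    "yyyy": {"CDR-H1": 7, "CDR-H3": 14},
}
print(mean_region_lengths(example))  # {'CDR-H1': 7.0, 'CDR-H3': 12.0}
```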

antipasti.utils.biology_utils.get_sequence(list_of_residues, max_res_list_h=None, max_res_list_l=None)[source]

Returns the gapped amino acid sequence of an antibody from an ANTIPASTI list of residues.

Parameters:

list_of_residues (list) – Residues numbered according to the Chothia scheme with presence of ‘START-Ab’ and ‘END-Ab’ labels.
max_res_list_h (list) – Heavy chain residues of all data.
max_res_list_l (list) – Light chain residues of all data.
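
The idea of a gapped sequence can be sketched as follows: the sequence is written over the full list of positions seen in the data, with a gap symbol wherever the antibody has no residue. The mapping from position label to amino acid, the label format and the helper name are assumptions for this example.

```python
def gapped_sequence(residues, all_positions, gap="-"):
    """Build a sequence over `all_positions`, writing a gap where a position is absent.

    `residues` maps a position label (e.g. "H100A") to a one-letter amino acid.
    Both the mapping and the label format are hypothetical.
    """
    return "".join(residues.get(position, gap) for position in all_positions)


all_positions = ["H1", "H2", "H3", "H4", "H5"]
residues = {"H1": "Q", "H2": "V", "H4": "L", "H5": "V"}
print(gapped_sequence(residues, all_positions))  # QV-LV
```

Sequences built this way share the same length, so they can be compared position by position.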

antipasti.utils.biology_utils.remove_nanobodies(pdb_codes, representations, embedding=None, labels=[], numerical_values=None)[source]

Returns PDB codes and embeddings without the presence of nanobodies.

Parameters:

pdb_codes (list) – The PDB codes of the antibodies.
representations (numpy.ndarray) – Normal mode correlation maps (or transformed maps) from which it can be inferred whether a given antibody is a nanobody.
embedding (numpy.ndarray) – Low-dimensional version of representations.
labels (list) – Data point labels.
numerical_values (list) – If the data is numerical (e.g., affinity values), a list must be provided here so that the values associated with nanobodies can be removed.
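
The filtering step can be sketched as below. This example assumes the nanobody flags are already known, whereas the function above infers them from `representations`; the helper name and the data are placeholders.

```python
def drop_flagged(is_nanobody, *parallel_lists):
    """Remove from each list the entries whose position is flagged as a nanobody."""
    keep = [i for i, flagged in enumerate(is_nanobody) if not flagged]
    return tuple([items[i] for i in keep] for items in parallel_lists)


pdb_codes = ["aaaa", "bbbb", "cccc"]  # placeholder codes
affinities = [-9.1, -7.4, -8.2]
is_nanobody = [False, True, False]  # assumed to be known in advance

pdb_codes, affinities = drop_flagged(is_nanobody, pdb_codes, affinities)
print(pdb_codes, affinities)  # ['aaaa', 'cccc'] [-9.1, -8.2]
```

Filtering all lists with the same index set keeps codes, labels and numerical values aligned.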

antipasti.utils.explaining_utils module

antipasti.utils.explaining_utils.add_region_based_on_range(list_residues)[source]

Given a list of residues in Chothia numbering, this function adds the corresponding regions in brackets for each of them.
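
A rough sketch of such a labelling is shown below. It uses commonly cited Chothia CDR boundaries, which may differ from the ranges the library applies, and it assumes residue labels of the form chain letter plus number plus optional insertion letter. The helper name and the framework label are hypothetical.

```python
import re

# Commonly cited Chothia CDR boundaries (inclusive); assumed here for illustration.
CDR_RANGES = {
    "H": [("CDR-H1", 26, 32), ("CDR-H2", 52, 56), ("CDR-H3", 95, 102)],
    "L": [("CDR-L1", 24, 34), ("CDR-L2", 50, 56), ("CDR-L3", 89, 97)],
}


def label_with_region(residue):
    """Append the region in brackets, e.g. 'H100A' -> 'H100A (CDR-H3)'."""
    match = re.fullmatch(r"([HL])(\d+)([A-Z]?)", residue)
    if match is None:
        raise ValueError(f"unrecognised residue label: {residue!r}")
    chain, number = match.group(1), int(match.group(2))
    for name, start, end in CDR_RANGES[chain]:
        if start <= number <= end:
            return f"{residue} ({name})"
    # Anything outside a CDR is labelled as framework, without numbering the FR.
    return f"{residue} (FR-{chain})"


print([label_with_region(r) for r in ["H25", "H26", "H100A", "L50"]])
# ['H25 (FR-H)', 'H26 (CDR-H1)', 'H100A (CDR-H3)', 'L50 (CDR-L2)']
```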

antipasti.utils.explaining_utils.compute_region_importance(preprocessed_data, model, type_of_antigen, nanobodies, mode='region', interactive=False)[source]

Computes the importance factors (0-100) of all the Fv antibody regions. Returns the importance for each region.

Parameters:

preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
type_of_antigen (int) – Choose between: proteins (0), haptens (1), peptides (2) or carbohydrates (3).
nanobodies (list) – PDB codes of nanobodies in the dataset.
mode (str) – Set to region to explicitly calculate which correlations are inter/intra-region (likewise for chain).
interactive (bool) – Set to True when running a script or pytest.

antipasti.utils.explaining_utils.compute_residue_importance(preprocessed_data, model, type_of_antigen, nanobodies, interactive=False)[source]

Computes the importance factors (0-100) of all the amino acids of the antibody variable region.

Parameters:

preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
type_of_antigen (int) – Choose between: proteins (0), haptens (1), peptides (2) or carbohydrates (3).
nanobodies (list) – PDB codes of nanobodies in the dataset.
interactive (bool) – Set to True when running a script or pytest.

antipasti.utils.explaining_utils.compute_umap(preprocessed_data, model, scheme='heavy_species', categorical=True, include_ellipses=False, numerical_values=None, external_cdict=None, interactive=False, exclude_nanobodies=False)[source]

Performs UMAP dimensionality reduction calculations.

Parameters:

preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
scheme (str) – Category of the labels or values appearing in the UMAP representation.
categorical (bool) – True if scheme is categorical.
include_ellipses (bool) – True if ellipses comprising three quarters of the points of a given class are included.
numerical_values (list) – A list of values or entries should be provided if data external to SAbDab is used.
external_cdict (dictionary) – Option to provide an external dictionary of the UMAP labels.
interactive (bool) – Set to True when running a script or pytest.
exclude_nanobodies (bool) – Set to True to exclude nanobodies from the UMAP plot.

antipasti.utils.explaining_utils.get_colours_ag_type(preprocessed_data)[source]

Returns a different colour according to the antigen type.

Parameters:

preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.

antipasti.utils.explaining_utils.get_maps_of_interest(preprocessed_data, learnt_filter, affinity_thr=-8)[source]

Post-processes both raw data and results to obtain maps of interest.

Parameters:

preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
learnt_filter (numpy.ndarray) – Filters that express the learnt features during training.
affinity_thr (float) – Affinity value separating antibodies considered to have high affinity from those considered to have low affinity.

Returns:

mean_learnt (numpy.ndarray) – A resized version of learnt_filter to match the shape of the input normal mode correlation maps.
mean_image (numpy.ndarray) – The mean of all the input normal mode correlation maps.
mean_diff_image (numpy.ndarray) – Map resulting from the subtraction of the mean of the high affinity correlation maps and the mean of the low affinity correlation maps.
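
The last two return values can be illustrated with a small sketch on nested lists. It assumes that values below `affinity_thr` count as high affinity (as would be the case for labels such as the logarithm of a dissociation constant) and that both groups are non-empty. The helper names are hypothetical, and plain lists stand in for NumPy arrays.

```python
def mean_map(maps):
    """Element-wise mean of equally sized 2D maps given as nested lists."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / n for j in range(cols)] for i in range(rows)]


def mean_difference_map(maps, affinities, affinity_thr=-8):
    """Mean high-affinity map minus mean low-affinity map.

    Assumption: values below `affinity_thr` count as high affinity.
    """
    high = [m for m, a in zip(maps, affinities) if a < affinity_thr]
    low = [m for m, a in zip(maps, affinities) if a >= affinity_thr]
    mean_high, mean_low = mean_map(high), mean_map(low)
    return [
        [x - y for x, y in zip(row_high, row_low)]
        for row_high, row_low in zip(mean_high, mean_low)
    ]


maps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.5], [0.5, 1.0]],
    [[1.0, 0.25], [0.25, 1.0]],
]
affinities = [-9.0, -7.0, -10.0]
print(mean_map(maps))  # [[1.0, 0.25], [0.25, 1.0]]
print(mean_difference_map(maps, affinities))  # [[0.0, -0.375], [-0.375, 0.0]]
```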

antipasti.utils.explaining_utils.get_output_representations(preprocessed_data, model)[source]

Returns maps that reveal the important residue interactions for the binding affinity. We call them ‘output layer representations’.

Parameters:

preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.

antipasti.utils.explaining_utils.get_test_contribution(preprocessed_data, model)[source]

Returns a map that reveals the important residue interactions for the binding affinity.

Parameters:

preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.

antipasti.utils.explaining_utils.plot_map_with_regions(preprocessed_data, map, title='Normal mode correlation map', interactive=False)[source]

Maps the residues to the antibody regions and plots the normal mode correlation map.

Parameters:

preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
map (numpy.ndarray) – A normal mode correlation map.
title (str) – The image title.
interactive (bool) – Set to True when running a script or pytest.

antipasti.utils.explaining_utils.plot_region_importance(importance_factor, importance_factor_ob, antigen_type, mode='region', interactive=False)[source]

Plots ranking of important regions.

Parameters:

importance_factor (list) – Measure of importance (0-100) for each antibody region.
importance_factor_ob (list) – Measure of importance (0-100) for each antibody region attributable to off-block correlations. This can be inter-region or inter-chain depending on the selected mode.
antigen_type (int) – Plot corresponding to antigens of a given type. These can be proteins (0), haptens (1), peptides (2) or carbohydrates (3).
mode (str) – Set to region to explicitly show which correlations are inter/intra-region (likewise for chain).
interactive (bool) – Set to True when running a script or pytest.

antipasti.utils.explaining_utils.plot_residue_importance(preprocessed_data, importance_factor, antigen_type, interactive=False)[source]

Plots ranking of important residues.

Parameters:

preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
importance_factor (list) – Measure of importance (0-100) for each antibody residue.
antigen_type (int) – Plot corresponding to antigens of a given type. These can be proteins (0), haptens (1), peptides (2) or carbohydrates (3).
interactive (bool) – Set to True when running a script or pytest.

antipasti.utils.explaining_utils.plot_umap(embedding, colours, scheme, pdb_codes, categorical=True, include_ellipses=False, cdict=None, interactive=False)[source]

Plots UMAP maps.

Parameters:

embedding (numpy.ndarray) – The output layer representations after dimensionality reduction.
colours (list) – The data point labels or values.
scheme (str) – Category of the labels or values appearing in the UMAP representation.
pdb_codes (list) – The PDB codes of the antibodies.
categorical (bool) – True if scheme is categorical.
include_ellipses (bool) – True to include ellipses comprising 85% of the points of a given class.
cdict (dictionary) – External dictionary of the UMAP labels.
interactive (bool) – Set to True when running a script or pytest.

antipasti.utils.generic_utils module

antipasti.utils.generic_utils.remove_abc(residue)[source]

Returns the residue names without the final letter that indicates extension positions.
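
As an illustration, stripping a trailing extension letter from a residue label could look like the sketch below. The helper name and the label format are assumptions and may not match the library's behaviour on edge cases.

```python
def strip_extension_letter(residue):
    """Drop a trailing letter from a residue label, e.g. '100A' -> '100'."""
    if residue and residue[-1].isalpha():
        return residue[:-1]
    return residue


print([strip_extension_letter(r) for r in ["100A", "100B", "52"]])
# ['100', '100', '52']
```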

antipasti.utils.torch_utils module

antipasti.utils.torch_utils.create_test_set(preprocessed_data, test_size=None, random_state=0, residues_path='../data/lists_of_residues/')[source]

Creates the test set given a set of input images and their corresponding labels.

Parameters:

preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – An instance of the Preprocessing class.
test_size (float) – Fraction of original samples to be included in the test set.
random_state (int) – Seed of the random number generator used to split the data.
residues_path (str) – Path to the folder containing the list of residues per entry.

Returns:

train_x (torch.Tensor) – Training inputs.
test_x (torch.Tensor) – Test inputs.
train_y (torch.Tensor) – Training labels.
test_y (torch.Tensor) – Test labels.
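
The interplay of `test_size` and `random_state` can be sketched with the standard library alone. This is a simplified stand-in that operates on indices; the helper name and the default `test_size` of 0.2 are assumptions for this example.

```python
import random


def split_indices(n_samples, test_size=0.2, random_state=0):
    """Shuffle sample indices reproducibly and split them into train and test."""
    indices = list(range(n_samples))
    # A dedicated generator seeded with random_state makes the split repeatable.
    random.Random(random_state).shuffle(indices)
    n_test = round(n_samples * test_size)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = split_indices(10, test_size=0.3, random_state=0)
print(len(train_idx), len(test_idx))  # 7 3
```

Calling the function again with the same `random_state` returns the same split, which keeps experiments comparable.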

antipasti.utils.torch_utils.load_checkpoint(path, input_shape, n_filters=None, pooling_size=None, filter_size=None)[source]

Loads a checkpoint from the checkpoints folder.

Parameters:

path (str) – Checkpoint path.
input_shape (int) – Shape of the normal mode correlation maps.
n_filters (int) – Number of filters in the convolutional layer.
pooling_size (int) – Size of the max pooling operation.
filter_size (int) – Size of filters in the convolutional layer.

Returns:

model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
optimiser (adabelief_pytorch.AdaBelief.AdaBelief) – Method that implements an optimisation algorithm.
n_epochs (int) – Number of times the whole dataset went through the model.
train_losses (list) – The history of training losses after the training routine.
test_losses (list) – The history of test losses after the training routine.

antipasti.utils.torch_utils.save_checkpoint(path, model, optimiser, train_losses, test_losses)[source]

Saves a checkpoint in the checkpoints folder.

Parameters:

path (str) – Checkpoint path.
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
optimiser (adabelief_pytorch.AdaBelief.AdaBelief) – Method that implements an optimisation algorithm.
train_losses (list) – The history of training losses after the training routine.
test_losses (list) – The history of test losses after the training routine.

antipasti.utils.torch_utils.training_routine(model, criterion, optimiser, train_x, test_x, train_y, test_y, n_max_epochs=120, max_corr=0.87, batch_size=32, verbose=True)[source]

Performs a chosen number of training steps.

Parameters:

model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
criterion (torch.nn.modules.loss.MSELoss) – The loss function from which the gradients are computed, i.e., MSELoss.
optimiser (adabelief_pytorch.AdaBelief.AdaBelief) – Method that implements an optimisation algorithm.
train_x (torch.Tensor) – Training normal mode correlation maps.
test_x (torch.Tensor) – Test normal mode correlation maps.
train_y (torch.Tensor) – Training labels.
test_y (torch.Tensor) – Test labels.
n_max_epochs (int) – Number of times the whole dataset goes through the model.
max_corr (float) – If the correlation coefficient exceeds this value, the training routine is terminated.
batch_size (int) – Number of samples that pass through the model before its parameters are updated.
verbose (bool) – True to print the losses in each epoch.

Returns:

train_losses (list) – The history of training losses after the training routine.
test_losses (list) – The history of test losses after the training routine.
inter_filter (torch.Tensor) – Filters before the fully-connected layer.
y_test (torch.Tensor) – Ground truth test labels.
output_test (torch.Tensor) – The predicted test labels.
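
The stopping rule implied by `n_max_epochs` and `max_corr` can be sketched as follows. The sketch assumes the correlation coefficient is the Pearson correlation between predicted and true test labels; `predict_after_epoch` is a placeholder standing in for one training step followed by prediction, not a library function.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def run_epochs(predict_after_epoch, targets, n_max_epochs=120, max_corr=0.87):
    """Run up to n_max_epochs, stopping once the correlation exceeds max_corr."""
    corr = 0.0
    epochs_run = 0
    for epoch in range(n_max_epochs):
        predictions = predict_after_epoch(epoch)
        corr = pearson(predictions, targets)
        epochs_run = epoch + 1
        if corr > max_corr:
            break
    return epochs_run, corr


# Toy stand-in for training: the prediction error shrinks with each epoch.
targets = [-9.0, -7.5, -8.2, -6.1]
noise = [1.0, -1.0, 1.0, -1.0]


def toy_predictions(epoch):
    scale = 2.0 / (epoch + 1)
    return [t + scale * e for t, e in zip(targets, noise)]


epochs_run, corr = run_epochs(toy_predictions, targets)
print(epochs_run, round(corr, 3))
```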

antipasti.utils.torch_utils.training_step(model, criterion, optimiser, train_x, test_x, train_y, test_y, train_losses, test_losses, epoch, batch_size, verbose)[source]

Performs a training step.

Parameters:

model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
criterion (torch.nn.modules.loss.MSELoss) – The loss function from which the gradients are computed, i.e., MSELoss.
optimiser (adabelief_pytorch.AdaBelief.AdaBelief) – Method that implements an optimisation algorithm.
train_x (torch.Tensor) – Training normal mode correlation maps.
test_x (torch.Tensor) – Test normal mode correlation maps.
train_y (torch.Tensor) – Training labels.
test_y (torch.Tensor) – Test labels.
train_losses (list) – The current history of training losses.
test_losses (list) – The current history of test losses.
epoch (int) – Current epoch number; it takes the value e if the dataset has gone through the model e times.
batch_size (int) – Number of samples that pass through the model before its parameters are updated.
verbose (bool) – True to print the losses in each epoch.

Returns:

train_losses (list) – The history of training losses after the training step.
test_losses (list) – The history of test losses after the training step.
inter_filter (torch.Tensor) – Filters before the fully-connected layer.
y_test (torch.Tensor) – Ground truth test labels.
output_test (torch.Tensor) – The predicted test labels.

Module contents

This subpackage contains utility functions.