antipasti.utils package
Submodules
antipasti.utils.biology_utils module
- antipasti.utils.biology_utils.antibody_sequence_identity(seq1, seq2)
Computes the percentage sequence identity between seq1 and seq2.
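The sketch below illustrates percentage sequence identity. It assumes the two sequences are already aligned to the same length, for example through a common numbering scheme. It also assumes that positions where either sequence has a gap character ('-') are left out of the count. This is not necessarily how antibody_sequence_identity computes the value, and the helper name percent_identity is hypothetical.

```python
def percent_identity(seq1: str, seq2: str, gap: str = "-") -> float:
    """Percentage of aligned, gap-free positions with identical residues (hypothetical helper)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only positions where neither sequence carries a gap.
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


print(percent_identity("EVQLVE-SG", "EVQLLESSG"))  # prints 87.5 (7 matches over 8 compared positions)
```

Other conventions exist, such as dividing by the full alignment length instead of by the number of gap-free positions. These give different values for gapped sequences.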
- antipasti.utils.biology_utils.antigen_identity(seq1, seq2)
Tests whether two antibodies are bound to the same antigen.
- antipasti.utils.biology_utils.check_train_test_identity(training_set_ids, test_set_ids, max_res_list_h=None, max_res_list_l=None, threshold=0.9, residues_path='../data/lists_of_residues/', verbose=False)
Tests the sequence identity of the training and test sets.
- Parameters:
training_set_ids (list) – Contains the PDB identifiers of the training set elements.
test_set_ids (list) – Contains the PDB identifiers of the test set elements.
max_res_list_h (list) – Heavy chain residues of all data.
max_res_list_l (list) – Light chain residues of all data.
threshold (float) – Highest accepted sequence identity value.
residues_path (str) – Path to the folder containing the list of residues per entry.
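To show how a threshold on sequence identity can be used to flag train/test overlap, here is a minimal sketch. Several things in it are assumptions:

- Sequences are given as a hypothetical dictionary mapping a PDB identifier to an aligned sequence, rather than being read from residues_path.
- threshold is a fraction in [0, 1], as the default of 0.9 suggests.
- A test entry is flagged when its highest identity to any training entry exceeds threshold.

The function name flag_similar_test_entries is hypothetical and does not describe what check_train_test_identity returns.

```python
def percent_identity(seq1, seq2, gap="-"):
    # Percentage of aligned, gap-free positions with identical residues.
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0


def flag_similar_test_entries(train_seqs, test_seqs, threshold=0.9):
    """Return the test IDs whose identity to some training sequence exceeds threshold."""
    flagged = []
    for test_id, test_seq in test_seqs.items():
        # Highest identity (as a fraction) between this test entry and any training entry.
        best = max(
            (percent_identity(test_seq, s) / 100.0 for s in train_seqs.values()),
            default=0.0,
        )
        if best > threshold:
            flagged.append(test_id)
    return flagged


train = {"1abc": "EVQLVESGG", "2xyz": "QVQLQQSGA"}
test = {"3aaa": "EVQLVESGG", "4bbb": "DIQMTQSPS"}
print(flag_similar_test_entries(train, test))  # ['3aaa']
```

The comparison is all-against-all, so the cost grows with the product of the training and test set sizes.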
- antipasti.utils.biology_utils.extract_mean_region_lengths(pdb_codes, data_path='../data/')
Retrieves the mean framework region (FR) and complementarity-determining region (CDR) lengths of the antibodies in pdb_codes.
- antipasti.utils.biology_utils.get_sequence(list_of_residues, max_res_list_h=None, max_res_list_l=None)
Returns an amino acid sequence from an ANTIPASTI list of residues. The returned antibody sequence contains gaps.
- antipasti.utils.biology_utils.remove_nanobodies(pdb_codes, representations, embedding=None, labels=[], numerical_values=None)
Returns the PDB codes and embeddings with nanobodies removed.
- Parameters:
pdb_codes (list) – The PDB codes of the antibodies.
representations (numpy.ndarray) – Normal mode correlation maps (or transformed maps) from which it can be inferred whether a given antibody is a nanobody.
embedding (numpy.ndarray) – Low-dimensional version of representations.
labels (list) – Data point labels.
numerical_values (list) – If the data are numerical (e.g., affinity values), they must be passed here so that the values associated with nanobodies are also removed.
antipasti.utils.explaining_utils module
- antipasti.utils.explaining_utils.add_region_based_on_range(list_residues)
Given a list of residues in Chothia numbering, appends the corresponding region, in brackets, to each residue.
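Below is a minimal sketch of assigning regions from residue numbers. It makes three assumptions:

- The commonly cited Chothia CDR boundaries apply: H1 26-32, H2 52-56, H3 95-102, L1 24-34, L2 50-56, L3 89-97.
- Residue labels follow the hypothetical format chain letter, number, optional insertion code (e.g., 'H100A').
- Every residue outside a CDR belongs to the flanking framework region.

The ranges, label format and the names region_of and add_region_labels are illustrative and may differ from what ANTIPASTI uses.

```python
import re

# Assumed Chothia CDR boundaries (inclusive) per chain.
CDR_RANGES = {
    "H": [(26, 32), (52, 56), (95, 102)],
    "L": [(24, 34), (50, 56), (89, 97)],
}


def region_of(residue: str) -> str:
    """Map a hypothetical label such as 'H100A' to a region such as 'CDR-H3'."""
    match = re.fullmatch(r"([HL])(\d+)([A-Z]?)", residue)
    if match is None:
        raise ValueError(f"unrecognised residue label: {residue}")
    chain, number = match.group(1), int(match.group(2))
    for i, (start, end) in enumerate(CDR_RANGES[chain], start=1):
        if number < start:
            return f"FR-{chain}{i}"  # framework region preceding CDR i
        if number <= end:
            return f"CDR-{chain}{i}"
    return f"FR-{chain}4"  # framework region after the third CDR


def add_region_labels(residues):
    """Append the region, in brackets, to each residue label."""
    return [f"{r} ({region_of(r)})" for r in residues]


print(add_region_labels(["H26", "H100A", "L40", "L110"]))
# ['H26 (CDR-H1)', 'H100A (CDR-H3)', 'L40 (FR-L2)', 'L110 (FR-L4)']
```

Insertion codes do not change the region here, because in Chothia numbering they sit at fixed positions inside the ranges (e.g., H100A lies within CDR-H3).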
- antipasti.utils.explaining_utils.compute_region_importance(preprocessed_data, model, type_of_antigen, nanobodies, mode='region', interactive=False)
Computes the importance factors (0-100) of all the Fv antibody regions. Returns the importance for each region.
- Parameters:
preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
type_of_antigen (int) – Choose between: proteins (0), haptens (1), peptides (2) or carbohydrates (3).
nanobodies (list) – PDB codes of nanobodies in the dataset.
mode (str) – region to explicitly calculate which correlations are inter-region or intra-region (likewise, chain for inter-chain or intra-chain).
interactive (bool) – Set to True when running a script or pytest.
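The sketch below makes the intra-region versus off-block distinction concrete. It aggregates a residue-by-residue contribution map into per-region scores scaled to 0-100. It also reports the part of each score that comes from pairs whose partner residue lies in a different region.

The aggregation rule is an assumption for illustration only, not the documented ANTIPASTI computation. It sums absolute values over each region's rows and rescales so that the top region scores 100. The helper region_scores is hypothetical.

```python
import numpy as np


def region_scores(contribution_map, region_labels):
    """Return (regions, importance, off_block_importance), with scores scaled to 0-100.

    contribution_map: (N, N) array of residue-pair contributions.
    region_labels: length-N sequence giving the region of each residue.
    """
    contribution_map = np.abs(np.asarray(contribution_map, dtype=float))
    labels = np.asarray(region_labels)
    regions = list(dict.fromkeys(region_labels))  # unique regions, first-seen order
    total, off_block = [], []
    for region in regions:
        rows = labels == region
        # Every pair involving a residue of this region (its rows of the map).
        total.append(contribution_map[rows].sum())
        # Only the pairs whose partner residue belongs to another region.
        off_block.append(contribution_map[np.ix_(rows, ~rows)].sum())
    scale = 100.0 / max(total) if max(total) > 0 else 0.0
    return regions, np.array(total) * scale, np.array(off_block) * scale


m = np.array([[1, 2, 3], [2, 1, 0], [3, 0, 4]])
regions, imp, imp_ob = region_scores(m, ["A", "A", "B"])
print(regions, np.round(imp, 1), np.round(imp_ob, 1))
```

Passing chain labels (e.g., 'H' and 'L') instead of region labels gives the analogous inter-chain split.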
- antipasti.utils.explaining_utils.compute_residue_importance(preprocessed_data, model, type_of_antigen, nanobodies, interactive=False)
Computes the importance factors (0-100) of all the amino acids of the antibody variable region.
- Parameters:
preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
type_of_antigen (int) – Choose between: proteins (0), haptens (1), peptides (2) or carbohydrates (3).
nanobodies (list) – PDB codes of nanobodies in the dataset.
interactive (bool) – Set to True when running a script or pytest.
- antipasti.utils.explaining_utils.compute_umap(preprocessed_data, model, scheme='heavy_species', categorical=True, include_ellipses=False, numerical_values=None, external_cdict=None, interactive=False, exclude_nanobodies=False)
Performs UMAP dimensionality reduction calculations.
- Parameters:
preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
scheme (str) – Category of the labels or values appearing in the UMAP representation.
categorical (bool) – True if scheme is categorical.
include_ellipses (bool) – True to include ellipses comprising three quarters of the points of a given class.
numerical_values (list) – Values or entries that must be provided if data external to SAbDab is used.
external_cdict (dictionary) – Optional external dictionary of the UMAP labels.
interactive (bool) – Set to True when running a script or pytest.
exclude_nanobodies (bool) – Set to True to exclude nanobodies from the UMAP plot.
- antipasti.utils.explaining_utils.get_colours_ag_type(preprocessed_data)
Returns a different colour for each antigen type.
- Parameters:
preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
- antipasti.utils.explaining_utils.get_maps_of_interest(preprocessed_data, learnt_filter, affinity_thr=-8)
Post-processes both raw data and results to obtain maps of interest.
- Parameters:
preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
learnt_filter (numpy.ndarray) – Filters that express the features learnt during training.
affinity_thr (float) – Affinity value separating antibodies considered to have high affinity from those considered to have low affinity.
- Returns:
mean_learnt (numpy.ndarray) – A resized version of learnt_filter that matches the shape of the input normal mode correlation maps.
mean_image (numpy.ndarray) – The mean of all the input normal mode correlation maps.
mean_diff_image (numpy.ndarray) – Difference between the mean of the high-affinity correlation maps and the mean of the low-affinity correlation maps.
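The three returned maps can be illustrated with plain NumPy. The sketch rests on three assumptions:

- Affinities are values such as log10 of K_D, so lower values mean higher affinity and entries below affinity_thr count as high affinity.
- The difference is taken as high minus low.
- Resizing is nearest-neighbour upsampling by an integer factor.

The helpers mean_maps and upsample are hypothetical and not part of ANTIPASTI.

```python
import numpy as np


def mean_maps(maps, affinities, affinity_thr=-8.0):
    """Return the mean map and the high-minus-low affinity difference map.

    maps: (n, H, W) array of correlation maps; affinities: length-n values.
    Assumes lower affinity values mean tighter binding (e.g., log10 K_D).
    """
    maps = np.asarray(maps, dtype=float)
    high = np.asarray(affinities, dtype=float) < affinity_thr
    mean_image = maps.mean(axis=0)
    mean_diff_image = maps[high].mean(axis=0) - maps[~high].mean(axis=0)
    return mean_image, mean_diff_image


def upsample(learnt_filter, factor):
    """Nearest-neighbour upsampling by an integer factor (assumed resizing rule)."""
    return np.kron(learnt_filter, np.ones((factor, factor)))


maps = np.stack([np.full((2, 2), k) for k in (1.0, 2.0, 3.0, 4.0)])
mean_image, mean_diff_image = mean_maps(maps, [-9.0, -10.0, -7.0, -6.0])
print(mean_image, mean_diff_image, upsample(np.array([[1, 2], [3, 4]]), 2), sep="\n")
```

A non-integer resize would need interpolation, for example scipy.ndimage.zoom, instead of np.kron.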
- antipasti.utils.explaining_utils.get_output_representations(preprocessed_data, model)
Returns maps that reveal the important residue interactions for the binding affinity. We call them ‘output layer representations’.
- Parameters:
preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
- antipasti.utils.explaining_utils.get_test_contribution(preprocessed_data, model)
Returns a map that reveals the important residue interactions for the binding affinity.
- Parameters:
preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
- antipasti.utils.explaining_utils.plot_map_with_regions(preprocessed_data, map, title='Normal mode correlation map', interactive=False)
Maps the residues to the antibody regions and plots the normal mode correlation map.
- Parameters:
preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
map (numpy.ndarray) – A normal mode correlation map.
title (str) – The image title.
interactive (bool) – Set to True when running a script or pytest.
- antipasti.utils.explaining_utils.plot_region_importance(importance_factor, importance_factor_ob, antigen_type, mode='region', interactive=False)
Plots the ranking of the most important regions.
- Parameters:
importance_factor (list) – Measure of importance (0-100) for each antibody region.
importance_factor_ob (list) – Measure of importance (0-100) for each antibody region attributable to off-block correlations. These can be inter-region or inter-chain depending on the selected mode.
antigen_type (int) – Plot corresponding to antigens of a given type. These can be proteins (0), haptens (1), peptides (2) or carbohydrates (3).
mode (str) – region to explicitly show which correlations are inter-region or intra-region (likewise, chain for inter-chain or intra-chain).
interactive (bool) – Set to True when running a script or pytest.
- antipasti.utils.explaining_utils.plot_residue_importance(preprocessed_data, importance_factor, antigen_type, interactive=False)
Plots the ranking of the most important residues.
- Parameters:
preprocessed_data (antipasti.preprocessing.preprocessing.Preprocessing) – The Preprocessing class.
importance_factor (list) – Measure of importance (0-100) for each antibody residue.
antigen_type (int) – Plot corresponding to antigens of a given type. These can be proteins (0), haptens (1), peptides (2) or carbohydrates (3).
interactive (bool) – Set to True when running a script or pytest.
- antipasti.utils.explaining_utils.plot_umap(embedding, colours, scheme, pdb_codes, categorical=True, include_ellipses=False, cdict=None, interactive=False)
Plots the UMAP embedding.
- Parameters:
embedding (numpy.ndarray) – The output layer representations after dimensionality reduction.
colours (list) – The data points labels or values.
scheme (str) – Category of the labels or values appearing in the UMAP representation.
pdb_codes (list) – The PDB codes of the antibodies.
categorical (bool) – True if scheme is categorical.
include_ellipses (bool) – True to include ellipses comprising 85% of the points of a given class.
cdict (dictionary) – External dictionary of the UMAP labels.
interactive (bool) – Set to True when running a script or pytest.
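One common way to build an ellipse that encloses a given fraction of a class's 2D points is to assume the points are roughly Gaussian. Under that assumption, the squared Mahalanobis distance follows a chi-square distribution with 2 degrees of freedom, whose p-quantile is -2 ln(1 - p). The ellipse axes then come from the eigendecomposition of the class covariance.

This is a minimal sketch of that technique. It is not necessarily how plot_umap draws its ellipses, and coverage_ellipse is a hypothetical helper.

```python
import numpy as np


def coverage_ellipse(points, coverage=0.85):
    """Centre, semi-axes (major first) and angle in degrees of an ellipse expected
    to enclose the given fraction of 2D points, assuming they are roughly Gaussian."""
    points = np.asarray(points, dtype=float)
    centre = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    # Chi-square (2 d.o.f.) quantile of the squared Mahalanobis distance.
    radius = np.sqrt(-2.0 * np.log(1.0 - coverage))
    semi_axes = radius * np.sqrt(eigvals[::-1])
    major = eigvecs[:, -1]  # direction of the largest variance
    angle = np.degrees(np.arctan2(major[1], major[0]))
    return centre, semi_axes, angle


rng = np.random.default_rng(0)
pts = rng.normal(size=(20000, 2)) * np.array([3.0, 1.0])
centre, semi_axes, angle = coverage_ellipse(pts)
print(centre, semi_axes, angle)
```

The result can be drawn with matplotlib.patches.Ellipse(centre, width=2 * semi_axes[0], height=2 * semi_axes[1], angle=angle). For clearly non-Gaussian clusters, the actual coverage can differ from the nominal one.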
antipasti.utils.generic_utils module
antipasti.utils.torch_utils module
- antipasti.utils.torch_utils.create_test_set(preprocessed_data, test_size=None, random_state=0, residues_path='../data/lists_of_residues/')
Creates the training and test sets from a set of input images and their corresponding labels.
- Parameters:
- Returns:
train_x (torch.Tensor) – Training inputs.
test_x (torch.Tensor) – Test inputs.
train_y (torch.Tensor) – Training labels.
test_y (torch.Tensor) – Test labels.
- antipasti.utils.torch_utils.load_checkpoint(path, input_shape, n_filters=None, pooling_size=None, filter_size=None)
Loads a checkpoint from the checkpoints folder.
- Parameters:
- Returns:
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
optimiser (adabelief_pytorch.AdaBelief.AdaBelief) – Method that implements an optimisation algorithm.
n_epochs (int) – Number of times the whole dataset went through the model.
train_losses (list) – The history of training losses after the training routine.
test_losses (list) – The history of test losses after the training routine.
- antipasti.utils.torch_utils.save_checkpoint(path, model, optimiser, train_losses, test_losses)
Saves a checkpoint in the checkpoints folder.
- Parameters:
path (str) – Checkpoint path.
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
optimiser (adabelief_pytorch.AdaBelief.AdaBelief) – Method that implements an optimisation algorithm.
train_losses (list) – The history of training losses after the training routine.
test_losses (list) – The history of test losses after the training routine.
- antipasti.utils.torch_utils.training_routine(model, criterion, optimiser, train_x, test_x, train_y, test_y, n_max_epochs=120, max_corr=0.87, batch_size=32, verbose=True)
Performs up to n_max_epochs training steps, terminating early if the correlation coefficient exceeds max_corr.
- Parameters:
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
criterion (torch.nn.modules.loss.MSELoss) – The loss function, i.e., MSELoss, from which the gradients are computed.
optimiser (adabelief_pytorch.AdaBelief.AdaBelief) – Method that implements an optimisation algorithm.
train_x (torch.Tensor) – Training normal mode correlation maps.
test_x (torch.Tensor) – Test normal mode correlation maps.
train_y (torch.Tensor) – Training labels.
test_y (torch.Tensor) – Test labels.
n_max_epochs (int) – Maximum number of times the whole dataset goes through the model.
max_corr (float) – If the correlation coefficient exceeds this value, the training routine is terminated.
batch_size (int) – Number of samples that pass through the model before its parameters are updated.
verbose (bool) – True to print the losses in each epoch.
- Returns:
train_losses (list) – The history of training losses after the training routine.
test_losses (list) – The history of test losses after the training routine.
inter_filter (torch.Tensor) – Filters before the fully-connected layer.
y_test (torch.Tensor) – Ground truth test labels.
output_test (torch.Tensor) – The predicted test labels.
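The stopping rule described above (at most n_max_epochs epochs, with an early exit once a correlation coefficient exceeds max_corr) can be sketched independently of PyTorch. This sketch makes two assumptions:

- The monitored quantity is the Pearson correlation between test predictions and test labels. The documentation does not say which correlation is used.
- step and predict are hypothetical callables standing in for one training epoch and for inference on the test set.

train_until_correlated is a hypothetical name.

```python
import numpy as np


def train_until_correlated(step, predict, test_y, n_max_epochs=120, max_corr=0.87):
    """Call step(epoch) up to n_max_epochs times, stopping early once the Pearson
    correlation between predict() and test_y exceeds max_corr.

    Returns the number of epochs run and the last correlation computed."""
    corr = float("nan")
    epochs_run = 0
    for epoch in range(n_max_epochs):
        step(epoch)
        epochs_run += 1
        corr = float(np.corrcoef(predict(), test_y)[0, 1])
        if corr > max_corr:
            break  # good enough on the test set: stop training
    return epochs_run, corr


# Toy stand-in for a model whose test predictions improve every epoch.
rng = np.random.default_rng(0)
test_y = rng.normal(size=200)
noise = rng.normal(size=200)
state = {"alpha": 0.0}


def step(epoch):
    state["alpha"] = min(1.0, state["alpha"] + 0.1)


def predict():
    return state["alpha"] * test_y + (1 - state["alpha"]) * noise


print(train_until_correlated(step, predict, test_y))
```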
- antipasti.utils.torch_utils.training_step(model, criterion, optimiser, train_x, test_x, train_y, test_y, train_losses, test_losses, epoch, batch_size, verbose)
Performs a training step.
- Parameters:
model (antipasti.model.model.ANTIPASTI) – The model class, i.e., ANTIPASTI.
criterion (torch.nn.modules.loss.MSELoss) – The loss function, i.e., MSELoss, from which the gradients are computed.
optimiser (adabelief_pytorch.AdaBelief.AdaBelief) – Method that implements an optimisation algorithm.
train_x (torch.Tensor) – Training normal mode correlation maps.
test_x (torch.Tensor) – Test normal mode correlation maps.
train_y (torch.Tensor) – Training labels.
test_y (torch.Tensor) – Test labels.
train_losses (list) – The current history of training losses.
test_losses (list) – The current history of test losses.
epoch (int) – Equal to e if the dataset has gone through the model e times.
batch_size (int) – Number of samples that pass through the model before its parameters are updated.
verbose (bool) – True to print the losses in each epoch.
- Returns:
train_losses (list) – The history of training losses after the training step.
test_losses (list) – The history of test losses after the training step.
inter_filter (torch.Tensor) – Filters before the fully-connected layer.
y_test (torch.Tensor) – Ground truth test labels.
output_test (torch.Tensor) – The predicted test labels.
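The role of batch_size and of the loss histories in a single training step can be sketched with NumPy. In this sketch:

- A hypothetical linear model with plain gradient descent stands in for ANTIPASTI and AdaBelief.
- The data are shuffled and split into mini-batches of batch_size samples, and the parameters are updated once per batch.
- The epoch's training and test mean squared errors are appended to the histories.

minibatch_indices and training_epoch are hypothetical names, and this is not the PyTorch code of training_step.

```python
import numpy as np


def minibatch_indices(n_samples, batch_size, rng):
    """Shuffle sample indices and split them into batches (the last may be smaller)."""
    order = rng.permutation(n_samples)
    return [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]


def training_epoch(w, train_x, train_y, test_x, test_y, train_losses, test_losses,
                   batch_size=32, lr=0.1, rng=None):
    """One epoch of mini-batch gradient descent on a linear model y ~ x @ w,
    appending the epoch's training and test MSE to the loss histories."""
    rng = np.random.default_rng(0) if rng is None else rng
    for idx in minibatch_indices(len(train_x), batch_size, rng):
        xb, yb = train_x[idx], train_y[idx]
        grad = 2.0 * xb.T @ (xb @ w - yb) / len(idx)  # gradient of the batch MSE
        w = w - lr * grad  # one parameter update per mini-batch
    train_losses.append(float(np.mean((train_x @ w - train_y) ** 2)))
    test_losses.append(float(np.mean((test_x @ w - test_y) ** 2)))
    return w, train_losses, test_losses


rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))
y = x @ np.array([1.0, -2.0, 0.5])
w, tr, te = np.zeros(3), [], []
for _ in range(20):
    w, tr, te = training_epoch(w, x[:150], y[:150], x[150:], y[150:], tr, te, rng=rng)
print(np.round(w, 3), te[-1])
```

With 150 training samples and batch_size=32, each epoch performs five parameter updates, the last one on a batch of 22 samples.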
Module contents
This subpackage contains utility functions.