Evaluating AlphaFold predictions

AlphaFold (AF) reports several quality scores for each prediction. A helpful FAQ can be found at https://alphafold.ebi.ac.uk/faq.

Another great resource is this YouTube video by EMBL-EBI:

Also, have a look at this lecture by John Jumper, the research lead of AlphaFold:

Here are just some examples from our HSV-1 predictions:

The first measure is a depiction of the Multiple Sequence Alignment (MSA) that is used as input for the network.

Figure: Sequence coverage of the HSV-1 UL55 MSA

The HSV-1 UL55 MSA above shows reasonable coverage by both closely and more distantly related sequences. The C-terminus is also well covered, including by the more distantly related sequences.

Figure: Sequence coverage of the HSV-1 UL56 MSA

Compare this with the MSA of UL56: it is much more sparsely populated and includes few distantly related sequences.
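The two quantities discussed here, how many sequences cover each position and how similar each sequence is to the query, can be approximated directly from an alignment. The following is a minimal sketch, assuming the MSA is available as A3M or aligned FASTA text with the query as the first sequence. The function names (`read_a3m`, `column_coverage`, `identity_to_query`) and the toy alignment are illustrative only and are not part of AlphaFold.

```python
def read_a3m(text):
    """Parse A3M/aligned FASTA text into a list of aligned sequences.

    In A3M, lowercase letters mark insertions relative to the query.
    They are dropped so that every row has the same length as the query.
    """
    seqs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
                current = []
        else:
            current.append("".join(c for c in line if not c.islower()))
    if current:
        seqs.append("".join(current))
    return seqs


def column_coverage(seqs):
    """Fraction of sequences that have a residue (not a gap) at each position."""
    n = len(seqs)
    return [sum(s[i] != "-" for s in seqs) / n for i in range(len(seqs[0]))]


def identity_to_query(seqs):
    """Fraction of query positions at which each sequence matches the query."""
    query = seqs[0]
    return [sum(a == b for a, b in zip(query, s)) / len(query) for s in seqs]


# Toy alignment (made up for illustration, not real HSV-1 data).
example = """>query
MKTAY
>hit1
MKTAY
>hit2
MK-aAY
>hit3
--TAF
"""

seqs = read_a3m(example)
coverage = column_coverage(seqs)    # one value per query position
identity = identity_to_query(seqs)  # one value per sequence

print(coverage)
print(identity)
```

A well-populated MSA such as the one for UL55 would give coverage values close to 1 along the whole sequence and a broad spread of identities, while a sparse MSA such as the one for UL56 would give low coverage and few low-identity sequences.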

Now, let’s look at the resulting structure predictions:

Figure: UL55 prediction

Figure: UL56 prediction

In both cases, the predictions are colored by pLDDT, a per-residue measure of how confident AlphaFold is in its own prediction.

Here is an excerpt from the EBI FAQ:
AlphaFold produces a per-residue estimate of its confidence on a scale from 0 to 100. This confidence measure is called pLDDT and corresponds to the model’s predicted score on the lDDT-Cα metric. It is stored in the B-factor fields of the mmCIF and PDB files available for download (although unlike a B-factor, higher pLDDT is better). pLDDT is also used to colour-code the residues of the model in the 3D structure viewer. The following rules of thumb provide guidance on the expected reliability of a given region:

Note that the PDB and mmCIF files contain coordinates for all regions, regardless of their pLDDT score. It is up to the user to interpret the model judiciously, in accordance with the guidance above.

The per-position pLDDT is also plotted for the five models produced in every run, which gives a simpler overview:

Figure: Predicted lDDT per position, UL55 pLDDT plot. Note the higher score at the C-terminus for models 1-3.

Figure: Predicted lDDT per position, UL56 pLDDT plot

Note the high overall scores for UL55 and the low ones for UL56. The low overall scores of the UL56 prediction should make us cautious.
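Because pLDDT is stored in the B-factor field, the per-residue scores can be read back from a model file and summarised numerically. The following is a minimal sketch, assuming a PDB-format file with standard fixed-width ATOM records. The helper names and the toy records are hypothetical. The cut-offs of 90, 70 and 50 are the bands commonly used when colouring AlphaFold models and are hard-coded here as an assumption, so check them against the FAQ before relying on them.

```python
def plddt_per_residue(pdb_text):
    """Return {(chain, residue_number): pLDDT} read from the CA atoms.

    Relies on the fixed-width PDB ATOM layout: atom name in columns 13-16,
    chain in column 22, residue number in columns 23-26 and B-factor
    (here: pLDDT) in columns 61-66.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Map a pLDDT value to a coarse confidence label (assumed cut-offs)."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def toy_atom_line(serial, name, resname, chain, resnum, bfactor):
    """Build a fixed-width ATOM record with dummy coordinates (illustration only)."""
    return "ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
        serial, name, resname, chain, resnum, 0.0, 0.0, 0.0, 1.0, bfactor
    )


# Made-up records, not taken from the UL55 or UL56 models.
pdb_text = "\n".join([
    toy_atom_line(1, "N", "MET", "A", 1, 95.2),
    toy_atom_line(2, "CA", "MET", "A", 1, 95.2),
    toy_atom_line(3, "CA", "LYS", "A", 2, 82.0),
    toy_atom_line(4, "CA", "THR", "A", 3, 61.5),
    toy_atom_line(5, "CA", "ALA", "A", 4, 34.1),
])

scores = plddt_per_residue(pdb_text)
bands = [confidence_band(v) for v in scores.values()]
mean_plddt = sum(scores.values()) / len(scores)

print(bands)
print(round(mean_plddt, 1))
```

For real models, replace `pdb_text` with the contents of the downloaded PDB file. Comparing the mean pLDDT and the share of residues per band across the five models gives a quick numerical counterpart to the plots above.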


Finally, the Predicted Aligned Error (PAE) indicates how confident AlphaFold is in the relative position of domains. Again, an excerpt from the EBI FAQ:

Independent of the 3D structure, AlphaFold produces an output called “Predicted Aligned Error”. This is shown at the bottom of structure pages as an interactive 2D plot.

Let's look at the PAE plots for both UL55 and UL56:

Figure: UL55 PAE scores for 5 models. Blue is better.

Figure: UL56 PAE scores for 5 models

You can immediately see that the position of most amino acids relative to each other is uncertain in all predictions. The relative position of the predicted alpha-helices should therefore be taken with more than a grain of salt.
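The visual impression from a PAE plot can be backed up by averaging the matrix over blocks of residues. The following is a minimal sketch, assuming the PAE is already loaded as a square nested list of values in Ångström. How the matrix is stored on disk depends on the pipeline and is not covered here. The function name `mean_pae`, the toy matrix and the two residue ranges are made up for illustration.

```python
def mean_pae(pae, rows, cols):
    """Average PAE over the block defined by two sets of residue indices."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


# Toy 4-residue matrix: two "domains" of two residues each.
# Low values within a domain, high values between domains.
pae = [
    [0.0, 2.0, 20.0, 22.0],
    [2.0, 0.0, 21.0, 20.0],
    [19.0, 20.0, 0.0, 3.0],
    [21.0, 20.0, 3.0, 0.0],
]

domain_a = range(0, 2)  # hypothetical domain boundaries (0-based indices)
domain_b = range(2, 4)

within_a = mean_pae(pae, domain_a, domain_a)
a_to_b = mean_pae(pae, domain_a, domain_b)
b_to_a = mean_pae(pae, domain_b, domain_a)

print(within_a, a_to_b, b_to_a)
```

The PAE matrix is not symmetric, so both off-diagonal blocks are worth checking. Low values within a block together with high values between blocks suggest that each domain may be predicted confidently on its own while their relative placement is not.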