Delving into DeepMind’s protein folding model

AlphaFold2 was triumphant in a biennial contest, the Critical Assessment of Protein Structure Prediction (CASP). It beat more than 100 other computational protein prediction teams in two categories – regular targets and interdomain prediction analysis. DeepMind’s software was tested on predicting 3D protein structures that were already known to biologists, to some degree, but had not yet been made public. The results of the contest were unveiled in November 2020.

DeepMind has published peer-reviewed papers in two academic journals, Nature and Proteins: Structure, Function and Bioinformatics, arguing its algorithms can help figure out how amino acids fold into 3D protein structures.

Its model started with massive genomic and structural data sets to establish distances between individual pairs of amino acids, according to Nature. Further development then added information on the physical and geometric boundaries that influence protein folding, and the model’s goal was expanded to establishing the final structures of target protein sequences.
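
To make the idea of distances between pairs of amino acids concrete, here is a minimal Python sketch. It is not DeepMind’s code: it assumes each amino acid can be represented by a single 3D coordinate, and the four coordinates are invented purely for illustration. It builds the symmetric table of pairwise distances that, per the description above, a model of this kind would try to establish from sequence data.

```python
import math

def pairwise_distances(coords):
    """Return an n x n matrix of Euclidean distances between 3D points."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

# Hypothetical coordinates (in angstroms) for four residues; not real data.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.0, 4.0, 0.0),
    (3.0, 4.0, 12.0),
    (0.0, 0.0, 12.0),
]

distance_matrix = pairwise_distances(toy_coords)

# Print the matrix one row per residue.
for row in distance_matrix:
    print(" ".join(f"{d:6.1f}" for d in row))
```

In the approach described above the direction is reversed: the distances are estimated first, then combined with physical and geometric constraints to arrive at a final structure.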

DeepMind is a Google subsidiary. Google paid more than $500m to acquire the research house in 2014, according to The Information, after social media network Facebook failed to do so the previous year.

And the Mountain View internet giant is now being rewarded with advances in neuroscience-driven deep learning, although it has had to dig deep, having most recently put up £1.1bn ($1.5bn) to lessen DeepMind’s debt load in December 2020, according to the Financial Times. DeepMind’s revenues, however, mainly come from making Google data centres and virtual assistant technologies more efficient.

Given, however, the strategic value of reduced drug development costs – pegged by the Tufts Center for the Study of Drug Development at above $2.5bn per approved medication – Google is unlikely to have too many complaints.

Public data on some 170,000 known protein structures have been fed into AlphaFold’s underlying neural network, in addition to large datasets containing protein sequences with unknown structures. The training was carried out using 128 of Google’s tensor processing units, which DeepMind says is equivalent to between 100 and 200 graphics processing units.

The platform was also victorious in the previous CASP, held in 2018, but in the 2020 contest it yielded some predictions thought to be on a par with existing technologies for 3D protein modelling, mainly expensive and laboratory-intensive methods such as X-ray crystallography and cryo-electron microscopy.

This time, AlphaFold achieved a median score of 92.4 across the protein structure problems in the CASP test – 90 is the benchmark considered on a par with existing methods – and was 25 points ahead of its nearest challengers. Almost two-thirds of its predictions passed the test at atomic-level resolution, according to Singularity Hub.

The score dipped slightly for the most difficult CASP category – free-modelling – in which some of the protein folds are less recognisable, lacking a clear template based on current methods. AlphaFold was also reportedly less capable of discerning protein complexes containing multiple 3D structures.

John Moult, a professor specialising in computational biology at the University of Maryland who co-founded the CASP initiative in 1994, said the 3D protein prediction problem had now been resolved. Janet Thornton, a scientist at the European Bioinformatics Institute, told the MIT Technology Review that AlphaFold was likely to “open up a new area of research”.

As is often the case, there are caveats to consider. AlphaFold still failed to reach the benchmark in roughly a third of predictions, a margin of error that is unacceptable in biomedical research terms. A critical blog post on Reciprocal Space said a “dose of realism” must be applied to the current technology.

But its forecasts may well combine with existing protein folding techniques to advance research.

Proteins might be thought of as the most stubborn cipher in cracking human diseases. Proteins in the human body are made up of combinations of 20 amino acids, and their structures sprawl into strands and helices, the sum of which determines what role they will play inside cells.

Some protein folds help healthy biological cells while others sustain viruses or cancers, and scientists have only vague biophysical principles to go on.

Using AI to visualise their make-up in 3D could give researchers unprecedented access to previously intractable disease targets, for example by helping to design new “small molecule” therapeutics, typically engineered with the purpose of docking to a specific biological protein.

That AI might help provide the answer has been clear for some time. Given the enormous number of potential amino acid pathways, it is exactly the kind of puzzle that deep learning could be rather good at.

And AlphaFold is by no means the only AI-driven protein prediction team making progress.

Singularity Hub has reported that other algorithms were faster, producing results in a few seconds rather than days, although there was a trade-off with accuracy. Plenty of room for a challenge therefore remains in future CASP competitions.