Understanding a protein's structure helps explain its function; however, existing computational methods fall short of predicting 3D protein structures with atomic accuracy. Thankfully, a recent paper in Nature introduces a redesigned version of the neural network-based model AlphaFold. It predicts protein structures with accuracy competitive with experiment in most cases.
The model combines two strategies that were previously used separately: simulating physical interactions and bioinformatic analysis of the evolutionary history of proteins. This approach lets the network learn from Protein Data Bank data with minimal imposition of handcrafted features. It can handle missing physical context and produce accurate models even in challenging cases. AlphaFold has already proved its value in molecular replacement experiments and in interpreting cryogenic electron microscopy maps.
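To make the evolutionary side of this more concrete, here is a minimal Python sketch of the kind of signal a multiple sequence alignment carries. It is not AlphaFold's method, only a toy illustration: the sequences in `TOY_MSA` are invented, and the helper names `column_entropy` and `conservation_profile` are hypothetical. The sketch computes the Shannon entropy of each alignment column, a common way to see which positions are conserved across related sequences.

```python
import math
from collections import Counter

# Toy alignment: invented sequences for illustration, not real proteins.
TOY_MSA = [
    "MKVLA",
    "MKILA",
    "MRVLA",
    "MKVLG",
]


def column_entropy(msa, col):
    """Shannon entropy (in bits) of the residues found in one alignment column."""
    counts = Counter(seq[col] for seq in msa)
    total = sum(counts.values())
    probs = [n / total for n in counts.values()]
    return sum(-p * math.log2(p) for p in probs)


def conservation_profile(msa):
    """Entropy of every column; 0.0 means the column is fully conserved."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(msa, col) for col in range(length)]


profile = conservation_profile(TOY_MSA)
for position, entropy in enumerate(profile):
    label = "conserved" if entropy == 0.0 else "variable"
    print(f"column {position}: {entropy:.3f} bits ({label})")
```

In this toy alignment, the first and fourth columns contain a single residue type and score zero entropy, while the other columns vary. A real pipeline would work with alignments of many sequences and would feed much richer features to the network than a per-column score.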
Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort, the structures of around 100,000 unique proteins have been determined, but this represents a small fraction of the billions of known protein sequences. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the 3-D structure that a protein will adopt based solely on its amino acid sequence, the structure prediction component of the ‘protein folding problem’, has been an important open research problem for more than 50 years. Despite recent progress, existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even where no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14), demonstrating accuracy competitive with experiment in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.
Research paper: Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021). Link to the article: https://www.nature.com/articles/s41586-021-03819-2