AlphaFold: What the heck is Protein Folding and Why should we care?

Literally, that’s the question I asked myself when I heard about AlphaFold. 😒

Last year, Google DeepMind unveiled an AI model named AlphaFold that can predict the structure of a protein given the protein sequence. And this year (July 22, 2021) they released a database of the predicted structures of nearly all human proteins. It generated a LOT of hype, with journalists calling it “a scientific breakthrough” and saying “it will change everything”. I had no idea what protein folding was before I started digging into it last night, and when I did, I was quite blown away. 😳👍👏

Fig. Protein structure vs AlphaFold prediction (src. DeepMind)

Basically, protein folding is a very important problem in biology. Every living thing (and every virus) relies on proteins (yes, the same thing your gym instructor told you to eat more of!). Proteins are long chains of amino acids (R-CH(NH2)-COOH) that are linked together by peptide bonds (-CO-NH-). The peptide bond is very stable, which makes it good at holding together the long molecules that work as the building blocks of life.

Fig. Peptide bond (src. wiki)

Now, these long protein chains don’t stay stretched out. They fold up into a wiggly waggly 3D shape that is hard to predict. So, the protein folding problem is trying to predict a protein’s 3D shape from its amino acid chain.

Fig. Random protein (src. wiki)

What a protein chain is made of is relatively easy to find out. But how a protein behaves largely depends on its 3D shape, so to understand a protein, knowing its shape is important. If we know the shape of a protein, we can tell whether it is likely to cause a disease or whether it can cure one; if it is the protein of a virus, a fungus, or a harmful bacterium, we can look for a way to kill it. (Pssst. Don’t tell the coronavirus that we made a vaccine that targets the 3D protein structure of its pretty spikes 🦠.) If we know a protein’s structure, we can test how it interacts with a particular medicine in a computer simulation, before ever having to experiment on a human or an animal! This could drastically improve drug discovery, and also our understanding of many, many diseases!

So, you understand, understanding protein shape is important. But it is hard. Really hard. How hard, you say? Scientists have to figure out the 3D shape of proteins experimentally, with methods such as X-ray crystallography. This can take up to a year and cost up to $120,000 for a single protein!

So, back in 1994, a bunch of smart biologists started CASP (Critical Assessment of protein Structure Prediction), a competition held every two years for predicting the 3D structure of proteins from their sequences, a problem that has been open for about 50 years. NO competitor had done it well enough until 2020, until AlphaFold. AlphaFold’s predictions are so reliable that this problem is now considered “practically solved”.

Fig. Prediction performance (src. DeepMind)

Remember what I said about the time and cost of finding a protein structure?

  • Can you guess how long it takes now to find a protein structure?

LESS THAN AN HOUR.

  • How much does it cost now?

ABSOLUTELY NOTHING.

  • And the best part, you can run it *drum roll please*

ON. YOUR. LAPTOP.

Don’t believe it? I didn’t either. Then I generated this protein structure last night by connecting my 8GB RAM laptop to a free Google Colab GPU.

Input amino acid sequence: MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH
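Before handing a sequence like this to a structure predictor, it can help to sanity-check it first. Here is a minimal sketch in plain Python (nothing AlphaFold-specific; the helper name `summarize_sequence` is my own invention, not part of any library) that checks every letter is one of the 20 standard one-letter amino acid codes, counts how often each residue appears, and works out how many peptide bonds hold the chain together (one fewer than the number of amino acids).

```python
from collections import Counter

# One-letter codes for the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# The input sequence from this post.
SEQUENCE = (
    "MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH"
)


def summarize_sequence(sequence):
    """Hypothetical helper: validate a protein sequence and count its residues."""
    sequence = sequence.strip().upper()
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"Unexpected residue codes: {invalid}")
    return {
        "length": len(sequence),
        # A linear chain of n amino acids is joined by n - 1 peptide bonds.
        "peptide_bonds": max(len(sequence) - 1, 0),
        "counts": Counter(sequence),
    }


if __name__ == "__main__":
    summary = summarize_sequence(SEQUENCE)
    print("Residues:", summary["length"])
    print("Peptide bonds:", summary["peptide_bonds"])
    for residue, count in summary["counts"].most_common(5):
        print(f"  {residue}: {count}")
```

If a typo sneaks into the sequence (say, a `B` or an `X`), the helper raises a `ValueError` instead of letting a bad input waste a GPU session.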

Output protein structure by AlphaFold:

Fig. Custom protein prediction on a laptop (src. me)

Here is the code.

More about protein primary sequences can be found here.

Well. They could’ve stopped there. They didn’t.

Using this AI, they predicted the shapes of almost all (~20,000) proteins found in the human body, along with about 350,000 other protein structures from 20 other biologically significant organisms. With the traditional method, this would have cost 43 BILLION dollars, and who knows how many years!
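Where might a number that size come from? Here is a back-of-the-envelope sketch, assuming every single structure costs the $120,000 upper bound mentioned earlier (a big assumption, and `experimental_cost` is just a name I made up). Depending on whether the ~20,000 human proteins are counted inside the 350,000 or on top of it, the total lands on either side of 43 billion.

```python
# Figures quoted in this post; treat them as rough estimates.
COST_PER_STRUCTURE_USD = 120_000  # upper-bound cost of solving one structure
HUMAN_PROTEINS = 20_000
OTHER_STRUCTURES = 350_000


def experimental_cost(n_structures, cost_each=COST_PER_STRUCTURE_USD):
    """Hypothetical helper: total cost if every structure were solved in a lab."""
    return n_structures * cost_each


# Case 1: the 350,000 already includes the human proteins.
low = experimental_cost(OTHER_STRUCTURES)
# Case 2: the human proteins come on top of the 350,000.
high = experimental_cost(HUMAN_PROTEINS + OTHER_STRUCTURES)

print(f"${low / 1e9:.1f}B to ${high / 1e9:.1f}B")  # prints "$42.0B to $44.4B"
```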

And they released all of it.

For free.

Now they are working on releasing the protein structures of all 100 million proteins known to humankind.

And you can access all of that from your laptop.

When the next pandemic hits, this could drastically accelerate finding a cure. It will move a lot of the painstaking work of experimental biology into simulations you can run on your computer. It will democratize biological research, opening it up from large pharma to people like you and me. Down the road, it will save many lives, and change many more. And it also makes me wonder, what else we thought was impossible can be done with AI?

Understanding proteins is understanding the building blocks of life, and by extension, it is understanding life itself. How the first life began on Earth is still a mystery. Maybe one day this will help us figure that out and create artificial life for the first time! If this doesn’t make someone excited to be living in the 21st century, I have 3 words for them:

“Go fold yourself.”

Researcher in NLP and Machine Learning | masumhasan.net