An interview with Sébastien Lemal, Data Scientist at IPA Therapeutics [3] and user of AlphaFold
Predicting the tertiary structure of proteins
Protein structure prediction [1] is the deduction of the three-dimensional (tertiary) structure of a protein from its amino acid sequence (primary structure). By understanding the structure of a protein, you can anticipate what the protein can do (e.g., bind to a pathogen to fight a virus). Protein structure prediction is therefore pivotal in drug design. Several methods can be used to predict the tertiary structure, e.g., ab initio methods based on the laws of physics or chemistry, or methods using templates of the secondary structure. However, these methods require human interaction and are very time-consuming.
AlphaFold, from simple sequence to structural models
The computer program AlphaFold performs predictions of protein structure using artificial intelligence [2], creating an easy end-to-end solution from a simple sequence to structural models with few input parameters.
Sébastien Lemal: “At IPA we deliver solutions for data analysis and in silico drug development, and to that end we want to have the most user-friendly data pipelines. We are actively studying how AlphaFold can be integrated into our existing automated data workflows.”
To get by with a little help from… the community
AlphaFold is designed as a deep learning system. In 2018, AlphaFold I achieved outstanding results in the overall rankings of CASP [4], a competition that assesses the performance of protein structure prediction algorithms. Two years later, a team using AlphaFold II repeated that placement.
Sébastien Lemal: “When DeepMind came out as the winner of the CASP competition, there was a lot of hype around AlphaFold. Everybody thought DeepMind had solved the ‘protein folding problem’ and researchers were wondering if molecular modeling would become obsolete. Once the code was released as open source, more people started experimenting with it. And it turned out there were still some issues. For example, AlphaFold is very good at finding the overall fold of proteins in a generic sense, but compared to experimental models, it still lacks atomic resolution in some cases. As users, we need to be aware of these limitations. Otherwise, the predictions we trust fail when used for industrial purposes, so we need to understand exactly what happened.”
DeepMind realized they could benefit from the experiences of their users and encouraged people to experiment with the code and gather knowledge about AlphaFold’s limitations. They organized round tables where people from academia and industry could discuss their experiences and the scope and limitations of the tool.
Among the researchers using the VSC infrastructure, there was also a lot of interest in working with AlphaFold.
So, over a year ago, the VSC AlphaFold community was founded. VSC organized workshops and training sessions and actively worked on setting up and optimizing the AlphaFold deployment on the VSC supercomputing infrastructure.
Sébastien Lemal: “The sessions organized by VSC helped me understand more about AlphaFold, its usage and its deployment in HPC settings, which in turn helped us deploy it in a cloud setting, as we do at IPA. My experience with the community is very positive. Setting up AlphaFold on HPC is now fairly easy, and it was quite interesting to see how the VSC achieved that.”
Sébastien Lemal: “For me, the greatest value of the VSC community lies in being an active community, keeping us informed about AlphaFold news. The tool keeps getting updates based on input from the community. Moreover, several research groups are actively building new software and data around AlphaFold, further closing the gap between the accuracy of experimental and computer-generated models.
Being part of the community helps to keep track of all the progress and change. Although I am quite new to the field, if something comes up from my side, I would not hesitate to share it with the community. This is the best way to progress. There are always new things to learn, and it is always valuable for the community to hear about new experiences.”
If you are interested in Google DeepMind's AlphaFold, visit our VSC AlphaFold Community page and become part of our growing AlphaFold Community!
[1] Source: Wikipedia, Protein structure prediction: https://en.wikipedia.org/wiki/Protein_structure_prediction
[2] Source: Wikipedia, AlphaFold: https://en.wikipedia.org/wiki/AlphaFold
[3] IPA: IPA has become a Hub for Biotherapeutic Intelligence™ connecting and integrating data science, in silico drug development and bioscience. The antibody discovery platform encompasses both in silico workflows via the LENSai platform and a sophisticated and streamlined wet-lab infrastructure to accelerate therapeutic antibody discovery and development, leveraging state-of-the-art equipment and the know-how of a highly experienced scientific team. IPA has an internal team of experts that specifically focus on strategies to develop next-generation antibodies from target validation through clinical readiness. In addition, IPA is also active in Protein Manufacturing, offering mammalian protein expression and purification services.
[4] CASP: Critical Assessment of protein Structure Prediction