Before my comment gets dismissed, I will disclose that I am a professional structural biologist who works in this field every day.<p>These threads are always the same: lots of comments about protein folding, how amazing DeepMind is, how AlphaFold is a success story, how it has flipped an entire field on its head, etc. The language from Google is so deceptive about what they've actually done that I think it's intentionally disingenuous.<p>At the end of the day, AlphaFold is amazing homology modeling. I love it, I think it's an awesome application of machine learning, and I use it frequently. But it's doing the same thing we've been doing for 2 decades: pattern matching sequences of proteins with unknown structure to sequences of proteins with known structure, and about 2x as well as we used to be able to.<p>That's extremely useful, but it's not knowledge of protein folding. It can't predict a fold de novo, it can't predict folds that haven't been seen (EDIT: this is maybe not strictly true, depending on how you slice it), it fails in a number of edge cases (remember, in biology, edge cases are everything) and again, I can't stress this enough, we have no new information on how proteins fold. We know all the information (most of it, at least) for a protein's final fold is in the sequence. But we don't know much about the in-between.<p>I like AlphaFold, it's convenient and I use it (although for anything serious or anything interacting with anything else, I still need a real structure), but I feel as though it has been intentionally and deceptively oversold. There are 3-4 other deep learning projects I think have had a much greater impact on my field.<p>EDIT:
See below:
<a href="https://news.ycombinator.com/item?id=32265662" rel="nofollow">https://news.ycombinator.com/item?id=32265662</a> for information on predicting new folds.
I've got a 5th-grader question about how proteins are used/represented graphically that I've never been able to find a satisfying answer for.<p>Basically, you see these 3D representations of specific proteins as a crumple of ribbons-- literally like someone ran multi-colored ribbons through scissors to make curls and dumped them on the floor (like a grade school craft project).<p>So... I understand that proteins are huge organic molecules composed of thousands of atoms, right? Their special capabilities arise from their structure/shape. So basically the molecule contorts itself to a low energy state which could be very complex but which enables it to "bind?" to other molecules expressly because of this special shape and do the special things that proteins do-- things that form the basis of living things. Hence the efforts, like AlphaFold, to compute what these shapes are for any given protein molecule.<p>But what does one "do" with such 3D shapes?<p>They seem intractably complex. Are people just browsing these shapes and seeing patterns in them? What do the "ribbons" signify? Are they just some specific arrangement of C,H,O? Why are some ribbons different colors? Why are there also thread-like things instead of all ribbons?<p>Also, is that what proteins would really look like if you could see at sub-optical wavelength resolutions? Are they really like that? I recall from school the equipartition theorem-- 1/2 kT of kinetic energy for each degree of freedom. These things obviously have many degrees of freedom. So wouldn't they be "thrashing around" like a rag doll in a blender at room temperature? It seems strange to me that something like that could be so central to life, but it is.<p>Just trying to get myself a cartoonish mental model of how these shapes are used! Anyone?
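On the equipartition point, here is a rough back-of-the-envelope sketch of how much kinetic energy "1/2 kT per degree of freedom" actually is. The temperature (298 K), the 1,000-atom protein, and the choice to count 3 classical degrees of freedom per atom are all my own illustrative assumptions, not numbers from the thread, and the function names are made up for this example.

```python
# Rough equipartition estimate. Assumptions (not from the thread): T = 298 K,
# a hypothetical 1,000-atom protein, every atom treated classically with
# 3 translational degrees of freedom and no constraints.
BOLTZMANN_J_PER_K = 1.380649e-23   # exact SI value
AVOGADRO_PER_MOL = 6.02214076e23   # exact SI value
JOULES_PER_KCAL = 4184.0


def thermal_energy_per_dof(temperature_k: float) -> float:
    """Mean kinetic energy per quadratic degree of freedom, (1/2) k T, in joules."""
    return 0.5 * BOLTZMANN_J_PER_K * temperature_k


def to_kcal_per_mol(energy_j: float) -> float:
    """Convert a per-molecule energy in joules to kcal/mol."""
    return energy_j * AVOGADRO_PER_MOL / JOULES_PER_KCAL


def total_kinetic_energy(n_atoms: int, temperature_k: float) -> float:
    """Total mean kinetic energy in joules, counting 3 degrees of freedom per atom."""
    return 3 * n_atoms * thermal_energy_per_dof(temperature_k)


if __name__ == "__main__":
    per_dof = to_kcal_per_mol(thermal_energy_per_dof(298.0))
    total = to_kcal_per_mol(total_kinetic_energy(1000, 298.0))
    print(f"per degree of freedom: {per_dof:.3f} kcal/mol")
    print(f"hypothetical 1,000-atom protein: {total:.0f} kcal/mol")
```

Under those assumptions each degree of freedom carries roughly 0.3 kcal/mol on average, so everything does jiggle. The number on its own says nothing about whether the fold holds together; that depends on how it compares with the interactions stabilizing the structure.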
This is probably one of the best applications of AI in science in terms of impact so far. I can't think of any other problem with the same potential impact.<p>EDIT: grammar
Can someone put AlphaFold's problem space into perspective for me?<p>Why is protein folding important? Theoretical importance? Can we do something with protein folding knowledge? If so, what?<p>I've been hearing about AlphaFold from the CS side. There they seem to focus on protein folding primarily as an interesting space to apply their CS efforts.
AlphaFold is a phenomenal tool that demonstrates how AI can already outclass humans at certain tasks. It is a prime example of a problem space where conventional approaches are simply inferior, and it shows that AI is not just a fancy name but can be extraordinarily powerful.
Many thanks to DeepMind for releasing predicted structures of all known protein <i>monomers</i>. What I'd like next is for AlphaFold (or some other software) to be able to show us multimeric structures based on the single monomer/subunit predictions and protein-protein interactions (i.e. docking). For example, the one I helped work on back in my structural biology days was the circadian clock protein KaiC: <a href="https://www.rcsb.org/structure/2GBL" rel="nofollow">https://www.rcsb.org/structure/2GBL</a>. That's the "complete" hexameric structure that shows how each of the subunits pack. The prediction for the single monomer that forms a hexamer is very close to the experimental structure (<a href="https://alphafold.ebi.ac.uk/entry/Q79PF4" rel="nofollow">https://alphafold.ebi.ac.uk/entry/Q79PF4</a>) and in fact shows the correct structure of AA residues 500-519, which we were never able to validate until 12 years later (<a href="https://www.rcsb.org/structure/5C5E" rel="nofollow">https://www.rcsb.org/structure/5C5E</a>) when we expressed those residues along with another protein called KaiA, which we knew binds to the "top" CII terminal (AAs 497-519) of KaiC. If we had had this data then, it would not only have allowed us to make better predictions about biological function and protein-protein interactions but would also have helped guide future experiments.<p>What we can do with this data now is use methods such as cryo-EM to see the "big picture", i.e. multi-subunit protein-protein interactions, where we can plug the AlphaFold predicted structure into the cryo-EM 3D density map and get predicted angstrom-level views of what's happening without necessarily having to resort to slower methods such as NMR or X-ray crystallography to elucidate macromolecular interactions.<p>A small gripe about the AlphaFold EBI website: it doesn't seem to show the known experimental structure, it just shows "Experimental structures: None available in PDB". 
For example, the link to the AlphaFold structure above should link to 2GBL, 1TF7, or any of the other KaiC structures from organism PCC7942 at RCSB. This would require merging/mapping data from RCSB with EBI and at least doing some string matching; hopefully they're working on it!
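To make the "merging/mapping plus some string matching" idea concrete, here is a minimal sketch of what such a lookup might look like. The table is hard-coded from the IDs mentioned in the comment above purely for illustration; a real site would load it from a maintained UniProt-to-PDB cross-reference, and the function names here are hypothetical, not anything EBI or RCSB actually exposes.

```python
# Illustrative cross-reference from a predicted entry to experimental structures.
# The table below is hard-coded from IDs mentioned in the comment above; it is an
# assumption for this sketch, not data pulled from RCSB or EBI.
from typing import Dict, List

# Hypothetical in-memory table: UniProt accession -> PDB IDs.
UNIPROT_TO_PDB: Dict[str, List[str]] = {
    "Q79PF4": ["2GBL", "1TF7"],  # KaiC, per the comment above
}


def normalize_accession(raw: str) -> str:
    """Trim whitespace and upper-case so ' q79pf4 ' matches 'Q79PF4'."""
    return raw.strip().upper()


def experimental_structures(accession: str) -> List[str]:
    """Return the PDB IDs known for an accession, or an empty list."""
    return UNIPROT_TO_PDB.get(normalize_accession(accession), [])


def experimental_label(accession: str) -> str:
    """Build the text a page could show in its 'Experimental structures' field."""
    pdb_ids = experimental_structures(accession)
    if not pdb_ids:
        return "Experimental structures: None available in PDB"
    links = ", ".join(f"https://www.rcsb.org/structure/{pdb_id}" for pdb_id in pdb_ids)
    return f"Experimental structures: {links}"


if __name__ == "__main__":
    print(experimental_label("Q79PF4"))
```

The string matching here is only accession normalization; matching by protein name or organism would be fuzzier and is left out of the sketch.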
<i>Obtaining this dataset prior to alphafold would have cost on the order of $200 trillion.</i>
<a href="https://twitter.com/wintonARK/status/1552653527670857729" rel="nofollow">https://twitter.com/wintonARK/status/1552653527670857729</a><p>Anyone knowledgeable know if this estimate is accurate? Insane if true
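Not an answer on accuracy, but the arithmetic behind the figure is easy to check. This sketch assumes the dataset is the roughly 200 million structures mentioned in the announcement; the per-structure number is simply what the quoted total implies, not an independent cost estimate.

```python
# Sanity check of the quoted figure. Assumption: "this dataset" means the
# roughly 200 million predicted structures mentioned in the announcement.
QUOTED_TOTAL_USD = 200 * 10**12      # $200 trillion, from the tweet
STRUCTURES_IN_DATASET = 200 * 10**6  # ~200 million structures (assumed)


def implied_cost_per_structure(total_usd: int, n_structures: int) -> float:
    """Average cost per structure implied by a total and a count."""
    return total_usd / n_structures


if __name__ == "__main__":
    per_structure = implied_cost_per_structure(QUOTED_TOTAL_USD, STRUCTURES_IN_DATASET)
    print(f"implied cost per structure: ${per_structure:,.0f}")
```

So the claim amounts to about $1 million per experimentally determined structure; whether that is a fair average is the part someone in the field would have to judge.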
>we’re now releasing predicted structures for nearly all catalogued proteins known to science<p>Is the result that researchers will now much more quickly 'manually' validate or invalidate the predicted structures for proteins they are working with? I understand it is traditionally a long and complex process, but I imagine it is expedited by having a predicted structure to test as the baseline?
Today I learned that there are bacteria that have a protein helping to form ice on plants [1] to destroy them and extract nutrients (however, I didn't understand how the bacteria themselves survive this).<p>Machine learning typically uses existing data to predict new data. Please explain: does this mean that AlphaFold can only use known types of interactions between atoms and will mispredict the structure of proteins that use not-yet-known interactions?<p>And why can't we just simulate protein behaviour and interactions using quantum mechanics?<p>[1] <a href="https://pubs.acs.org/doi/10.1021/acs.jpcb.1c09342" rel="nofollow">https://pubs.acs.org/doi/10.1021/acs.jpcb.1c09342</a>
The press release is a bit difficult to place into historical context. I believe that the first AlphaFold release was mostly human and mouse proteins, and this press release marks the release of structures for additional species.
A fun way I've been thinking about all this is what nanotech/nanobots are actually going to look like. Tiny little protein machines doing what they've been doing since the dawn of life. We now have a library of components, and as we start figuring out what they can do, and how to stack them, we can start building truly complex machinery for whatever crazy tasks we can imagine. The impact goes so far beyond drugs and treatments.
I had a dream about this a few days ago. About complexly wrinkled/crumpled/convolved things.<p>Like a fresh crepe stuffed into the toe of a boot. Bewilderingly complex.<p>But I have a question. Does such contortion work for 3d "membranes" in a 4d space? It's something I'm chewing on. Hard to casually visualize, obviously.
I haven’t had a chance to look through the new predictions, but I know there were previously some issues with predicting the structure of membrane-bound proteins. PDB hardly contains any.<p>Does the new set of predictions contain a bunch of membrane-bound proteins?
Come play biotech with us and let's figure out EVERYTHING and not just protein folding, yay! <a href="https://epicquest.bio" rel="nofollow">https://epicquest.bio</a>
Now we can start guessing what futures they are betting on: those in which open-sourcing the whole thing commoditises critical complements.<p>---<p><a href="https://www.gwern.net/Complement" rel="nofollow">https://www.gwern.net/Complement</a>
How do you know that the predicted structure will be correct? I presume researchers will need to validate the structure empirically. Do we know how good the model has been at predicting so far?
Just imagine if the tech world put all programmatic advertising development on hold for a year and the collective brain power were channeled to science instead…
> Today, I’m incredibly excited to share the next stage of this journey. In partnership with EMBL’s European Bioinformatics Institute (EMBL-EBI), we’re now releasing predicted structures for nearly all catalogued proteins known to science, which will expand the AlphaFold DB by over 200x - from nearly 1 million structures to over 200 million structures - with the potential to dramatically increase our understanding of biology.<p>And later:<p>> Today’s update means that most pages on the main protein database UniProt will come with a predicted structure. All 200+ million structures will also be available for bulk download via Google Cloud Public Datasets, making AlphaFold even more accessible to scientists around the world.<p>This is the actual announcement.<p>UniProt is a large database of protein sequence and function. The inclusion of the predicted structures alongside the experimental data makes it easier to include the predictions in workflows already set up to work with the other experimental and computed properties.<p>It's not completely clear from the article whether any of the 200+ million predicted structures deposited to UniProt have not been previously released.<p>Protein structure determines function. Before AlphaFold, experimental structure determination was the only option, and that's very costly. AlphaFold's predictions appear to be good enough to jumpstart investigations without an experimental structure determination. That has the potential to accelerate many areas of science and could percolate up to therapeutics.<p>One area that doesn't get much discussion in the press is the difference between solid state structure and solution state structure. It's possible to obtain a solid state structure determination (X-ray) that has nothing to do with actual behavior in solution. 
Given that AlphaFold was trained to a large extent on solid state structures, it could be propagating that bias into its predicted structures.<p>This paper talks about that:<p>> In the recent Critical Assessment of Structure Prediction (CASP) competition, AlphaFold2 performed outstandingly. Its worst predictions were for nuclear magnetic resonance (NMR) structures, which has two alternative explanations: either the NMR structures were poor, implying that AlphaFold may be more accurate than NMR, or there is a genuine difference between crystal and solution structures. Here, we use the program Accuracy of NMR Structures Using RCI and Rigidity (ANSURR), which measures the accuracy of solution structures, and show that one of the NMR structures was indeed poor. We then compare AlphaFold predictions to NMR structures and show that AlphaFold tends to be more accurate than NMR ensembles. There are, however, some cases where the NMR ensembles are more accurate. These tend to be dynamic structures, where AlphaFold had low confidence. We suggest that AlphaFold could be used as the model for NMR-structure refinements and that AlphaFold structures validated by ANSURR may require no further refinement.<p><a href="https://pubmed.ncbi.nlm.nih.gov/35537451/" rel="nofollow">https://pubmed.ncbi.nlm.nih.gov/35537451/</a>