AlphaFold reveals the structure of the protein universe

633 points by MindGods almost 3 years ago

25 comments

COGlory almost 3 years ago

Before my comment gets dismissed, I will disclaim that I am a professional structural biologist who works in this field every day.

These threads are always the same: lots of comments about protein folding, how amazing DeepMind is, how AlphaFold is a success story, how it has flipped an entire field on its head, etc. The language from Google is so deceptive about what they've actually done that I think it's intentionally disingenuous.

At the end of the day, AlphaFold is amazing homology modeling. I love it, I think it's an awesome application of machine learning, and I use it frequently. But it's doing the same thing we've been doing for 2 decades: pattern matching sequences of proteins with unknown structure to sequences of proteins with known structure, and about 2x as well as we used to be able to.

That's extremely useful, but it's not knowledge of protein folding. It can't predict a fold de novo, it can't predict folds that haven't been seen (EDIT: this is maybe not strictly true, depending on how you slice it), it fails in a number of edge cases (remember, in biology, edge cases are everything) and again, I can't stress this enough, we have no new information on how proteins fold. We know all the information (most of it, at least) for a protein's final fold is in the sequence. But we don't know much about the in-between.

I like AlphaFold, it's convenient and I use it (although for anything serious or anything interacting with anything else, I still need a real structure), but I feel as though it has been intentionally and deceptively oversold. There are 3-4 other deep learning projects I think have had a much greater impact on my field.

EDIT: See https://news.ycombinator.com/item?id=32265662 for information on predicting new folds.

crispyambulance almost 3 years ago

I've got a 5th grader question about how proteins are used/represented graphically that I've never been able to find a satisfying answer for.

Basically, you see these 3D representations of specific proteins as a crumple of ribbons-- literally like someone ran multi-colored ribbons through scissors to make curls and dumped them on the floor (like a grade school craft project).

So... I understand that proteins are huge organic molecules composed of thousands of atoms, right? Their special capabilities arise from their structure/shape. So basically the molecule contorts itself to a low energy state which could be very complex but which enables it to "bind?" to other molecules expressly because of this special shape and do the special things that proteins do-- that form the basis of living things. Hence the efforts, like AlphaFold, to compute what these shapes are for any given protein molecule.

But what does one "do" with such 3D shapes?

They seem intractably complex. Are people just browsing these shapes and seeing patterns in them? What do the "ribbons" signify? Are they just some specific arrangement of C, H, O? Why are some ribbons different colors? Why are there also thread-like things instead of all ribbons?

Also, is that what proteins would really look like if you could see at sub-optical wavelength resolutions? Are they really like that? I recall from school the equipartition theorem-- 1/2 kT of kinetic energy for each degree of freedom. These things obviously have many degrees of freedom. So wouldn't they be "thrashing around" like a rag doll in a blender at room temperature? It seems strange to me that something like that could be so central to life, but it is.

Just trying to get myself a cartoonish mental model of how these shapes are used! Anyone?
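
For a sense of scale on the equipartition figure mentioned above, here is a minimal sketch that evaluates (1/2) kT per degree of freedom at room temperature. The function names and the choice of 298.15 K as "room temperature" are assumptions made for illustration. The constants are the exact SI values of the Boltzmann and Avogadro constants. It says nothing about how any particular protein actually moves; it only puts a number on the thermal energy in the question.

```python
# Exact SI values of the Boltzmann and Avogadro constants.
K_B = 1.380649e-23        # J/K
N_A = 6.02214076e23       # 1/mol
JOULES_PER_KCAL = 4184.0  # thermochemical kilocalorie

# Assumed value for "room temperature" (25 degrees Celsius).
T_ROOM_K = 298.15


def thermal_energy_per_dof(temperature_k: float) -> float:
    """Equipartition energy per quadratic degree of freedom, (1/2) k_B T, in joules."""
    return 0.5 * K_B * temperature_k


def joules_to_kcal_per_mol(energy_j: float) -> float:
    """Convert a per-particle energy in joules to kcal/mol."""
    return energy_j * N_A / JOULES_PER_KCAL


energy_j = thermal_energy_per_dof(T_ROOM_K)
energy_kcal_mol = joules_to_kcal_per_mol(energy_j)

print(f"(1/2) kT at {T_ROOM_K} K: {energy_j:.3e} J per degree of freedom")
print(f"                      = {energy_kcal_mol:.2f} kcal/mol")
```

This comes out to roughly 2e-21 J, or about 0.3 kcal/mol, per degree of freedom. That is the number to compare against whatever interaction energies hold a given fold together.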

jarenmf almost 3 years ago

This is probably one of the best applications of AI in science in terms of impact so far. I can't think of any other problem with the same potential impact.

EDIT: grammar

dalbasal almost 3 years ago

Can someone put AlphaFold's problem space into perspective for me?

Why is protein folding important? Theoretical importance? Can we do something with protein folding knowledge? If so, what?

I've been hearing about AlphaFold from the CS side. There they seem to focus on protein folding primarily as an interesting space to apply their CS efforts.

epups almost 3 years ago

AlphaFold is a phenomenal tool that demonstrates how AI can already outclass humans for certain tasks. It is a prime example of a problem space where conventional approaches are simply inferior, and that AI is not just a fancy name but can be extraordinarily powerful.

sabujp almost 3 years ago

Many thanks to DeepMind for releasing predicted structures of all known protein monomers. What I'd like next is for AlphaFold (or some other software) to be able to show us multimeric structures based on the single monomer/subunit predictions and protein-protein interactions (i.e. docking). For example, the one I helped work on back in my structural biology days was the circadian clock protein KaiC: https://www.rcsb.org/structure/2GBL. That's the "complete" hexameric structure that shows how each of the subunits pack. The prediction for the single monomer that forms a hexamer is very close to the experimental structure (https://alphafold.ebi.ac.uk/entry/Q79PF4) and in fact shows the correct structure of AA residues 500-519, which we were never able to validate until 12 years later (https://www.rcsb.org/structure/5C5E), when we expressed those residues along with another protein called KaiA, which we knew binds to the "top" CII terminal (AAs 497-519) of KaiC. If we had had this data then, it would not only have allowed us to make better predictions about biological function and protein-protein interactions but would also have helped guide future experiments.

What we can do with this data now is use methods such as cryo-EM to see the "big picture", i.e. multi-subunit protein-protein interactions, where we can plug the AlphaFold predicted structure into the cryo-EM 3D density map and get predicted angstrom-level views of what's happening without necessarily having to resort to slower methods such as NMR or X-ray crystallography to elucidate macromolecular interactions.

A small gripe about the AlphaFold EBI website: it doesn't seem to show the known experimental structure, it just shows "Experimental structures: None available in PDB". For example, the link to the AlphaFold structure above should link to 2GBL, 1TF7, or any of the other KaiC structures from organism PCC7942 at RCSB. This would require merging/mapping data from RCSB with EBI and at least doing some string matching; hopefully they're working on it!
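
The cross-linking wished for above could look something like the sketch below. It is only an illustration, not how EBI or RCSB actually map entries. The URL patterns are copied from the links in this comment, while the `KNOWN_EXPERIMENTAL` table is a hypothetical hand-written lookup containing just the accession and PDB IDs named here. A real mapping would have to come from the databases themselves.

```python
# URL patterns taken from the links quoted in the comment above.
ALPHAFOLD_ENTRY = "https://alphafold.ebi.ac.uk/entry/{accession}"
RCSB_STRUCTURE = "https://www.rcsb.org/structure/{pdb_id}"

# Hypothetical, hand-written cross-reference for illustration only:
# UniProt accession -> PDB IDs of experimental structures named in the comment.
KNOWN_EXPERIMENTAL = {
    "Q79PF4": ["2GBL", "1TF7"],
}


def alphafold_url(accession: str) -> str:
    """Build the AlphaFold DB entry URL for a UniProt accession."""
    return ALPHAFOLD_ENTRY.format(accession=accession.strip().upper())


def experimental_urls(accession: str, xref=None) -> list:
    """Return RCSB links for the experimental structures listed for an accession.

    Returns an empty list when the lookup table has no entry for the accession.
    """
    table = KNOWN_EXPERIMENTAL if xref is None else xref
    pdb_ids = table.get(accession.strip().upper(), [])
    return [RCSB_STRUCTURE.format(pdb_id=pdb_id) for pdb_id in pdb_ids]


if __name__ == "__main__":
    print(alphafold_url("Q79PF4"))
    for url in experimental_urls("Q79PF4"):
        print("  experimental:", url)
```

With a table like this, the entry page could list the RCSB links instead of "None available in PDB". The hard part is building and maintaining the table, not the string formatting.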

alphabetting almost 3 years ago

Obtaining this dataset prior to AlphaFold would have cost on the order of $200 trillion. https://twitter.com/wintonARK/status/1552653527670857729

Anyone knowledgeable know if this estimate is accurate? Insane if true.

bifftastic almost 3 years ago

How do they know their structures are correct?

gz5 almost 3 years ago

> we’re now releasing predicted structures for nearly all catalogued proteins known to science

Is the result that researchers will now much more quickly 'manually' validate or invalidate the predicted structures for proteins they are working with? I understand it is traditionally a long and complex process, but I imagine it is expedited by having a predicted structure to test as the baseline?

dekhn almost 3 years ago

Demis and John will probably win either the Chemistry or Physics Nobel Prize in the next couple of years.

codedokode almost 3 years ago

Today I learned that there are bacteria that have a protein helping to form ice on plants [1] to destroy them and extract nutrients (however, I didn't understand how the bacteria themselves survive this).

Machine learning typically uses existing data to predict new data. Please explain: does it mean that AlphaFold can only use known types of interactions between atoms and will mispredict the structure of proteins that use not-yet-known interactions?

And why can't we just simulate protein behaviour and interactions using quantum mechanics?

[1] https://pubs.acs.org/doi/10.1021/acs.jpcb.1c09342

kache_ almost 3 years ago

This is an incredible gift to humanity. A huge positive impact. The team should be proud.

carbocation almost 3 years ago

The press release is a bit difficult to place into historical context. I believe that the first AlphaFold release was mostly human and mouse proteins, and this press release marks the release of structures for additional species.

donut2d almost 3 years ago

A fun way I've been thinking about all this is what nanotech/nanobots are actually going to look like. Tiny little protein machines doing what they've been doing since the dawn of life. We now have a library of components, and as we start figuring out what they can do, and how to stack them, we can start building truly complex machinery for whatever crazy tasks we can imagine. The impact goes so far beyond drugs and treatments.

swayvil almost 3 years ago

I had a dream about this a few days ago. About complexly wrinkled/crumpled/convolved things.

Like a fresh crepe stuffed into the toe of a boot. Bewilderingly complex.

But I have a question. Does such contortion work for 3D "membranes" in a 4D space? It's something I'm chewing on. Hard to casually visualize, obviously.

klemola almost 3 years ago

As an aside, the protein structure visualizations in the article are pretty. Is there a good source for more?

roscoebeezie almost 3 years ago

I haven’t had a chance to look through some of the new predictions, but I know there were some issues with predicting the structure for membrane-bound proteins previously. PDB hardly contains any.

Does the new set of predictions contain a bunch of membrane-bound proteins?

epicquest almost 3 years ago

Come play biotech with us and let's figure out EVERYTHING and not just protein folding, yay! https://epicquest.bio

jakosz almost 3 years ago

Now we can start guessing what futures they are betting on: those in which open-sourcing the whole thing commoditises critical complements.

https://www.gwern.net/Complement

djenendik almost 3 years ago

The many-body problem remains unsolved. So the question is, is this approach useful?

candiddevmike almost 3 years ago

Is Folding@home obsolete now?

cm2187 almost 3 years ago

How do you know that the predicted structure will be correct? I presume researchers will need to validate the structure empirically. Do we know how good the model has been at predicting so far?

naves almost 3 years ago

Just imagine if the tech world put all programmatic advertising development on hold for a year and the collective brain power were channeled to science instead…

inspirerhetoric almost 3 years ago

Does anyone know what it would cost to download this whole dataset? Google Cloud Datasets only allows 1 TB/month of free downloads, I believe.

yuan43 almost 3 years ago

> Today, I’m incredibly excited to share the next stage of this journey. In partnership with EMBL’s European Bioinformatics Institute (EMBL-EBI), we’re now releasing predicted structures for nearly all catalogued proteins known to science, which will expand the AlphaFold DB by over 200x - from nearly 1 million structures to over 200 million structures - with the potential to dramatically increase our understanding of biology.

And later:

> Today’s update means that most pages on the main protein database UniProt will come with a predicted structure. All 200+ million structures will also be available for bulk download via Google Cloud Public Datasets, making AlphaFold even more accessible to scientists around the world.

This is the actual announcement.

UniProt is a large database of protein sequence and function. The inclusion of the predicted structures alongside the experimental data makes it easier to include the predictions in workflows already set up to work with the other experimental and computed properties.

It's not completely clear from the article whether any of the 200+ million predicted structures deposited to UniProt have not been previously released.

Protein structure determines function. Before AlphaFold, experimental structure determination was the only option, and that's very costly. AlphaFold's predictions appear to be good enough to jumpstart investigations without an experimental structure determination. That has the potential to accelerate many areas of science and could percolate up to therapeutics.

One area that doesn't get much discussion in the press is the difference between solid state structure and solution state structure. It's possible to obtain a solid state structure determination (X-ray) that has nothing to do with actual behavior in solution. Given that AlphaFold was trained to a large extent on solid state structures, it could be propagating that bias into its predicted structures.

This paper talks about that:

> In the recent Critical Assessment of Structure Prediction (CASP) competition, AlphaFold2 performed outstandingly. Its worst predictions were for nuclear magnetic resonance (NMR) structures, which has two alternative explanations: either the NMR structures were poor, implying that AlphaFold may be more accurate than NMR, or there is a genuine difference between crystal and solution structures. Here, we use the program Accuracy of NMR Structures Using RCI and Rigidity (ANSURR), which measures the accuracy of solution structures, and show that one of the NMR structures was indeed poor. We then compare AlphaFold predictions to NMR structures and show that AlphaFold tends to be more accurate than NMR ensembles. There are, however, some cases where the NMR ensembles are more accurate. These tend to be dynamic structures, where AlphaFold had low confidence. We suggest that AlphaFold could be used as the model for NMR-structure refinements and that AlphaFold structures validated by ANSURR may require no further refinement.

https://pubmed.ncbi.nlm.nih.gov/35537451/