David Baker (U. Washington / HHMI) Part 1: Introduction to Protein Design

128.43k views3539 WordsCopy TextShare

Science Communication Lab

http://www.ibiology.org/ibioseminars/david-baker-part-1.html Lecture Overview: Baker begins his ta...

Video Transcript:

hi I'm David Baker from the University of Washington and today I'm going to give you an introduction to protein design proteins function by folding to Unique native structures and some representative native structures are shown on this slide uh proteins are encoded in genes in our genomes each gene encodes one protein and the proteins fold up to these unique n native structures in order to carry out their biological function native structures of proteins are likely the lowest energy states for the protein sequence so for each protein each amino acid sequence of a protein uh there corresponds

an energy landscape of which I've shown a cartoon here and there are many different possible confirmations a protein can have uh the native state of a protein is the lowest energy State what I've shown uh here there are two research problems I'm going to describe today the first problem is the problem of predicting protein structure in our genomes we have on the order of 30,000 different genes each encodes a unique protein and each organism that exists on Earth has a different genome with a different complement of genes and hence uh proteins so there's a general

problem of predicting what the structures and functions of these proteins are so on the top the top arrow is shows going from an amino acid sequence to a three-dimensional structure um so in this case we have a fixed amino acid sequence and we have to find the lowest energy structure the inverse problem is the protein design problem which I'm going to focus on today in this case we don't start with a naturally occurring amino acid sequence or a naturally occurring structure rather we start with a brand new structure that we'd like to make and we

go backwards to find an amino acid sequence which will fold up to that structure both of these problems the protein structure prediction problem and the protein design problem are very hard problems and I'm going to tell you why in the next few slides the first reason they're hard is that a polypeptide chain can have a very large number of different possible confirmations for each side chain in a for each amino acid in a protein chain um there are many rotatable bonds as shown in this schematic so each side chain each amino acid can have on

the order of uh three different confirmations so if you have 100 residue protein that means you have three confirmations for the first one three for the second one and the number of possible confirmations total you get by multiplying together all of these possibilities so it's 3 * 3 * three um uh up to 100 times so more generally if you have if nres is the number of amino acids in the protein the number of different confirmations is three to that power so three to n res and this is an astronomical number the second reason that

these problems are hard in particular the design problem is hard is there's also an astronomical number of protein sequences so again the first first residue can be any one of the 20 different amino acids the second position can again can also be any one of the 20 amino acids so the number of possible sequences is 20 * 20 * 20 to the nres power which is again a very very large number the third reason that these are hard problems is that uh we need to find the lowest energy structure for a sequence for example in

the protein structure prediction problem it's hard because calculating energies is is difficult to do accurately because they're many proteins have many many atoms and they're surrounded by water molecules uh which also have many atoms each each water molecule only has has three atoms but there are many many water molecules so we need to calculate energies accurately uh for systems that have many thousands of atoms and now what I'm going to do is tell you about how we go about uh solving these problems so to search through the possible set the possible confirmations for a protein

we try and mimic the actual folding process and um here you see uh an uh a movie depicting uh the computer calculation this is using the Rosetta methodology which um my group and others have been developing for the last 15 years or so we try and simulate the actual process of folding so we can sample through um and find the lowest energy structures much more quickly than we could if we were sampling all possible configurations which is essentially impossible um so this calculation that you see here is um takes not much longer than it takes

you to watch it to actually calculate to actually carry out on a computer um the challenge is that um every uh folding calculation like this or nearly everyone will end up in a different final structure so we need to do is many many of these independent calculations um uh to build up a picture of um what that energy landscape looks like and where the lest energy structure is uh the second problem that I mentioned searching through the space of sequences uh we handle in uh as shown on this in this animation um starting with a

protein backbone for which we want to find a very low energy sequence we we carry out a calculation in which at each step we're randomly subu in a different amino acid uh identity and different side chain confirmation for that amino acid acid at a randomly selected position uh we can do these substitutions very rapidly we evaluate the energy and we accept the change if the energy got lower so in this way we can scan through a very the very large number of possible sequences and quite rapidly identify the lowest energy sequence for a structure the

third problem uh the uh necessary necessity to calculate energies accurately um uh we solve in the f following way we use a model in which we try and capture the detailed interactions between atoms as accurately as we can so we have um their terms in the energy function that favor close Atomic packing but the atoms can't be overlapping they penalize the burial of polar atoms that would that would like to interact with solvent they penalize the burial of such atoms away from water they favor the formation of hydrogen bonding interactions between polar atoms we model

the electrostatic interactions the the favorability of positive and negative charges to be close together and we also model the the bending preferences of the polypeptide chain so given um what I've told you the algorithms for searching for the lowest energy structure for a given amino acid sequence that was in the movie where the protein sequence protein structure was moving around um uh and the algorithm for searching for the lowest energy sequence for a fixed structure there are again two problems which um uh we can approach the first problem is the structure prediction problem where again

we are going from genome sequences to um try to starting from those and try and predicting the structures and functions of the proteins that are encoded by those genes the second problem is the design problem where we start with something completely new that we would like to make and um work backwards to form to identify a sequence um which is predicted to fold up to that structure and for the remainder of this talk I'm going to describe some examples of the uh second type of calculation the design calculation uh first I want to give you

an overview of the different types of protein structures found in nature they're um in the top left is a depiction of uh a globular protein where um the uh the secondary structure elements the Alpha heles and the beta sheets come together and to form a roughly spherical protein with hydrophobic residues buried in the interior um and it's the burial of those hydrophobic residues away from solvent which stabilizes the protein on the right is um A protein that consists of long helices packed together um uh to make for example in the case of what's shown a

channel protein um in the lower left is a uh repeat protein in which a very simple module is repeated over and over and over again to make a long filament and then finally on the bottom right uh is a small protein which is held together with disulfide bonds which are uh shown in yellow and nature accomplishes all of the the great diversity of biological functions in our bodies and and all living things through different um by utilizing these different types of proteins in different circumstances where each one is most appropriate so what I'm going to

tell you describe now is um our efforts to design ideal versions of these types of these classes of proteins not any Not A protein that exists in nature but sort of like the platonic ideal of a globular protein or a repeat protein um in contrast to what's uh been um come through Evolution has been the result um of natural selection So Random amino acid substitutions then selection uh the process that and so what the result is the proteins you actually get have a lot of history in them and they may have initially been been functioned

in one way and then they were co-opted for something else so each protein has a lot of idiosyncrasies because of its history what I can now describe to you is taking what we've learned about these classes of proteins and the algorithms I described to make again sort of idealized protein structures which are free of those types of idiosyncrasies um and the way these calc this this works is I've outlined how the calculations how we calculate a sequence which is predicted to fold up to a given structure um but that's just the first step the next

step is uh since we've designed the protein uh we know what its amino acid sequence is because we came up with that amino acid sequence from the amino acid sequence we can work back to the DNA sequence uh that's using the genetic code which was worked out in the 1960s once we know the DNA sequence we can um write down down um uh we can um essentially buy or make very easily in the lab a synthetic piece of DNA that encodes this protein so the protein we've designed on the computer will have never existed in

nature something completely new um and the real real Miracle of this is that it's so easy to manufacture DNA these days that we can for any crazy protein we design on the computer we can very very easily U make a gene that encodes that uh that protein and once we have that Gene we can make the protein in the laboratory um uh by putting the gene into bacteria uh growing up the bacteria we can extract the protein out and then we can determine uh whether that protein folds up to the structure that we designed um

and we can also measure other properties of the protein so what I'm going to tell you about our uh our several design calculations we set out to make a brand new protein that was an idealized version of what exists in nature we uh we did carried out the design calculation we designed a gene encoding the designed protein we put it into bacteria purified the protein and then solve the structure and so I'm going to be showing you the design models and then the crystal structures of those designs that we determined experimentally so the first example

is um of the class of globular proteins which are composed of regular secondary structure element surrounding a hydrophobic core um when after we do the design calculation where we come up with a sequence that's predicted to um adopt the structure and the two structures I'm talking about here are the ones that are shown under the design column on this slide they're again they're idealized so all the heles are perfect heles the strands are perfect strands and the loops are very regular um there's one more step we take advantage of the protein structure prediction calculation I

described so we take those sequences and we send them out to volunteers all around the world who participate in a project called Rosetta at home and these volunteers uh predict what the structure is of of that sequence they search for the lowest energy state of that sequence and in the plots on the left uh you see many many red dots each Red Dot is the result of a different Rosetta at home volunteer on the y- AIS is the energy that's calculated by the Rosetta program that's running on their computer and on the x-axis is how

far away that that low energy structure they found was from the structure we were trying to make the one that's in the design column and you can see first of all how big and complicated the space is by the fact that um many of these lowest energy structures that are found are very far away from the structure that we're targeting so the x-axis is root mean Square deviation in um in the atomic coordinates so uh so these structures on the right of these plots are 10 inrs each each atom is on average 10 inrs away

from where it was supposed to be in the design model um so you can see that different people land in different local Minima on the landscape it's different ones of those bumps or those Wells that I showed in that schematic near the beginning but what you can see is true for both of these sequences is that the lower the energy that's again on the y axis the lower the energy the the more the structure tends toward the um design model and so there's almost a funnel shape to these plots where uh as you go to

lower and lower rmsd going left the energy gets lower and lower so the lowest energy structures found by our Rosetta volunteer uh Rosetta home volunteers who really play a critical role in our research the low energy structures are almost identical to the design model when we see this property which is the one that we are looking for uh we then U manufacture a gene a synthetic piece of DNA that encodes the design we make it in the lab we and then we solve the structure in this case by nuclear magnetic resonance with uh colleagues um

uh in the uh nesg structural genomics consortion and on the right you see U the column marked NMR shows the experimentally determined structure and you can see it's very similar to the the designed models in the second column and then on the far right uh are um superpositions blow up superpositions of the design model and the experimental structure and they show that the side chains in these designs are actually are are actuality where we designed them to be so we've been able to make um such structures Almost Pretty routinely now so we can make brand

new globular protein structures like this um uh quite effectively in fact a new student coming to my laboratory typically has assigned the project of making up a brand new protein structure and proving that the design designing it and then making it characterizing the design in the laboratory now um we can get to larger structures in this way um we have the we can make these ideal platonic ideals of globular proteins and then we can start we can put them together to make larger and more complex structures so this shows an example of taking two of

the two idealized building blocks we'd solve the structure of fusing them together and in the lower panel on the left is the design model and the right is the crystal structure so again this is a completely madeup protein um but uh when we solve it structure experimentally it comes out exactly as we designed it um now the second class of proteins I described are not globular they're not spherical they can be long and elongated and this is actually a protein that's very close to my heart because I designed it myself uh this protein schematic of

it is shown on the top right um this is composed of 80 residue heles and I made it taking advantage of the Francis equations that Francis Crick worked out where by um a backbone structure can be described by a small number of parameters and I can make many many different such structures by sampling through different possibilities for these parameters and I do that and then I design each possibility and choose the lowest energy structures uh when this protein is um manufactured in the lab uh when it was manufactured uh I did some initial tests and

found it was very stable um and then Joe Rogers a graduate student in England uh was asking me for a protein to do uh experiments on so I sent him this protein and he sent back uh this result which is really quite remarkable um uh you have in order to unfold this protein uh you have to add extremely high amounts of a chemical denaturant called guanidine that's on this plot on the left um and uh the unfolding you can see that on the um these lines uh as you add Mor guanidine are pretty flat and

then at very high concentrations over seven molar the protein starts to unfold but only really does this at very high temperature so this is something that's simply not seen for naturally occurring prote proteins these design proteins can be more ideal so much more stable when the crystal structure was solved of this protein it was found to be nearly identical to the design model so we can make this class of proteins also I mentioned uh repeat proteins that was a third class and um we've also been able to make idealized versions of these type of proteins

so um on the on the second column here you see a protein you see a repeated protein that goes on indefinitely um and on the left is a crystal comparison of the design model in red to the crystal structure in Gray you can see they're nearly identical um and on the right you see another example of a of an infinitely extending repeat protein where we've made one sub Subs segment of it in the lab and you again see that the crystal structure is nearly identical to the design model so we're very excited about these as

the basis for new types of new nanomaterials we can make rods straight rods and curved rods and start building things out of them um and the final class of proteins uh those small dulfi bonded proteins are very interesting because they could form the basis of new types of Therapeutics because they're very small and easy to make and um here uh this shows examples of um this is worked by vicr Mulligan a postto in the lab uh where he's designed very short peptides that are predicted to fold up to uh unique structures and there are three

examples in the top row of the slide of designs he made then below that are NMR structures of these peptides when they're actually made uh in the lab and again these peptides um come out very with very very similar structures to the design models so what I hope I've shown you today is uh I've given you um explained uh something about how uh about the protein structure prediction uh problem the protein design problem I've told you how we go about approaching these problems and then I've shown you that we can start to design sort of

idealized versions of the different classes of proteins uh that are found in nature and uh these proteins are likely will be the basis for uh for designing a whole new world of functional proteins uh to solve modern-day problems um and I'll talk about that in in another I bio uh seminar and I want to acknowledge uh the Fantastic people who have um actually done most of this work so noou and R Koga uh developed um these uh rules for making idealized protein structures I showed you uh took you through the design of two of their

structures uh vicr Mulligan I mentioned did the design cyclic peptide work uh TJ brunette uh TJ brunette poang and Fabio uh did the work on the uh repeat proteins and um uh yeah thank you for your attention