A long-standing problem in biology is the question of what makes proteins fold, i.e. what causes linear amino acid sequences to fold into complex three-dimensional structures that are vital to the function of the protein within a living system. In the 1950s, Christian B. Anfinsen proposed that the amino acid sequence itself determines the three-dimensional structure of the protein (2). This theory has been found to apply, to a greater or lesser extent, to most small globular proteins. Larger proteins have been found to need assistance from other proteins, such as chaperonins, to carry out the folding process. Nevertheless, the primary sequence of a protein remains one of the central features of "The Protein Folding Problem". For a number of reasons, both academic and industrial, it would be very useful to be able to predict the structure of a protein from its primary sequence.
The polarity of a protein's amino acids and hydrophobic effects between
the protein and the surrounding solvent are the main factors involved in
driving the protein folding process. In aqueous solutions, polar
amino acids tend to be hydrophilic, attracting polar water molecules,
while nonpolar amino acids (most of which contain hydrocarbon
side-chains) tend to be hydrophobic. The hydrophobic parts of
the protein mix poorly with water and are more inclined to associate
with each other (2). This, together with the peptide bonds between
consecutive amino acids in a sequence, influences the
conformations available to a protein. Thus, there appear to be certain
pathways along which a protein will fold. This folding process
may involve the formation of intermediates, as described by Oleg
Ptitsyn, who speaks of a compact intermediate that is larger than
the native form of the protein and has an intact secondary structure;
this has been described as the molten globule (1,2).
Frauenfelder and Wolynes (3) state that the final stages of folding
depend on the specific sequence of amino acids, whereas earlier
folding stages should be mostly insensitive to details of sequence.
Another important aspect of protein structure is that native structures
appear to be quite robust to mutations in their primary sequence,
with practically any residue in a sequence being replaceable without
causing any change in the protein's structure or orientation. The
hydrophobic core seems to be the most important feature of the
protein in relation to its normal folded state. Enzymatic
activity, however, is not subject to the same mutational freedom:
even single-residue changes can threaten total disruption
of activity (2).
Thermodynamics of Protein Conformational Changes
Protein stability depends on the free energy change between the folded and unfolded states, which is expressed by the following,
ΔG = -RT ln K = ΔH - TΔS
where R represents the gas constant, T the absolute temperature, K the
equilibrium constant, ΔG the free energy change on going from the
unfolded to the folded state (ΔG = GN - GU), and ΔH and ΔS the
corresponding enthalpy and entropy changes. The enthalpy change, ΔH,
corresponds to the binding energy (dispersion forces, electrostatic
interactions, van der Waals potentials and hydrogen bonding), while
hydrophobic interactions are described by the entropy term,
ΔS. Proteins become more stable with increasingly negative
values of ΔG, i.e. as the free energy of the unfolded protein
(GU) increases relative to the free energy of
the folded or native protein (GN). In other
words, as the binding energy increases or the entropy difference
between the two states decreases, the folded protein becomes more
stable (13). The folded conformation of a domain is apparently
in a relatively narrow free energy minimum, and substantial perturbations
of that folded conformation require a significant increase in
free energy. Because of the large heat capacity change upon protein
unfolding, there is a temperature at which the stability of the folded
state is at a maximum. Measured by free energy, the maximum occurs
when ΔS = 0, while that measured by the equilibrium constant
occurs when ΔH = 0. These maximum stabilities can occur at
quite different temperatures, and each measure is used in different
situations. Regardless of which one is used, however, the stability of the
folded state decreases at both higher and lower temperatures.
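The two temperatures of maximum stability can be made concrete with a small numerical sketch. It assumes a two-state transition with a temperature-independent heat capacity change, and it writes all quantities for the unfolding direction (N to U), so a positive ΔG means the folded state is favoured. The parameter values and function names are hypothetical, chosen only for illustration.

```python
import math

def unfolding_thermo(T, Tm, dH_m, dCp):
    """Return (dH, dS, dG) of unfolding, N -> U, at temperature T in kelvin.

    Assumes a two-state transition and a temperature-independent heat
    capacity change dCp. Tm is the midpoint temperature, where dG = 0.
    Units: J/mol for dH and dG, J/(mol K) for dS and dCp.
    """
    dH = dH_m + dCp * (T - Tm)
    dS = dH_m / Tm + dCp * math.log(T / Tm)
    return dH, dS, dH - T * dS

def zero_entropy_temperature(Tm, dH_m, dCp):
    """T_S, where dS = 0 and dG of unfolding is at its maximum."""
    return Tm * math.exp(-dH_m / (Tm * dCp))

def zero_enthalpy_temperature(Tm, dH_m, dCp):
    """T_H, where dH = 0 and dG/T (hence ln K) is at its extremum."""
    return Tm - dH_m / dCp

# Hypothetical parameters, not taken from any measured protein.
TM, DH_M, DCP = 333.15, 400e3, 8e3

T_S = zero_entropy_temperature(TM, DH_M, DCP)
T_H = zero_enthalpy_temperature(TM, DH_M, DCP)
print(f"T_S = {T_S:.1f} K, T_H = {T_H:.1f} K")
```

In this model both temperatures lie below the midpoint Tm, and the gap between them grows as the ratio of the enthalpy change to the heat capacity change grows.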
While factors such as binding interactions do obviously play a
part in stabilising the protein, they cannot account for a very
significant portion of the stabilisation, since similar interactions
occur in the unfolded state (although the interaction between
folded protein and solvent would be expected to be stronger than
the interaction between the unfolded protein coil and the solvent).
The hydrophobic effect is probably the major stabilising effect
(1).
The thermodynamics of protein stability is modelled quite well
by energy landscape theory. While we can speak of discrete
ground and excited states in simple systems such as atoms and
nuclear particles, the description of complex systems like proteins
requires more than such simple models. The ground state of
the folded protein is highly degenerate, and so we use the
energy landscape to describe it more adequately, where
the energy of a protein is a function of the topological arrangement
of the atoms. We deal with a surface in a space of a very large
number of co-ordinates, with energy valleys separated by
mountains and ridges. Each point on this surface describes the
protein in a specific conformation, and there is an energy landscape
for each state of the protein (e.g. neutral, charged, folded, intermediate
or unfolded) (3).
Fig.1: Folding-energy landscape for a protein molecule, depicted schematically in one-dimensional cross-section. The inset shows a blow-up of the main diagram with multiple substates (3).
The thermodynamic behaviour of proteins, as determined in various temperature-jump experiments, is best described by stretched exponentials rather than by Arrhenius's law: the rate coefficient decreases ever more rapidly as the temperature is reduced.
This behaviour corresponds to that of
glasses or spin-glasses, which undergo a transition whose transition
temperature depends on the characteristic observing time. The
random energy model put forward by Bernard Derrida correlates
well with the rough energy landscape diagram for proteins. Wolynes
and Bryngelson explained this by proposing that the random-energy
model describes the misfolded protein states on the energy landscape,
with the misfolded minima acting as "traps" that slow
down the protein molecule's folding process. These traps become
successively more difficult to escape as the temperature is lowered.
This suggested that proteins are not random heteropolymers, but
the products of biological evolution, which have a tendency not
to get trapped in deep local minima. Because there
is a limited number of amino acids from which a protein can be
constructed, it is, however, inevitable that some degree of frustration
and landscape roughness will occur; the landscape can nevertheless
be described as minimally frustrated (3).
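The freezing into traps can be illustrated with the textbook thermodynamics of the random energy model. The sketch assumes that conformational energies are independent and Gaussian-distributed with width dE, and that there are exp(S0) conformations in total; the entropy then falls as the temperature drops and vanishes at a glass temperature. The chain length, the number of conformations per residue and the energy scale below are hypothetical.

```python
import math

def rem_entropy(T, S0, dE):
    """Configurational entropy of the random energy model, in units of k_B.

    T and dE share the same energy units (k_B = 1). S0 is the log of the
    total number of conformations and dE the width of the Gaussian
    distribution of conformational energies. Below the glass temperature
    the system is frozen into its lowest states, so the entropy is zero.
    """
    return max(0.0, S0 - dE ** 2 / (2.0 * T ** 2))

def rem_glass_temperature(S0, dE):
    """Temperature at which the configurational entropy vanishes."""
    return dE / math.sqrt(2.0 * S0)

# Hypothetical chain: n residues, nu conformations per residue, and an
# energy roughness that grows as sqrt(n) times a per-residue scale eps.
n, nu, eps = 60, 10, 1.0
S0 = n * math.log(nu)
dE = math.sqrt(n) * eps
T_g = rem_glass_temperature(S0, dE)

for T in (2.0 * T_g, 1.5 * T_g, T_g, 0.5 * T_g):
    print(f"T/T_g = {T / T_g:.1f}  entropy = {rem_entropy(T, S0, dE):.1f}")
```

With these scaling assumptions the glass temperature does not depend on chain length, only on the per-residue roughness and the number of conformations per residue.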
The robustness of protein native structure to conformational change
is a consequence of the funnelled nature of the energy landscape
of a minimally frustrated protein. The geometry of this landscape
cannot be significantly changed by the modification of a few isolated
residues. Random heteropolymers, on the other hand, have an energy
landscape consisting of multiple funnels, with each one leading
to a different structure, making them more inclined to conformational
change as a result of sequence modification.
In terms of the hydrophobic interactions and their effect on stability,
the heat capacity of nonpolar compounds dissolved in water is
found to be directly proportional to the surface area, or the number
of water molecules, of the first solvation shell. Privalov
suggested that the hydrophobic contribution to stabilising the
native (folded) protein could be evaluated from the change in
the heat capacity, ΔCp, combined with the temperature
parameters characterising the dissolution of liquid hydrocarbons.
It has also been determined that the ratio of the entropy change,
ΔS, to the heat capacity change, ΔCp, for
the dissolution of a variety of hydrophobic compounds is a constant.
Thus, heat capacity change appears to define the hydrophobic effect,
and this in turn is related to the exposure of non-polar groups
to the solvent, water. The stability of a structure will rise
to a maximum and then decrease with increasing temperature due
to the dominance of non-hydrophobic entropy effects. On the other
hand, stability also decreases at lower temperatures; this cold
denaturation is brought about by increasing hydrophobic solvation
at lower temperatures (4).
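Heat and cold denaturation can both be located numerically as the two temperatures at which the free energy of unfolding crosses zero. The sketch below again assumes a two-state model with a constant heat capacity change; the parameter values are hypothetical and the bisection helper is written out only for illustration.

```python
import math

def dG_unfold(T, Tm, dH_m, dCp):
    """Free energy of unfolding (J/mol), constant-dCp two-state model."""
    return dH_m * (1.0 - T / Tm) + dCp * ((T - Tm) - T * math.log(T / Tm))

def bisect_root(f, lo, hi, tol=1e-9):
    """Find a root of f in [lo, hi]; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Hypothetical parameters, chosen only for illustration.
TM, DH_M, DCP = 333.15, 300e3, 10e3
f = lambda T: dG_unfold(T, TM, DH_M, DCP)

# Stability peaks where the entropy of unfolding is zero, so that
# temperature separates the cold and heat denaturation points.
T_max = TM * math.exp(-DH_M / (TM * DCP))
T_cold = bisect_root(f, 200.0, T_max)
T_hot = bisect_root(f, T_max, 400.0)
print(f"cold: {T_cold:.1f} K, maximum: {T_max:.1f} K, heat: {T_hot:.1f} K")
```

A larger heat capacity change pulls the two denaturation temperatures closer together, which is why cold denaturation is tied to the same quantity that characterises the hydrophobic effect.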
Bryngelson and Wolynes have obtained a phase diagram for folding transitions as a function of a set of theoretical parameters. The phase diagram consists of three distinct regions.
Only in the folded region can a protein attain its native structure,
while in the glassy region the folding of the system depends
on its history, i.e. the system has many deep local minima separated
by energy barriers which the thermal motions (the vibrational
energy) of the molecule cannot overcome. Plots based on simulations
exhibit a sharp S-shape, indicating that model polypeptides, at
least, undergo rapid conformational transitions at specific temperatures.
This abruptness of transition makes it difficult to assess the folding
characteristics of the models, but it does seem to indicate an
all-or-none transition.
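A two-state equilibrium gives the same kind of sharp S-shaped curve, which is one way to see why such a plot is read as all-or-none. The sketch below is an analogy, not the simulations referred to above: it neglects the heat capacity change, and the midpoint temperature and enthalpy are hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def fraction_unfolded(T, Tm, dH_m):
    """Two-state fraction unfolded at temperature T (kelvin).

    Uses dG_unfold = dH_m * (1 - T / Tm), i.e. it neglects the heat
    capacity change, which is a simplification made for this sketch.
    """
    dG = dH_m * (1.0 - T / Tm)
    K = math.exp(-dG / (R * T))  # K = [U] / [N]
    return K / (1.0 + K)

def midpoint_slope(Tm, dH_m):
    """Analytic slope d(fraction unfolded)/dT at T = Tm."""
    return dH_m / (4.0 * R * Tm ** 2)

# Hypothetical midpoint temperature (K) and unfolding enthalpy (J/mol).
TM, DH_M = 333.15, 400e3

for dT in (-10, -5, 0, 5, 10):
    print(f"T = Tm {dT:+d} K  unfolded = {fraction_unfolded(TM + dT, TM, DH_M):.3f}")
```

The slope at the midpoint scales with the unfolding enthalpy, so a larger cooperative unit gives a steeper, more switch-like curve.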
Protein folding can be compared to crystallisation, in that a protein
freezes to a unique stable structure while ordinary polymers typically
freeze to form amorphous globules, i.e. polypeptides with random
sequences will generally not fold to unique structures (9). Folded
proteins demonstrate varying degrees of flexibility, which is
of direct relevance to protein folding in that it reflects the
free energy constraints on unfolding and refolding. Flexibility
is greatest at the protein surface, where some side-chains and
a few loops have alternative conformations or no particular conformation
that is energetically preferred. Small, globular proteins (fewer
than 300 residues) exhibit the greatest plasticity of conformation,
although it is noted that no protein is known to adopt alternative
fully folded conformations. This flexibility ties in with the
fact that natural proteins do not appear to have been selected
for maximum stability, for a synthetic protein designed empirically
is much more stable,
ΔG = -94 kJ/mol. Natural
proteins seem to require some degree of flexibility, either for
their function or to allow them to fold into their native conformation
more rapidly. Both of these characteristics would be hindered
by the maximally stable conformation (1).
The literature is in agreement that the hydrophobic interaction is
the major contributor to the stability of the folded or native
state of the protein, although other interactions such as hydrogen
bonds, van der Waals forces and electrostatic interactions are
also believed to play a role. One important feature of the folded
state is that it is only marginally more stable than the unfolded
state. This is due to various compensating factors (basically, any
factor involved in stabilising the folded
state will also play a role in stabilising the unfolded state,
albeit a more minor one due to simple entropic factors) and to the
unfolded protein's large favourable conformational entropy (1).
The hydrophobic interaction between exposed non-polar amino acid
residues on the surfaces of the protein molecule is, in general,
attractive, short-range, and orientation dependent (7).
There are various concepts that attempt to explain how exactly
a protein undergoes folding and organisation from the unfolded
or random coil state to the native folded state, but the most promising
is that the secondary structure forms before most proteins are
able to compact extensively (2). Native proteins folding in aqueous
solution at physiological temperature do not get trapped in deep
local minima. The folding appears to proceed from a restricted
conformational ensemble, by condensation and secondary structure
growth, through an even smaller ensemble of "molten globules"
to a thermally jittered, tightly packed final "single"
structure. Molecules of the same protein can follow different
pathways to the same end, but the choice of pathways is limited
by the thermodynamics of the process (6). The thermodynamic guiding
forces of protein folding will be most active in the early stages
of folding, because that is when the density of states is quite
large, while in the last stages of folding, when entropy has been
reduced, a glass transition could well intervene. Such transitions
have been observed to some degree in a number of experiments, which
have noted very large activation energies, characteristic of glassy
systems, appearing in the last stages of protein folding (3).
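One way to see how glassy behaviour produces very large activation energies is to compare an Arrhenius rate with a rate of the form k0 exp[-(ε/kT)²], a form known from models of diffusion over a potential with Gaussian roughness ε. Applying it to the late stages of folding is an assumption of this sketch, and the numbers are arbitrary. The apparent activation energy, defined as -d ln k / d(1/kT), is constant for Arrhenius behaviour but grows as the temperature falls on the rough landscape.

```python
import math

def arrhenius_rate(kT, k0, barrier):
    """Arrhenius rate over a single barrier (same energy units as kT)."""
    return k0 * math.exp(-barrier / kT)

def rough_landscape_rate(kT, k0, roughness):
    """Rate on a landscape with Gaussian roughness (assumed model form)."""
    return k0 * math.exp(-((roughness / kT) ** 2))

def apparent_activation_energy(rate_fn, kT, rel_step=1e-6):
    """Estimate -d ln(rate) / d(1/kT) by a central finite difference."""
    beta = 1.0 / kT
    h = beta * rel_step
    upper = math.log(rate_fn(1.0 / (beta + h)))
    lower = math.log(rate_fn(1.0 / (beta - h)))
    return -(upper - lower) / (2.0 * h)

# Arbitrary illustrative values, in units where kT = 1 at the reference.
arr = lambda kT: arrhenius_rate(kT, 1.0, 5.0)
rough = lambda kT: rough_landscape_rate(kT, 1.0, 2.0)

for kT in (1.0, 0.75, 0.5):
    e_arr = apparent_activation_energy(arr, kT)
    e_rough = apparent_activation_energy(rough, kT)
    print(f"kT = {kT:.2f}  Arrhenius: {e_arr:.2f}  rough: {e_rough:.2f}")
```

For the rough-landscape form the apparent activation energy equals 2ε²/kT, so halving the temperature doubles it.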
The Compact Intermediate or "Molten Globule"
Experimental observations of unfolded states induced by different
conditions describe structures with different physical properties
which are indistinguishable thermodynamically. They suggest a
collapsed molecule with native-like secondary structure and a
liquid-like interior: the so-called "Molten Globule"
or compact intermediate. The compact intermediate state appears
to be the preferred conformational state of the unfolded protein
under refolding conditions, where it is usually only transient.
There may be a continuum of unfolded conformations, with the compact
intermediate state at one extreme and the fully folded native
protein at the other (1).
Oleg Ptitsyn described this structure as an intermediate that
is larger than the native form of the protein and has its secondary
structure intact. It is believed that any polypeptide chain of
near-native composition and length (80 to 300 residues) will exist in this
loose globular state, or as an ensemble of such states, when placed
in water (2, 6). These ensembles will have the majority of the
hydrophobic residues on the inside and the hydrophilic residues on the
outside, and will contain numerous secondary structure seeds composed
of short helices and beta hairpins which are continuously being
regenerated in a process whereby the structures develop regions
of compatibility and hydrogen bonding. Peptide chains will
fold from these limited ensembles containing numerous short transient
hydrogen-bonded substructures and proceed down only those pathways
or funnels that are highly insensitive to sequence differences
within the neighbourhood of the native sequence or sequences (6).
The ideal unfolded protein is the random coil, in which the rotation
angle about each bond of the backbone and side-chains is independent
of that of bonds distant in the sequence, and where all conformations
have comparable free energies, except when atoms of the polypeptide
chain come into too close proximity. Steric repulsions are significant
between atoms close in the covalent structure, and place limitations
on the local flexibility. Unfolded proteins in strong denaturants
such as 6 M GdmCl or 8 M urea, and disordered polypeptide copolymers,
have been demonstrated to have the average hydrodynamic properties
expected of random coil polypeptides. However, other experimental
evidence suggests that unfolded proteins are not true random coils
under other conditions, such as pH or temperature extremes in the
absence of denaturants. Polypeptides will tend to form less disordered
and more compact structures in situations where interactions
between different regions of the polypeptide are more energetically
favoured than those with the solvent, i.e. they tend not to behave
like a random coil. Alternatively, in situations where the interactions
between solvent and polypeptide are especially favourable, stronger
random coil behaviour becomes apparent (1).
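The idealised random coil can be sketched as a freely jointed chain, in which every bond direction is drawn independently. This is a deliberate simplification: it ignores fixed bond angles and the steric exclusion mentioned above, and the function names are hypothetical. For such a chain of n bonds of length b, the mean squared end-to-end distance is expected to approach n times b squared.

```python
import math
import random

def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return r * math.cos(phi), r * math.sin(phi), z

def end_to_end_sq(n_bonds, bond_length, rng):
    """Squared end-to-end distance of one freely jointed chain."""
    x = y = z = 0.0
    for _ in range(n_bonds):
        dx, dy, dz = random_unit_vector(rng)
        x += bond_length * dx
        y += bond_length * dy
        z += bond_length * dz
    return x * x + y * y + z * z

def mean_end_to_end_sq(n_bonds, bond_length, n_chains, seed=0):
    """Average squared end-to-end distance over many independent chains."""
    rng = random.Random(seed)
    total = sum(end_to_end_sq(n_bonds, bond_length, rng) for _ in range(n_chains))
    return total / n_chains

# Illustrative use: 100 virtual bonds of 3.8 angstroms, the approximate
# distance between consecutive alpha carbons.
mean_sq = mean_end_to_end_sq(100, 3.8, 2000, seed=1)
print(f"rms end-to-end distance: {math.sqrt(mean_sq):.1f} angstroms")
```

A chain that comes out markedly more compact than this estimate is showing the kind of non-random-coil behaviour described above.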
Protein structural equilibrium is usually described by the following expression,
N ⇌ U
where N represents the native or folded state and U represents the unfolded state, although it is suggested that the compact intermediate state could represent a subset of U which is continuously alternating between various energetically unfavourable states (1).
Multi-domain proteins usually unfold step-wise, with the domains
unfolding individually, either independently or with varying degrees
of interaction between them. Multi-subunit proteins usually dissociate
first, then the subunits unfold, unless domains are on the periphery
of the aggregate, where they can unfold independently (1). The
enthalpies (ΔH) and entropies (ΔS) of unfolding are
very temperature dependent because the heat capacity of the unfolded
state is significantly greater than that of the folded state.
One explanation for this dependence is that the heat capacity
difference results mainly from the temperature-dependent ordering
of water molecules around the non-polar portions of the protein
molecules, more of which are solvent accessible in the unfolded
state, although other factors may contribute (4).
High concentrations of salt in the solvent induce the precipitation
of protein. This is believed to be caused by aggregation of the
protein due to hydrophobic interactions, with the level of aggregation
being proportional to the number of exposed associating sites
on the protein. This is further supported by the fact that decreases
in temperature can precipitate proteins in solution with little
added electrolyte. Precipitation may also be caused by osmotic
effects (7).
Some recent research has been conducted on the thermodynamics
of unfolding of azurin, a small blue copper protein that acts
as an electron transfer agent in the redox systems of certain
bacteria. The thermodynamic stability of this protein's tertiary
structure has been the focus of much research in the past few
years. It is highly resistant to thermal unfolding, with irreversible
unfolding only occurring at temperatures in excess of 70°C.
This unusual thermal resistance has been ascribed to a number
of different features of the protein, including the presence
of disulphide bridges, intramolecular hydrogen bonds, hydrophobic
effects and stabilisation by Cu(II) binding. A model
of the denaturation or unfolding path was developed which involved
two effects: a reversible endothermic process, which involves the
destruction of the three-dimensional structure of the protein,
and an irreversible and kinetically controlled exothermic process,
which involves the aggregation of the polypeptide chain network.
Differential scanning calorimetry was used to analyse the denaturation
process in detail and did not indicate any endothermic peak, suggesting
that the thermal unfolding of azurin is, overall, an irreversible
process, i.e. the second stage predominates, with the first stage
occurring as a time-independent, all-or-none process (8).
The thermal denaturation of azurin was also examined with optical density scanning equipment, which gave a series of curves suggesting a two-step scheme: a reversible transition from N to N1, followed by an irreversible step.
Here N represents the native protein, which maintains
its structure unchanged up to 67°C. In state N1,
obtained after heating the solution up to 76°C, the copper
ion in the protein is bound to its ligands, but a weakening
of the copper-sulphur bond occurs, as evidenced by a slow decrease
of optical absorbance at 650 nm. The transition from N to
N1 is completely reversible. Beyond 76°C,
an irreversible process which involves a change in the co-ordination
of copper occurs. This second process has yet to be explained
(8).
Recent Developments and Research into Protein Unfolding
Charles Brooks and his Computational Biophysics research team at the Brooks Institute have, in the past year, used molecular dynamics simulations to calculate protein folding thermodynamics and to explore the dominant "flow" from folded to unfolded states, starting from a first-principles, atomic-level description of the protein and solvent environment. The thermodynamic surfaces (free energy and energy), projected onto the radius of gyration, were computed for a 48-residue protein. The calculated free energy surface suggests that the model protein is stable by ~3 kcal/mol and that a
thermodynamic folding intermediate may exist. They have also examined the flow in terms
of the idea of folding funnels. Their analysis gives a consistent
thermodynamic picture of folding which first involves the formation
of the N-terminal helix-turn-helix motif, followed by the "docking"
of the C-terminal helix onto this substructure (15).
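To give a feel for what a stability of about 3 kcal/mol means, the sketch below converts a free energy difference into the fraction of molecules that are folded at equilibrium. It assumes a simple two-state equilibrium between N and U and a temperature of 298.15 K; neither assumption comes from the study itself, which in fact suggests that an intermediate may exist.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def folded_fraction(stability_kcal, T=298.15):
    """Fraction of molecules folded in a two-state N <-> U equilibrium.

    stability_kcal is how much lower in free energy the folded state is
    than the unfolded state (kcal/mol); T is the temperature in kelvin.
    """
    K_unfold = math.exp(-stability_kcal / (R_KCAL * T))  # [U] / [N]
    return 1.0 / (1.0 + K_unfold)

print(f"folded fraction at 3 kcal/mol: {folded_fraction(3.0):.3f}")
```

Under these assumptions a stability of 3 kcal/mol corresponds to roughly 99% of molecules being folded, which illustrates how marginal the stability of a small protein can be in free energy terms.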