Monthly Archives: August 2008

Proteins – Quaternary Structure & Overview

The quaternary structure of a protein involves the association of folded polypeptide chains into a mature, active protein.

  • This can be a single polypeptide chain (monomer), 2 chains (dimer), 3 chains (trimer), 4 chains (tetramer) and so on…
  • The associated chains can be identical or different.

Some quaternary structures require additional polypeptide segments (which are removed during production) in order to achieve a working protein state (eg. mature insulin). There are also structures which will refold to their original shape once disrupted, as the fold is determined by the primary structure (the sequence of Amino Acids).

With Insulin, a helper amino acid strand is used to ‘hold’ two sequences in place, allowing the formation of disulphide bridges. I’ve tried to illustrate before and after:

S is the signal chain, while B acts as a support structure during disulphide bridge formation between A and C.

– An Overview

  • Primary Structure – The sequence of Amino acids on a chain.
  • Secondary Structure – The 3D relationship between Amino Acids – leading to α helix, β pleated sheet etc.
  • Tertiary Structure – The 3D relationship between parts of the above structure.
  • Quaternary Structure – The number of and relationship between amino acid chains (separate tertiary structures).

Proteins – Stabilising Forces

There are several different types of forces acting on/within a protein molecule. These include:

  1. Covalent Bonds:
    1. Peptide bonds between Amino Acids (C-N). Can be broken down into individual amino acids by hydrolysis with 6M acid/alkali, or by proteases/proteolytic enzymes.
    2. Disulphide bridges form between two cysteine residues to give cystine. (Cysteine has an -SH group which forms a disulphide bridge -S-S- with another HS-). Bridges are broken down by reduction with β-mercaptoethanol to form cysteines once again.
  2. Non-Covalent Forces/Bonds:
    1. Hydrogen Bonds – these bonds are throughout the protein. The bonds in the middle of the protein structure contribute most to stability as they are furthest away from water (which would disrupt them). These can also be disrupted by heat.
    2. Van Der Waals forces/interactions – short range dipole-dipole (δ+ & δ-) interactions between close atoms. Easily disrupted by heat or denaturing agents.
    3. π-π overlap – π electron clouds delocalised over rings & bonds. Are disrupted by heat.
    4. Electrostatic bonds, Ionic interactions and Salt bridges between residues. All broken by changes in pH or high ionic strength. (Eg, positive residues include Lys, Arg, His while negative residues include Asp, Glu, Tyr & Cys).

– Zwitterions

Zwitterions are amino acids in free solution that are doubly charged. Their net charge will depend on the pH of the solution. Each amino acid has an isoelectric point at which it has no net charge.

Below the isoelectric point (also known as pI), they have a net positive (+ve) charge and above the pI they have a net negative (-ve) charge.
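As a rough sketch of the idea, the net charge of a free amino acid with no ionisable side chain can be estimated from the pKa values of its two groups. The pKa figures below are commonly quoted textbook values for glycine (about 2.34 and 9.60) and are an assumption brought in for illustration, as are the function names.

```python
def net_charge(ph, pka_carboxyl=2.34, pka_amino=9.60):
    """Approximate net charge of a free amino acid with no ionisable side chain.

    The default pKa values are textbook figures for glycine (an assumption).
    """
    # Fraction of amino groups that are protonated (NH3+), each worth +1
    positive = 1.0 / (1.0 + 10 ** (ph - pka_amino))
    # Fraction of carboxyl groups that are deprotonated (COO-), each worth -1
    negative = 1.0 / (1.0 + 10 ** (pka_carboxyl - ph))
    return positive - negative


def isoelectric_point(pka_carboxyl=2.34, pka_amino=9.60):
    """pI of an amino acid with only two ionisable groups: the mean of the pKa values."""
    return (pka_carboxyl + pka_amino) / 2.0


if __name__ == "__main__":
    for ph in (2.0, isoelectric_point(), 11.0):
        print(f"pH {ph:5.2f}: net charge {net_charge(ph):+.2f}")
```

Under these assumptions the charge comes out positive at low pH, essentially zero at the pI and negative at high pH.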

When amino acids become part of a polypeptide/protein, their α-amino and α-carboxyl groups are used up in peptide bonds, so (apart from the two ends of the chain) only the side chains can carry charges.

Proteins themselves can have isoelectric points – and this will depend on the number and type of different amino acid residues.

– Hydrophobic Interactions

This is the prime driving force for protein folding (AKA hydrophobic collapse).

Essentially the protein chain will fold in such a way as to minimise the exposure of its hydrophobic residues to water, burying them in the core. This leads to the residues with hydrophilic (polar) side chains being situated on the outside of the molecule.

Proteins – Tertiary Structures

There are two notable secondary structure elements that tertiary structures are built from – the α (alpha) helix and the β (beta) pleated sheet.

α Helix

  • Right handed helix much like that of a DNA helix.
  • Each amino acid side chain (R group) is 100 degrees relative to the last side chain, outside of the helix. This means there are 3.6 residues per turn and 5.4 angstroms per turn/level. On the sketch below, each R stands for a different amino acid side chain.

A couple of alterations:

  • Glycine residues tend to disrupt the α helix. Glycine has only a hydrogen atom as its side chain (and so no chiral carbon), which makes it very flexible.
  • Proline has a cyclic side chain which locks phi at roughly -60°. There is also no H atom on its backbone N once it is in a chain, so it cannot donate a Hydrogen bond to other residues.

Amphipathic Helices:

  • Helices can end up with hydrophobic residues on one side and polar (hydrophilic) on the other – essentially giving the helix two faces. The image below illustrates R1, R4, R7 and R8 as hydrophobic, and R2, R4, R5, and R6 as hydrophilic.
  • This means helices can be constructed to generate lipid (hydrophobic) or water (hydrophilic) soluble proteins.
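To make the geometry a bit more concrete, here is a minimal sketch of a "helical wheel" calculation using the figures quoted earlier (100 degrees per residue, 5.4 angstroms per turn). The function names and the choice of a 60 degree half-width for a "face" are my own assumptions, and the helix is treated as perfectly ideal.

```python
DEGREES_PER_RESIDUE = 100          # rotation between successive side chains
PITCH_ANGSTROMS = 5.4              # rise of the helix per full turn

RESIDUES_PER_TURN = 360 / DEGREES_PER_RESIDUE              # 3.6
RISE_PER_RESIDUE = PITCH_ANGSTROMS / RESIDUES_PER_TURN     # about 1.5 angstroms


def helical_wheel_angles(n_residues):
    """Angle (0-359 degrees) of each side chain when looking down the helix axis."""
    return [(i * DEGREES_PER_RESIDUE) % 360 for i in range(n_residues)]


def angular_distance(a, b):
    """Smallest angle between two directions on the wheel."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def residues_on_face(n_residues, centre=0, half_width=60):
    """1-based residue numbers whose side chains point within half_width of centre."""
    angles = helical_wheel_angles(n_residues)
    return [i + 1 for i, angle in enumerate(angles)
            if angular_distance(angle, centre) <= half_width]


if __name__ == "__main__":
    print(helical_wheel_angles(8))
    print(residues_on_face(8))
```

In this idealised picture, residues spaced 3 or 4 apart end up pointing in similar directions, which is why a repeating pattern of hydrophobic residues can build a single hydrophobic face.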

– β Pleated Sheet

There are two types of pleated sheet – Parallel and Anti-Parallel.

  • Parallel sheet has successive polypeptide strands in the same direction.
  • Anti-Parallel sheet has successive polypeptide strands in opposite directions.

These strands are typically 5-10 amino acids long, and the pleated sheet is formed when a continuous chain folds into a series of these strands lying side by side.

It has been suggested that the anti-parallel configuration is more stable.

Proteins – Primary & Secondary Structures

As mentioned a couple of posts ago:

  • Proteins are polypeptides made from 20 different monomers.
  • On average contain 100-400 monomers.
  • Each monomer has an approximate molecular mass of 110.

– Monomers –> Polymers. The Primary Structure.

  • Amino Acids form peptide bonds (from the carboxylic acid group on one to the amine group on another). This releases water in a condensation reaction. The location of the peptide bond (C-N) is shown below outlined in RED.
  • When reading a sequence of Amino Acids in a protein, start at the Amino terminus (NH2 end) and read to the Carboxyl terminus at the other (COOH).
  • The sequence of amino acids is known as the primary structure of a protein.

The amino acids in chains and proteins can be post-translationally modified – eg, disulphide bridges can form between cysteine residues.

– The Secondary Structure

Assuming the following:

  1. No rotation occurs round the peptide bond (as it is partly double bonded in nature).
  2. The chain of amino acids form a rhythmical structure – forming a repeating pattern.
  3. That the maximum number of interactions from Hydrogen bonding possible are occurring, independent of the type of residue (amino acid).

Now to explain these points:

  1. As mentioned, the C-N bond is partly double bonded and so does not rotate. The bond length of a normal C-N bond is 1.49Å (angstroms), while the length of a normal C=N bond is 1.28Å. The length of the peptide bond is between these, at about 1.32Å.
    This is due to the C-N bond resonating between single and double bonded forms, as shown above.
  2. Two different rotation points exist, one on either side of each α carbon. These are called phi (Φ, rotation about the N-Cα bond) and psi (Ψ, rotation about the Cα-C bond). A perfect helix structure (covered later) needs phi to be about -60 degrees and psi about -45 to -50 degrees.
  3. Hydrogen bonds occur between the C=O and H-N of other amino acids. In α helices, the C=O: would form a hydrogen bond to the N-H 4 residues ahead in the spiral (directly above).

The attachment of Amino Acids to tRNA – Aminoacylation

  1. First the Amino Acid must be activated. This involves the addition of ATP (adenosine triphosphate), forming Aminoacyl Adenylate.
  2. Once the amino acid has been activated it can be attached to the tRNA. This follows the following scheme:
    Aminoacyl Adenylate + tRNA –> Aminoacyl-tRNA + AMP.
    See the following image, showing the structure of a tRNA molecule with amino acid attached. Note that the mRNA strand is at the bottom.

– tRNA

As we know, tRNA is an adapter molecule that carries amino acids in an activated form to ribosomes for protein synthesis.

There is at least 1 tRNA molecule for each of the 20 amino acids.

It adopts a folding structure with internal base pairing and is about 75 nucleotides long.

Translation – RNA –> Proteins

Proteins are polymers (polypeptides – aka monomers joined by peptide bonds) of amino acids, of which there are 20 which occur naturally.

They are synthesised in the cytoplasm on ribosomes which decode the mRNA in the 5′–>3′ direction.

Most proteins contain between 100 and 400 amino acids, and as the order of amino acids per protein can be different there are 20^100 to 20^400 possible structures.

The average amino acid has a molecular mass of 110, so using this we can estimate the mass of different proteins by multiplying the average mass by the number of amino acids in the protein – eg. a 400 amino acid protein has an estimated molecular weight of 44000.
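The two back-of-the-envelope calculations above can be sketched in a few lines. The function names are made up for illustration, and the 110 figure is only the approximate average used in these notes, so the mass is an estimate rather than a real molecular weight.

```python
AVERAGE_RESIDUE_MASS = 110   # approximate average mass per amino acid (from the notes)


def estimate_protein_mass(n_residues):
    """Rough molecular mass: number of residues times the average residue mass."""
    return n_residues * AVERAGE_RESIDUE_MASS


def possible_sequences(n_residues, alphabet_size=20):
    """Number of different chains of this length: one of 20 choices per position."""
    return alphabet_size ** n_residues


if __name__ == "__main__":
    print(estimate_protein_mass(400))
    # 20^100 is far too long to read comfortably, so show its number of digits instead
    print(len(str(possible_sequences(100))), "digits in 20^100")
```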

It is estimated that there are 10^7 or 10^8 different proteins in nature.

– Amino Acids

The 20 different amino acids are:

Single Letter Code    Short Name    Name
A                     Ala           Alanine
C                     Cys           Cysteine
D                     Asp           Aspartic Acid
E                     Glu           Glutamic Acid
F                     Phe           Phenylalanine
G                     Gly           Glycine
H                     His           Histidine
I                     Ile           Isoleucine
K                     Lys           Lysine
L                     Leu           Leucine
M                     Met           Methionine (START*)
N                     Asn           Asparagine
P                     Pro           Proline
Q                     Gln           Glutamine
R                     Arg           Arginine
S                     Ser           Serine
T                     Thr           Threonine
V                     Val           Valine
W                     Trp           Tryptophan
Y                     Tyr           Tyrosine

*Met (Methionine) is also the start signal in translation. When the codon for Met is read (AUG), translation begins. Met is often removed or altered once translation has been completed. Prokaryotes also normally start at AUG (read as N-formylmethionine), though GUG (normally valine) is occasionally used as a START codon instead.

Each amino acid is coded for by 3 bases – called a triplet (or codon). Since there are 4 possible bases at each of the 3 positions, there are 4^3 possible combinations – 64.
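The 64 figure is easy to check by simply listing every triplet. This is just an illustrative sketch using the four RNA bases; the variable names are my own.

```python
from itertools import product

BASES = "UCAG"   # the four RNA bases

# Every ordered choice of 3 bases, allowing repeats: 4 * 4 * 4 triplets
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

if __name__ == "__main__":
    print(len(codons))
    print(codons[:8])
```

With 64 triplets and only 20 amino acids (plus stop signals), most amino acids end up with more than one codon.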

Here’s the triplet codes for each Amino Acid in most cells:

If you’d like the file this screenshot came from: Amino Acid Codes

These codes are almost universal, with the exception of a few types of cell. These include Human Mitochondria, where there are several triplet changes – such as UGA coding for Trp rather than STOP and AUA coding for Met instead of Ile.

– tRNA and Codon Triplets

  • Amino acids are linked to an adapter molecule of tRNA. Each tRNA carries an anticodon which will match a codon on the mRNA.
  • The amino acid is bonded to the 3′ end of the complementary tRNA strand.
  • Essentially, tRNAs come in when their anticodon matches the mRNA codon and are then released, leaving behind the amino acid that corresponds to the codon.
  • This is repeated over and over to form a chain of amino acids until a stop codon is reached (UAA, UAG or UGA) and the completed polypeptide chain is released.
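The steps above can be sketched as a simple loop over triplets. This is only a toy model: it uses the standard genetic code written in the conventional U, C, A, G order with single letter amino acid codes, starts at the first AUG it finds, and ignores ribosomes, tRNA charging and everything else that makes real translation work. The function name is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks the stop codons UAA, UAG and UGA
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Read an mRNA 5'->3' from the first AUG, one triplet at a time, until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, nothing is made
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # stop codon: release the chain
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    print(translate("GGAUGGCCUGGUAAGC"))
```

For the example message the reading frame is AUG, GCC, UGG, UAA, which gives Met, Ala, Trp and then a stop.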

To explain this better I’ve found this animation. This is not my work, rather that of the American Society for Microbiology. I found it on their page here. Click here to watch the translation in bacterial cells video.

– Mutations caused by Errors

  • Wild Type = Normal Sequence
  • Missense = One base changed, resulting in the sequence coding for a different Amino Acid.
  • Nonsense = One or more bases changed, creating a premature stop codon and so terminating the chain early.
  • Silent = One or more base changes but the same amino acid coded for.
  • Frameshift Base Deletion = One base removed, resulting in the change of most of the following amino acids.
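The categories above can be told apart by comparing what the normal and mutated messages translate to. The following is a simplified sketch under a few assumptions: both sequences are RNA coding sequences already in frame from the first base, the standard genetic code applies, and the function names are invented for the example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_in_frame(rna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify_mutation(wild_type, mutant):
    """Very rough classification of a mutant coding sequence against the wild type."""
    if mutant == wild_type:
        return "wild type"
    if (len(mutant) - len(wild_type)) % 3 != 0:
        return "frameshift"            # bases added or removed, reading frame shifted
    normal = translate_in_frame(wild_type)
    changed = translate_in_frame(mutant)
    if changed == normal:
        return "silent"                # different bases, same amino acids
    if len(changed) < len(normal) and normal.startswith(changed):
        return "nonsense"              # chain ends early at a new stop codon
    return "missense"                  # at least one amino acid swapped


if __name__ == "__main__":
    wild = "AUGGCCUGGUAA"
    for mutant in ("AUGGCUUGGUAA", "AUGGUCUGGUAA", "AUGGCCUGAUAA", "AUGCCUGGUAA"):
        print(mutant, classify_mutation(wild, mutant))
```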

Transcription – DNA –> RNA

RNA is much the same as DNA, except for a few points:

  1. The sugar is ribose rather than deoxyribose – deoxyribose has one fewer OH group – on C2:
  2. The DNA base Thymine is replaced with Uracil (same but without methyl group):
  3. RNA is single stranded rather than double stranded like DNA. Instead, it folds into well defined structures (rather than combining two separate strands that can be broken apart by denaturing).

There are several different types of RNA:

  • mRNA – Messenger RNA – template for protein synthesis.
  • rRNA – Ribosomal RNA – major component of ribosomes.
  • tRNA – Transfer RNA – carries activated amino acids to ribosomes.
  • snRNA – Small Nuclear RNA – participates in RNA splicing.
  • miRNA – Micro RNA – binds to mRNA and inhibits translation.
  • siRNA – Small Interfering RNA – binds to mRNA and promotes degradation.

– The Process of Transcription

A strand of RNA is produced from a strand of DNA – much the same as during DNA replication but in this case it is catalysed by RNA polymerase using rNTPs (ribonucleotide triphosphates). No primer is required. The synthesis occurs in the same direction as for DNA replication (5′->3′) and pyrophosphates are still released when the ribonucleotide triphosphates bind to the backbone.

  • ~17 base pairs of DNA duplex are unwound at a time as the DNA is transcribed into RNA. Of those ~17 base pairs, only 9 are paired with RNA at any one time.
  • The transcription ‘bubble’ moves along the DNA template strand 3′–>5′ at a rate of ~50 bases/sec until it reaches a termination sequence.
  • In prokaryotes, transcription AND translation occur at the same time.
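As a minimal sketch of the base pairing involved, the function below builds an RNA strand from a DNA template. It assumes the template is written in the 3′ to 5′ direction (the direction it is read in), so the RNA comes out 5′ to 3′ in the same left to right order. The time estimate just divides by the ~50 bases/sec figure quoted above; both function names are made up.

```python
# DNA template base -> RNA base that pairs with it (U takes the place of T in RNA)
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}

BASES_PER_SECOND = 50   # approximate rate quoted in the notes


def transcribe(template_3_to_5):
    """Build the RNA (5'->3') that pairs with a DNA template written 3'->5'."""
    return "".join(PAIRING[base] for base in template_3_to_5)


def transcription_time(n_bases, rate=BASES_PER_SECOND):
    """Rough time in seconds to transcribe n_bases at a constant rate."""
    return n_bases / rate


if __name__ == "__main__":
    print(transcribe("TACCGGACC"))
    print(transcription_time(1500), "seconds for a 1500 base gene")
```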

– The Control of Transcription

The interactions between RNA polymerase and its promoter can be enhanced by activators or blocked by repressors.

A good example is the lac operon in prokaryotes – in Eukaryotes this is much more complex and may require chromatin remodelling to allow access to genes for transcription.

The Lac Operon controls expression of genes involved in the metabolism of lactose. A regulatory gene leads to the production of a repressor protein, which (in the absence of lactose) will bind to the operator gene, blocking expression of the later genes. When lactose appears, this disables the repressor protein, changing its shape so that it can no longer bind to the Operator gene. This allows expression of the genes further along the strand.

The above diagram shows the events when (a) no lactose is present, and (b) when lactose is present. The diagram below shows what occurs in the Tryptophan operon – you’ll see it is very similar.

– RNA Splicing

Splicing removes non-coding RNA sections from the newly synthesised strand. I mentioned non-coding DNA previously as DNA that has a purely structural role and does not code for any proteins etc. When it is copied into RNA during transcription it has no further use and so is removed by splicing.

  • A non-coding segment is called an INTRON (for intragenic regions). These sites start with GU and end with AG.
  • A coding segment is called an EXON (for regions that will be expressed).

By removing non-coding segments and joining the remaining exons in different combinations (alternative splicing), several proteins can be synthesised from just one gene.
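Here is a toy sketch of the cutting and joining step. It assumes the intron positions are already known and are given as (start, end) index pairs, which is my own simplification; a real cell finds them using much more than the GU and AG ends. The function name is hypothetical.

```python
def splice(pre_mrna, introns):
    """Remove introns from a pre-mRNA and join the exons that remain.

    introns is a list of (start, end) index pairs, end not included.
    Each intron is checked for the GU...AG ends mentioned in the notes.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"segment {start}-{end} does not look like an intron")
        exons.append(pre_mrna[position:start])   # keep the exon before this intron
        position = end                            # skip over the intron itself
    exons.append(pre_mrna[position:])             # keep the final exon
    return "".join(exons)


if __name__ == "__main__":
    pre_mrna = "AUGGCC" + "GUAAGUCCAG" + "UGGUAA"   # exon, intron, exon
    print(splice(pre_mrna, [(6, 16)]))
```

Leaving an exon out of the join, or keeping a different set, would give a different mature mRNA from the same gene.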

Incorrect splicing is a high risk though, and up to 15% of all genetic diseases have been caused by errors and mutations during splicing.

DNA Sequencing and Amplification

  • DNA can be sequenced by replicating with a dideoxynucleotide triphosphate – that is a deoxynucleotide triphosphate with no OH group on Carbon 3 of the sugar. This is where a phosphate group would normally bind as part of a DNA backbone – and as this is no longer an option the replication stops when this dideoxynucleotide is added.
    (The small difference between deoxynucleotides and dideoxynucleotides)
  • I’ll try to explain further. If you were looking for all the adenine positions in a DNA strand, you would add ddATP (dideoxyadenosine triphosphate) which would terminate the replicated strands at different adenine positions.

5′-TCAAGTTACCGTAATA (correct, using dATP)
—- Possible Outcomes —-

This would leave you with a DNA mixture containing different size DNA fragments, all cut after an Adenine residue. To assess the location of these cuts:

  • Denature the dsDNA (double stranded DNA) – this unpairs the new (fragmented) strands from the old DNA strand.
  • Separate the DNA by polyacrylamide gel electrophoresis (or use agarose gel).
  • Smaller DNA fragments will travel further, faster than larger DNA fragments, and these fragments will be visible after the addition of a fluorescent chemical.
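To see which fragments a ddATP reaction could produce, here is a small sketch using the example strand from above. It assumes termination can happen at every adenine with some probability, so every possible stopping point shows up somewhere in the mixture; the function name is invented.

```python
def termination_fragments(new_strand, dd_base):
    """All possible chain-terminated products of a strand for one dideoxy base.

    Each fragment ends at one occurrence of dd_base in the newly made strand.
    """
    return [new_strand[:i + 1]
            for i, base in enumerate(new_strand) if base == dd_base]


if __name__ == "__main__":
    strand = "TCAAGTTACCGTAATA"          # example strand from the notes, 5' end first
    fragments = termination_fragments(strand, "A")
    # On a gel the shortest fragment travels furthest, so sort by length
    for fragment in sorted(fragments, key=len):
        print(len(fragment), fragment)
```

For this strand the fragment lengths are 3, 4, 8, 13, 14 and 16 bases, one for each adenine.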

Now you know the location of Adenine bases, you can repeat the above with ddGTP, ddTTP or ddCTP, revealing the locations of those bases. You could then compare the separate gels to work out the DNA sequence. To compare, you would need to run on the same gel:

The -ve control would either be clean or just the original DNA strand – to be used as a reference. The +ve would contain a mix of all of the mixtures.

Then it’s a simple case of looking through the different bands. Remember that towards the top are larger fragments and smaller fragments at the bottom – so you read the sequence from the bottom.

CACTCAGTGATG – and the final top strand is the full DNA strand.
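Reading the gel from the bottom can be sketched as sorting the bands by fragment length and noting which lane each one sits in. The lane contents below are worked out by hand from the example sequence above and are only there for illustration; the function name and data layout are assumptions.

```python
def read_gel(lanes):
    """Rebuild a sequence from four sequencing lanes.

    lanes maps each base to the fragment lengths seen in its lane.
    The shortest fragment (bottom of the gel) gives the first base.
    """
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[length] for length in sorted(base_at_length))


if __name__ == "__main__":
    lanes = {
        "A": [2, 6, 10],
        "C": [1, 3, 5],
        "G": [7, 9, 12],
        "T": [4, 8, 11],
    }
    print(read_gel(lanes))
```

Note that this gives the sequence of the newly synthesised strand, which is complementary to the original template.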

– Amplifying DNA – PCR

To amplify a sample of DNA (polymerase chain reaction):

  1. Denature the double stranded DNA sample to leave single stranded DNA (heat).
  2. Add short primers that are complementary to the ends of the sequences of interest.
  3. Lower temperature and anneal.
  4. Use a thermostable DNA polymerase (a polymerase stable under heat) such as Taq polymerase to extend from the primers.
  5. Denature sample again and repeat from 3.

Repeated, this produces exponential amplification of a DNA sample – eg 40 repeats gives up to 2^40 amplification! This is assuming good conditions with thermostable DNA polymerase and presence of enough dNTPs (deoxynucleotide triphosphates) and DNA primers.
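The doubling can be sketched as a one line calculation. The efficiency parameter is my own addition to show what happens when a cycle does not quite double every copy; with the default of 1.0 it reduces to the ideal 2^n case described above.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Expected number of copies after a number of PCR cycles.

    efficiency is the fraction of strands copied per cycle (1.0 = perfect doubling).
    """
    return initial_copies * (1 + efficiency) ** cycles


if __name__ == "__main__":
    print(pcr_copies(1, 40))                    # ideal case: 2^40
    print(pcr_copies(1, 40, efficiency=0.9))    # slightly imperfect cycles
```

Even a small drop in efficiency makes a large difference after 40 cycles, which is one reason the ideal figure is best treated as an upper limit.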