Category Archives: Biology

Proteins – Stabilising Forces

There are several different types of forces acting on/within a protein molecule. These include:

  1. Covalent Bonds:
    1. Peptide bonds between Amino Acids (C-N). A protein can be broken down into individual amino acids by hydrolysis with 6M acid/alkali, or cleaved by proteases (proteolytic enzymes).
    2. Disulphide bridges form between two cysteine residues to give cystine. (Cysteine has an -SH group, which forms a disulphide bridge -S-S- with the -SH of another cysteine.) The bridges are broken by reduction with β-mercaptoethanol, regenerating the cysteines.
  2. Non-Covalent Forces/Bonds:
    1. Hydrogen Bonds – these bonds are throughout the protein. The bonds in the middle of the protein structure contribute most to stability as they are furthest away from water (which would disrupt them). These can also be disrupted by heat.
    2. Van der Waals forces/interactions – short range dipole-dipole (δ+ & δ-) interactions between close atoms. Easily disrupted by heat or denaturing agents.
    3. π-π overlap – π electron clouds delocalised over rings & bonds. These are disrupted by heat.
    4. Electrostatic bonds, Ionic interactions and Salt bridges between residues. All broken by changes in pH or high ionic strength. (Eg, positive residues include Lys, Arg and His, while negative residues include Asp and Glu, plus Tyr & Cys at high pH).

– Zwitterions

Zwitterions are amino acids in free solution that are doubly charged. Their net charge will depend on the pH of the solution. Each amino acid has an isoelectric point at which it has no net charge.

Below the isoelectric point (also known as pI), they have a net positive (+ve) charge and above the pI they have a net negative (-ve) charge.
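The change of sign around the pI can be checked numerically. This is only a rough sketch: it uses the Henderson-Hasselbalch relationship for an amino acid with no ionisable side chain, and the glycine pKa values (2.34 and 9.60) are approximate textbook figures that I am assuming, not numbers from this post. The function names are made up for illustration.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a group that still holds its proton."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def net_charge(ph: float, pka_carboxyl: float, pka_amino: float) -> float:
    """Net charge of an amino acid that has no ionisable side chain."""
    positive = fraction_protonated(ph, pka_amino)           # -NH3+ contributes +1
    negative = 1.0 - fraction_protonated(ph, pka_carboxyl)  # -COO- contributes -1
    return positive - negative


# Assumed, approximate pKa values for glycine
PKA_COOH, PKA_NH3 = 2.34, 9.60
pI = (PKA_COOH + PKA_NH3) / 2  # midpoint of the two pKa values

for ph in (2.0, pI, 10.0):
    print(f"pH {ph:5.2f}: net charge {net_charge(ph, PKA_COOH, PKA_NH3):+.3f}")
```

The net charge comes out positive at pH 2, essentially zero at the pI and negative at pH 10.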

When amino acids become part of a polypeptide/protein, their amino and carboxyl groups are used up in peptide bonds (apart from those at the two ends of the chain), so mostly only the side chains can carry charges.

Proteins themselves can have isoelectric points – and this will depend on the number and type of different amino acid residues.

– Hydrophobic Interactions

This is the prime driving force for protein folding (AKA hydrophobic collapse).

Essentially the protein chain will fold in such a way as to minimise the exposure of hydrophobic residues to the surrounding water, burying them in the core. This leads to the residues with hydrophilic (polar) side chains being situated on the outside of the molecule.

Proteins – Tertiary Structures

There are two notable regular structures (strictly speaking, elements of secondary structure) – α (ALPHA) helix and β (BETA) pleated sheet.

α Helix

  • Right handed helix much like that of a DNA helix.
  • Each amino acid side chain (R group) is 100 degrees relative to the last side chain, outside of the helix. This means there are 3.6 residues per turn and 5.4 angstroms per turn/level. On the sketch below, each R stands for a different amino acid side chain.
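The helix numbers above follow from each other, which a quick calculation shows. This is a minimal sketch using only the 100 degrees and 5.4 angstroms quoted in the post; the function name `helix_length` is my own.

```python
DEGREES_PER_RESIDUE = 100.0  # rotation between successive side chains
PITCH_ANGSTROM = 5.4         # length of one full turn of the helix

residues_per_turn = 360.0 / DEGREES_PER_RESIDUE        # 3.6 residues per turn
rise_per_residue = PITCH_ANGSTROM / residues_per_turn  # about 1.5 angstroms


def helix_length(n_residues: int) -> float:
    """Approximate length in angstroms of an ideal helix of n residues."""
    return n_residues * rise_per_residue


print(f"{residues_per_turn:.1f} residues per turn")
print(f"{rise_per_residue:.2f} angstroms per residue")
print(f"20 residue helix is roughly {helix_length(20):.0f} angstroms long")
```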

A couple of alterations:

  • Glycine residues disrupt the α helix because glycine has no chiral carbon (its side chain is a single H atom), which makes it very flexible.
  • Proline has a cyclic side chain which restricts the rotation of phi to about -60°. There is also no H atom on its backbone N once it is in a peptide bond, so it cannot donate a Hydrogen bond to other residues.

Amphipathic Helices:

  • Helices can end up with hydrophobic residues on one side and polar (hydrophilic) residues on the other – essentially giving the helix two faces. The image below illustrates R1, R4, R7 and R8 as hydrophobic, and R2, R3, R5 and R6 as hydrophilic.
  • This means helices can be constructed to generate lipid (hydrophobic) or water (hydrophilic) soluble proteins.
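To see which side chains end up on the same face, you can work out where each one points when looking down the helix axis (a "helical wheel"). A small sketch, assuming the 100 degree step per residue from earlier and putting R1 at 0 degrees; the function name is hypothetical.

```python
DEGREES_PER_RESIDUE = 100  # rotation between successive side chains


def wheel_angle(position: int) -> int:
    """Angle in degrees of residue R<position> around the helix axis, with R1 at 0."""
    return ((position - 1) * DEGREES_PER_RESIDUE) % 360


for position in range(1, 9):
    print(f"R{position}: {wheel_angle(position):3d} degrees")
```

Residues whose angles are close together sit on the same face of the helix, so a hydrophobic face needs hydrophobic residues at positions with similar angles.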

– β Pleated Sheet

There are two types of pleated sheet – Parallel and Anti-Parallel.

  • Parallel sheet has successive polypeptide strands in the same direction.
  • Anti-Parallel sheet has successive polypeptide strands in opposite directions.

These strands are typically 5-10 amino acids long, and the pleated sheet is formed when a continuous chain folds back on itself so that several of these strands lie side by side.

It has been suggested that the anti-parallel configuration is more stable.

Proteins – Primary & Secondary Structures

As mentioned a couple of posts ago:

  • Proteins are polypeptides made from 20 different monomers.
  • On average contain 100-400 monomers.
  • Each monomer has an approximate molecular mass of 110.

– Monomers –> Polymers. The Primary Structure.

  • Amino Acids form peptide bonds (from the carboxylic acid group on one to the amine group on another). This releases water in a condensation reaction. The location of the peptide bond (C-N) is shown below outlined in RED.
  • When reading a sequence of Amino Acids in a protein, start at the Amino terminus (NH2 end) and read to the Carboxyl terminus at the other (COOH).
  • The sequence of amino acids is known as the primary structure of a protein.

The amino acids in chains and proteins can be post-translationally modified – eg, disulphide bridges can form between cysteine residues.

– The Secondary Structure

Assuming the following:

  1. No rotation occurs round the peptide bond (as it is partly double bonded in nature).
  2. The chain of amino acids forms a rhythmical structure – forming a repeating pattern.
  3. That the maximum number of interactions from Hydrogen bonding possible are occurring, independent of the type of residue (amino acid).

Now to explain these points:

  1. As mentioned, the C-N bond is partly double bonded and so does not rotate. The bond length of a normal C-N bond is 1.49Å (angstroms), while the length of a normal C=N bond is 1.28Å. The length of the peptide bond is between these, at about 1.32Å.
    This is due to the C-N bond resonating between single and double bonded forms, as shown above.
  2. Two rotatable bonds remain on either side of each α carbon. Their rotation angles are called phi (Φ, round the N-Cα bond) and psi (Ψ, round the Cα-C bond). An ideal helix structure (covered later) needs phi to be about -60 degrees and psi to be about -45 degrees.
  3. Hydrogen bonds occur between the C=O of one amino acid and the H-N of another. In α helices, each C=O forms a hydrogen bond to the N-H 4 residues ahead in the spiral (directly above).

The attachment of Amino Acids to tRNA – Aminoacylation

  1. First the Amino Acid must be activated. It reacts with ATP (adenosine triphosphate), forming Aminoacyl Adenylate (aminoacyl-AMP) and releasing pyrophosphate.
  2. Once the amino acid has been activated it can be attached to the tRNA, according to the following scheme:
    Aminoacyl Adenylate + tRNA –> Aminoacyl-tRNA + AMP.
    See the following image, showing the structure of a tRNA molecule with amino acid attached. Note at the bottom is the mRNA strand.

– tRNA

As we know, tRNA is an adapter molecule that carries amino acids in an activated form to ribosomes for protein synthesis.

There is at least 1 tRNA molecule for each of the 20 amino acids.

It adopts a folding structure with internal base pairing and is about 75 nucleotides long.

Translation – RNA –> Proteins

Proteins are polymers (polypeptides – aka monomers joined by peptide bonds) of amino acids, of which there are 20 which occur naturally.

They are synthesised in the cytoplasm on ribosomes which decode the mRNA in the 5′–>3′ direction.

Most proteins contain between 100 and 400 amino acids, and as the order of amino acids in each protein can be different there are 20^100 to 20^400 possible structures.

The average amino acid has a molecular mass of 110, so using this we can estimate the mass of different proteins by multiplying the average mass by the number of amino acids in the protein – eg. a 400 amino acid protein has an estimated molecular weight of 44000.
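Both estimates are easy to reproduce. This is a minimal sketch using only the figures quoted in the post (20 amino acids, average mass 110); the function names are my own.

```python
AVERAGE_RESIDUE_MASS = 110  # average amino acid mass quoted above
AMINO_ACID_COUNT = 20       # number of naturally occurring amino acids


def estimate_protein_mass(n_residues: int) -> int:
    """Rough molecular mass: number of amino acids times the average mass."""
    return n_residues * AVERAGE_RESIDUE_MASS


def possible_sequences(n_residues: int) -> int:
    """Number of different amino acid orders for a chain of this length."""
    return AMINO_ACID_COUNT ** n_residues


print(estimate_protein_mass(400))             # 44000
print(len(str(possible_sequences(100))))      # number of digits in 20^100
```

Python integers have no size limit, so 20^400 can be computed exactly, although the number of digits is usually the more readable thing to print.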

It is estimated that there are 10^7 or 10^8 different proteins in nature.

– Amino Acids

The 20 different amino acids are:

Single Letter Code   Short Name   Name
A                    Ala          Alanine
C                    Cys          Cysteine
D                    Asp          Aspartic Acid
E                    Glu          Glutamic Acid
F                    Phe          Phenylalanine
G                    Gly          Glycine
H                    His          Histidine
I                    Ile          Isoleucine
K                    Lys          Lysine
L                    Leu          Leucine
M                    Met          Methionine (START*)
N                    Asn          Asparagine
P                    Pro          Proline
Q                    Gln          Glutamine
R                    Arg          Arginine
S                    Ser          Serine
T                    Thr          Threonine
V                    Val          Valine
W                    Trp          Tryptophan
Y                    Tyr          Tyrosine

*Met (Methionine) is also the start signal in translation. When the codon for Met (AUG) is read, translation begins. Met is often removed or altered once translation has been completed. Prokaryotes also start at AUG in most cases (inserting a modified formyl-methionine), though GUG, which normally codes for valine, is occasionally used as a START codon instead.

Each amino acid is coded for by 3 bases – called a triplet (or codon). Since each of the 3 positions can be any one of the 4 bases, there are 4^3 possible combinations – 64.
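The count of 64 can be confirmed by listing every triplet. A short sketch using the four RNA bases:

```python
from itertools import product

BASES = "UCAG"  # the four RNA bases

# Every ordered choice of one base for each of the three positions
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))   # 64
print(codons[:4])    # the first few triplets
```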

Here are the triplet codes for each Amino Acid in most cells:

If you’d like the file this screenshot came from: Amino Acid Codes

These codes are almost universal, with the exception of a few types of cell. These include Human Mitochondria, where there are several triplet changes – such as UGA coding for Trp rather than STOP and AUA coding for Met instead of Ile.

– tRNA and Codon Triplets

  • Amino acids are linked to an adapter molecule of tRNA. Each tRNA carries an anticodon which will match a codon on the mRNA.
  • The amino acid is bonded to the 3′ end of the tRNA strand.
  • Essentially, tRNAs come in when their anticodon matches the mRNA strand and are then removed, leaving behind the amino acid corresponding to the codon.
  • This is repeated over and over to form a chain of amino acids until a stop codon is reached (UAA, UAG or UGA) and the completed polypeptide chain is released.
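The steps above can be sketched in a few lines. This is a simplified illustration, not how a ribosome really works: it uses the standard (near universal) codon table with single letter amino acid codes, starts at the first AUG it finds and stops at the first stop codon. The names `CODON_TABLE` and `translate` are my own.

```python
BASES = "UCAG"
# Standard genetic code, ordered U, C, A, G at each position; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (written 5'->3') from the first AUG to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is made
    peptide = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA: release the chain
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGAUGGCCAAAUGAUU"))  # MAK (Met-Ala-Lys)
```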

To explain this better I’ve found an animation of translation in bacterial cells. This is not my work, rather that of the American Society for Microbiology.

– Mutations caused by Errors

  • Wild Type = Normal Sequence
  • Missense = One base changed, resulting in the sequence coding for a different Amino Acid.
  • Nonsense = One or more bases changed, producing a STOP codon and so premature termination of the chain.
  • Silent = One or more base changes but the same amino acid coded for.
  • Frameshift Base Deletion = One base removed, resulting in the change of most of the following amino acids.
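The first four categories can be told apart by comparing what a codon codes for before and after the change. A minimal sketch, assuming the standard codon table and a change confined to a single codon; frameshifts are not handled, and the function name is hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, ordered U, C, A, G at each position; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_mutation(wild_codon: str, mutant_codon: str) -> str:
    """Classify a change within one codon by its effect on the amino acid."""
    if wild_codon == mutant_codon:
        return "wild type"
    before, after = CODON_TABLE[wild_codon], CODON_TABLE[mutant_codon]
    if after == before:
        return "silent"      # different bases, same amino acid
    if after == "*":
        return "nonsense"    # new stop codon terminates the chain
    return "missense"        # a different amino acid is coded for


print(classify_mutation("GAA", "GAG"))  # silent (both Glu)
print(classify_mutation("GAA", "GUA"))  # missense (Glu to Val)
print(classify_mutation("UAC", "UAA"))  # nonsense (Tyr to STOP)
```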

Transcription – DNA –> RNA

RNA is much the same as DNA, except for a few points:

  1. The sugar is ribose rather than deoxyribose – deoxyribose has one fewer OH group – on C2:
  2. The DNA base Thymine is replaced with Uracil (same but without methyl group):
  3. RNA is usually single stranded rather than double stranded like DNA. Instead of pairing two separate strands (which can be broken apart by denaturing), it folds back on itself into well defined structures.

There are several different types of RNA:

  • mRNA – Messenger RNA – template for protein synthesis.
  • rRNA – Ribosomal RNA – major component of ribosomes.
  • tRNA – Transfer RNA – carries activated amino acids to ribosomes.
  • snRNA – Small Nuclear RNA – participates in RNA splicing.
  • miRNA – Micro RNA – binds to mRNA and inhibits translation.
  • siRNA – Small Interfering RNA – binds to mRNA and promotes degradation.

– The Process of Transcription

A strand of RNA is produced from a strand of DNA – much the same as during DNA replication but in this case it is catalysed by RNA polymerase using rNTPs (ribonucleotide triphosphates). No primer is required. The synthesis occurs in the same direction as for DNA replication (5′->3′) and pyrophosphates are still released when the ribonucleotide triphosphates bind to the backbone.

  • ~17 base pairs of DNA duplex are unwound at a time as the DNA is transcribed into RNA. Of those ~17 base pairs, only about 9 are paired with RNA at any one time.
  • The transcription ‘bubble’ moves along the template DNA strand 3′–>5′ at a rate of ~50 bases/sec until it reaches a termination sequence.
  • In prokaryotes, transcription AND translation occur at the same time.
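The base pairing rule behind transcription can be written out directly. This is a simplified sketch: it ignores promoters and termination sequences, and assumes the template strand is written 3′ to 5′ so that the RNA comes out 5′ to 3′. The function name is my own.

```python
# Each DNA base on the template pairs with one RNA base (U replaces T in RNA)
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the RNA (5'->3') complementary to a template strand written 3'->5'."""
    return "".join(PAIRING[base] for base in template_3_to_5)


print(transcribe("TACCGGATT"))  # AUGGCCUAA
```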

– The Control of Transcription

The interactions between RNA polymerase and its promoter can be enhanced by activators or blocked by repressors.

A good example is the lac operon in prokaryotes – in Eukaryotes this is much more complex and may require chromatin remodelling to allow access to genes for transcription.

The Lac Operon controls expression of genes involved in the metabolism of lactose. A regulatory gene leads to the production of a repressor protein, which (in the absence of lactose) will bind to the operator, blocking expression of the later genes. When lactose appears, it binds to the repressor protein and changes its shape so that it can no longer bind to the operator. This allows expression of the genes further along the strand.

The above diagram shows the events when (a) no lactose is present, and (b) when lactose is present. The diagram below shows what occurs in the Tryptophan operon – you’ll see it is very similar.

– RNA Splicing

Splicing removes non-coding RNA sections from the newly synthesised strand. I mentioned non-coding DNA previously as DNA that has a purely structural role and does not code for any proteins etc. When it is copied into RNA during transcription it has no further use and so is removed by splicing.

  • A non-coding segment is called an INTRON (for intragenic regions). These sites start with GU and end with AG.
  • A coding segment is called an EXON (for regions that will be expressed).

Because exons can be joined in different combinations once the non-coding segments are removed (alternative splicing), several proteins can be synthesised from just one gene.
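As a toy illustration of splicing, the sketch below cuts known introns out of a pre-mRNA and joins the exons. It assumes the intron positions are already known and only checks the GU...AG rule mentioned above; real splice site recognition is far more involved. The sequences and the function name are made up for the example.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) index pairs (end not included)."""
    mature, cursor = [], 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} does not follow GU...AG")
        mature.append(pre_mrna[cursor:start])  # keep the exon before this intron
        cursor = end
    mature.append(pre_mrna[cursor:])           # keep the final exon
    return "".join(mature)


# Hypothetical two exon transcript
exon1, intron, exon2 = "AUGGCC", "GUAAGUCCCAG", "AAAUAA"
pre_mrna = exon1 + intron + exon2
mature = splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))])
print(mature)  # AUGGCCAAAUAA
```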

Incorrect splicing is a real risk though, and up to 15% of all genetic diseases are thought to be caused by errors and mutations affecting splicing.

DNA Sequencing and Amplification

  • DNA can be sequenced by replicating it in the presence of a dideoxynucleotide triphosphate – that is, a deoxynucleotide triphosphate with no OH group on Carbon 3 of the sugar. This is where the phosphate group of the next nucleotide would normally bind as part of the DNA backbone – and as this is no longer an option, replication stops when a dideoxynucleotide is added.
    (The small difference between deoxynucleotides and dideoxynucleotides)
  • I’ll try to explain further. If you were looking for all the adenine positions in a DNA strand, you would add ddATP (dideoxyadenosine triphosphate), which would terminate the replicated strands at different adenine positions.

5′-TCAAGTTACCGTAATA (correct, using dATP)
—- Possible Outcomes —-

This would leave you with a DNA mixture containing different size DNA fragments, all cut after an Adenine residue. To assess the location of these cuts:

  • Denature the dsDNA (double stranded DNA) – this unpairs the new (fragmented) strands from the old DNA strand.
  • Separate the DNA by polyacrylamide gel electrophoresis (or use agarose gel).
  • Smaller DNA fragments will travel further and faster than larger DNA fragments, and these fragments will be visible after the addition of a fluorescent chemical.

Now you know the location of Adenine bases, you can repeat the above with ddGTP, ddTTP or ddCTP, revealing the locations of those bases. You could then compare the separate gels to work out the DNA sequence. To compare, you would need to run on the same gel:

The -ve control would either be clean or just the original DNA strand – to be used as a reference. The +ve would contain a mix of all of the mixtures.

Then it’s a simple case of looking through the different bands. Remember that towards the top are larger fragments and smaller fragments at the bottom – so you read the sequence from the bottom.

CACTCAGTGATG – and the final top strand is the full DNA strand.
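The whole procedure can be mimicked in a few lines, using the example strand 5′-TCAAGTTACCGTAATA from earlier in this post. This is a simplified sketch: it ignores the primer and assumes every possible fragment is produced and resolved cleanly on the gel. The function names are my own.

```python
def terminated_fragments(new_strand: str, dd_base: str) -> list[str]:
    """All fragments that end where the dideoxy version of dd_base was added."""
    return [new_strand[:i + 1] for i, base in enumerate(new_strand) if base == dd_base]


def read_gel(new_strand: str) -> str:
    """Run four reactions (one per base) and read the bands from shortest to longest."""
    lanes = {base: terminated_fragments(new_strand, base) for base in "ACGT"}
    bands = sorted(
        (len(fragment), base) for base, fragments in lanes.items() for fragment in fragments
    )
    return "".join(base for _, base in bands)


strand = "TCAAGTTACCGTAATA"
for fragment in terminated_fragments(strand, "A"):
    print(fragment)          # the ddATP lane: every fragment ends in A

print(read_gel(strand))      # reading bottom to top recovers the strand
```

Because each fragment length appears in exactly one lane, sorting the bands by size and noting the lane gives back the sequence one base at a time.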

– Amplifying DNA – PCR

To amplify a sample of DNA (polymerase chain reaction):

  1. Denature the double stranded DNA sample to leave single stranded DNA (heat).
  2. Add short primers that are complementary to the ends of the sequences of interest.
  3. Lower temperature and anneal.
  4. Use a thermostable DNA polymerase (a polymerase stable under heat) such as Taq polymerase to extend from the primers.
  5. Denature sample again and repeat from 3.

Repeated, this produces exponential amplification of a DNA sample – eg 40 repeats gives up to 2^40 amplification! This is assuming good conditions with thermostable DNA polymerase and presence of enough dNTPs (deoxynucleotide triphosphates) and DNA primers.
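The doubling is simple to calculate. In this sketch the `efficiency` parameter is my own addition (1.0 means every strand is copied in every cycle, which is the ideal case described above); real reactions fall short of that and eventually plateau.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after a number of cycles; each cycle multiplies by (1 + efficiency)."""
    return initial_copies * (1 + efficiency) ** cycles


print(f"{pcr_copies(1, 40):.3e}")        # ideal doubling for 40 cycles, 2^40
print(f"{pcr_copies(1, 40, 0.9):.3e}")   # assumed 90% efficiency per cycle
```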

DNA Replication

  • DNA is copied semi-conservatively. This means that each old strand of DNA pairs with a strand made from new nucleotides.
  • Replication starts at a fixed point (the origin) and is bidirectional (replicates in both directions). In Eukaryotic DNA, there are multiple origins and so multiple replication forks. Eg. E. coli, which has a single origin:
  • The DNA duplex is opened up and nucleotides are read 3′-5′ on the OLD strand. Eg.
  • This means DNA Polymerases synthesise DNA in the 5′-3′ direction.
  • Replication starts from an existing primer. (A primer is a short oligonucleotide sequence, made of RNA by Primase.)
  • The addition of a nucleotide to the strand involves the removal of 2 phosphate groups from a deoxynucleotide triphosphate as only 1 phosphate is needed for the backbone. This means the addition units (the deoxynucleotide triphosphates) leave behind a 2 phosphate complex known as a pyrophosphate.

The deoxynucleotide is the nucleotide that attaches to the DNA chain below. The deoxynucleotide molecule can also be called a deoxynucleotide monophosphate, for obvious reasons.

I found the above image on the website of the Chemistry & Biochemistry department at the University of Texas at Austin, US. It shows the old strand (blue) unzipping and then new strands binding to this and forming.

Q: “I thought you said it only synthesised in the 5′-3′ direction! How can both new strands be formed especially when the one on the left seems to be going 3′ -5′?”

A: Easy. It still synthesises 5′-3′, it just synthesises in chunks. Hopefully the image below will explain:

As the double DNA strand unzips, the leading strand is free to be synthesised directly in the 5′-3′ direction. That’s how the DNA polymerase works.

But as it can’t synthesise DNA in the 3′-5′ direction it instead synthesises short 5′-3′ fragments for the lagging strand – these are called Okazaki fragments and are later joined by DNA ligase.

DNA Structure

DNA is a polymer with a Sugar-Phosphate repeating backbone. Each sugar has a nitrogenous organic base attached (either Adenine, Thymine, Cytosine or Guanine).

Two Deoxyribose Sugars attached by Phosphate Group - DNA Backbone

DNA backbone. You’ve got a deoxyribose sugar attached to a phosphate group & your sugar attached to a base via a BETA-glycosidic bond (which forms via a condensation reaction). Next to look at the bases!

There are two types of bases – Pyrimidine and Purine. Purines are BIGGER with 2 rings (imagine the bigger word being the smaller molecule).

– Pyrimidines (single ring) – T and C & Purines (double ring) – A & G

  • A and T form 2 hydrogen bonds while C and G form 3.
  • The number of H bonds is important as this will determine the stability of the DNA duplex. DNA with a high CG content will have a higher melting (denaturation) temperature than DNA with a high AT content because it has a larger number of H bonds. These H bonds hold the helix together better, requiring more energy to pull the two strands apart.
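Counting the hydrogen bonds for a given sequence makes the comparison concrete. A minimal sketch using only the 2 and 3 bond rule above; it says nothing about actual melting temperatures, and the function names are my own.

```python
# Hydrogen bonds each base makes with its partner on the other strand
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def gc_content(strand: str) -> float:
    """Fraction of bases in the strand that are G or C."""
    return (strand.count("G") + strand.count("C")) / len(strand)


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding this strand to its complementary strand."""
    return sum(H_BONDS[base] for base in strand)


for strand in ("AATTAATT", "ATGCATGC", "GGCCGGCC"):
    print(strand, f"GC content {gc_content(strand):.0%},", hydrogen_bonds(strand), "H bonds")
```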

A little terminology:

Sugar + Base = Nucleoside

Nucleoside + Phosphate = Nucleotide

A deoxynucleoside is a deoxysugar + base.

An oligonucleotide is a polymer of repeating nucleotides in a chain with fewer than 20 repeat units.

A polynucleotide is a longer chain of repeating units.

– Structure of a DNA Helix

  • Bases are hydrophobic and stack on top of each other in the helix.
  • Phosphates are on the outside – so the helix is highly -vely charged.
  • DNA helix has 2 strands – these run ANTI-PARALLEL. Eg.
  • It’s a right handed helix. Imagine a screw – spirals look like they are going down clockwise.

  • Remember that direction! I’ve drawn an extra white arrow going up on the second picture to show what the 5′—->3′ does.

  • This is also known as B-DNA.
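Because the two strands run anti-parallel, the partner of a strand is its complement read in the opposite direction. A short sketch of that idea, with both strands written 5′ to 3′; the function name is my own.

```python
# Watson-Crick pairing: A with T, G with C
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(strand_5_to_3: str) -> str:
    """The partner strand, also written 5'->3', so complemented and then reversed."""
    return strand_5_to_3.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice gives back the original strand, which is a handy sanity check.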

– Other (more rare) helix structures

  • A-DNA. Helix follows same direction but the bases are pulled away from the centre of the helix nearer to the backbone. This happens when the DNA is dehydrated.
  • Z-DNA. A normal helix…but backwards! The DNA strands are wound the opposite way, meaning it appears mirrored to normal B-DNA. This occurs in some GC containing sequences at high salt concentrations.

– Packing down the DNA Helix to more condensed structures (eg. Chromatin)

  • Although the DNA helix is a fairly compact structure, this can be further wound to reduce its size. Imagine a helix is like a coil, then winding that coil round and round a roll…this picture borrowed from Wikipedia explains visually.
  • Chromatin contains 5 main histone proteins (H1, H2A, H2B, H3, H4) which are very basic (AKA contain lots of +vely charged amino acids like arginine and lysine) and which interact with the -vely charged DNA (due to the phosphate groups as explained previously).
  • 2x (H2A, H2B, H3, H4) form the round structure you see in the second box of the image above. The DNA helix wraps around this about 1.6 times before wrapping round another cylindrical structure.
  • Essentially the DNA is arranged around the histone cores like beads on a string which further packs down to form higher order structures such as chromosomes.

DNA – Introduction to Deoxyribonucleic Acid

Simply, DNA –(transcription)–> RNA –(translation)–> Proteins. (DNA is the template for protein)

Proteins are useful for structure, metabolism, organisation and development…quite essential. What the protein does is dependent on the genes activated and the tissue it is being produced in. A few examples include Keratin which is part of hair (a fibrous protein) and Haemoglobin which is found in red blood cells and binds to Oxygen (a globular protein).

How do we know DNA is the genetic material…and not protein?

In 1928, Frederick Griffith ran experiments with Streptococcus pneumoniae where he demonstrated how harmless strains could be turned into virulent, harmful strains. He did this by mixing heat killed (therefore protein structure broken) virulent bacteria with live non-virulent forms, resulting in a permanent transformation to a virulent form.

Virulent mixed with non-virulent bacteria

A later discovery in 1944 by Avery was that the transforming material was the DNA from the virulent cells – not the protein. Transformation was only prevented by DNase (which destroys DNA) and not by proteases or RNase, which destroy protein and RNA but leave the DNA intact.

Later experiments further confirmed this.

1. DNA Deoxyribonucleic Acid.

  • Discovered in 1869 in the cell nucleus of Eukaryotes. In Prokaryotes (eg Bacteria) it is not membrane bound and so is ‘loose’ in the cell.
  • Strands of DNA form Chromosomes in Eukaryotes.

– Chromosomes

  • Named Chromosomes because special dyes could pick out AT and CG rich bands – allowing structures to be seen (see Fig 1).
  • Chromosomes are formed from DNA tightly wound round structural proteins such as histone proteins.
  • Cells of a species have the same number of chromosomes. (eg. Humans have 23 pairs in every diploid cell, and 23 single chromosomes in every haploid cell.)
  • All cells apart from gametes (sex cells) are diploid…aka have two of each chromosome (so in humans, 46 total). Gametes are haploid and only have one copy of each chromosome (in humans, 23 total).
  • A matching pair of chromosomes (one from each parent, carrying the same genes though not necessarily identical versions of them) are called homologous chromosomes.
  • The full set of chromosomes (in humans, all 23 pairs) is referred to as the karyotype.

Whilst the number of genes loosely reflects the complexity of a species, the same rule is not true for the number of chromosomes, as any particular chromosome may have a greater amount of DNA in it, and so any number of genes on it.

  • Also worth noting is that DNA contains sections of junk DNA which has a purely structural role – in humans only 9-27% is coding DNA (so only 9-27% can be used to synthesise proteins, the rest of the DNA strand is there to make sure the DNA can be stored correctly in chromosomes etc).

Fig 1 - high resolution image of a Chromosome showing different areas.

A Chromosome is made up of 2 chromatids joined at the centromere (also the point where spindle fibres attach during cell replication & division). There are 46 chromosomes (23 pairs) in a human; as shown.

Fig 2 - Showing Karyotype of Human Male

Note how there is no chromosome pair 23 on Fig 2. This is because the X and Y (or the sex chromosomes) form the final pair. Having 1 X and 1 Y chromosome results in a Male while 2 X chromosomes result in a female.

Offsite reading:

  1. More detailed history of DNA discovery from the European Bioinformatics Institute.