Crystal Structure Studies

Crystalline solids are composed of regular repeating patterns over relatively long distances (>1000 Å). This is known as long-range order.

ONLY crystalline solids produce macroscopic crystals (visible crystals).

Crystals have flat faces which can vary in size from crystal to crystal. The angle between similar faces is constant, and they break (cleave) in preferred directions (a feature exploited by diamond cutters).

Crystal Structure Studies tell us:

  • Chemical Characteristics
    • How are atoms connected?
    • What are the bond lengths and angles between them?
    • What does this say about the bonding?
  • Inorganic Solids
    • Structural features which may lead to an important property.
    • How changing a metal coordination sphere may alter the property (eg. in superconductors and electrical materials).
  • Organic Compounds
    • Which points of stereochemistry are important.
    • How solubility is affected by the way the molecules pack together.
    • How many ways the molecules can pack together.
  • Solution vs Solid State Structure
    • Structures may differ between different phases.

A couple of things about Molecule Packing:

  • Atoms, ions or molecules always try to pack into the lowest energy configuration.
  • This configuration can then be repeated for a large number of units.
  • As the configuration is repeated, a regular pattern forms and a lattice emerges through the crystal as a whole.
  • This pattern may interact with certain wavelengths of radiation and lead to diffraction (constructive interference) which provides a means of studying the pattern.

3D Lattices and Translations – Unit Cells:

We can use three translations (a, b and c) to describe the repeat distances between atoms, ions or molecules, and three angles between these translations (α, β and γ). These parameters define the size and shape of the unit cell.

*Note that each angle is named after the translation it does not touch: α is the angle between b and c, β is the angle between a and c, and γ is the angle between a and b.*
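
To show how the six parameters pin down the size of the cell, here is a minimal Python sketch of the standard volume formula for a general unit cell. The function name is my own, and it assumes the usual convention that α lies between b and c, β between a and c, and γ between a and b, with angles given in degrees.

```python
import math

def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a general (triclinic) unit cell; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # V = abc * sqrt(1 - cos²α - cos²β - cos²γ + 2 cosα cosβ cosγ)
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

# With all angles at 90° the formula collapses to a * b * c.
print(unit_cell_volume(3.15, 3.15, 3.15, 90, 90, 90))
```

The lengths can be in any unit (Å here), and the volume comes out in that unit cubed.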

The unit cell is:

  • A small volume defined by 6 faces, consisting of 3 identical pairs.
  • A small unit of the larger structure which is then repeated.

“The smallest repeating unit that shows the full symmetry of the crystal structure.”

There are 7 Crystal Systems

  • Triclinic – a≠b≠c – α≠β≠γ (ALL Different!)
  • Monoclinic – a≠b≠c – α=γ=90°, β≠90°
  • Orthorhombic – a≠b≠c – α=β=γ=90°
  • Tetragonal – a=b≠c – α=β=γ=90°
  • Hexagonal – a=b≠c – α=β=90°, γ=120° (Need a and c)
  • Rhombohedral – a=b=c – α=β=γ≠90°
  • Cubic – a=b=c – α=β=γ=90° (ALL Equal, Only need a)
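
The list above can be turned into a rough lookup. This is only a sketch: the function name and the tolerance are my own choices, it assumes the axis labelling used in the list (a=b for tetragonal and hexagonal, β as the odd angle for monoclinic), and strictly speaking a crystal system is defined by symmetry, so matching cell dimensions is a hint rather than proof.

```python
def crystal_system(a, b, c, alpha, beta, gamma, tol=1e-3):
    """Guess the crystal system from cell lengths and angles (degrees)."""
    def eq(x, y):
        return abs(x - y) <= tol

    right = [eq(x, 90) for x in (alpha, beta, gamma)]  # which angles are 90°

    if eq(a, b) and eq(b, c):
        if all(right):
            return "cubic"
        if eq(alpha, beta) and eq(beta, gamma):
            return "rhombohedral"
    if eq(a, b):
        if all(right):
            return "tetragonal"
        if right[0] and right[1] and eq(gamma, 120):
            return "hexagonal"
    if all(right):
        return "orthorhombic"
    if right[0] and right[2]:
        return "monoclinic"
    return "triclinic"

print(crystal_system(3.15, 3.15, 3.15, 90, 90, 90))
print(crystal_system(4.0, 5.0, 6.0, 90, 100, 90))
```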

Let’s look at some unit cells. Try and work out the number of atoms in each 2D cell:

You should have:
i) 1
ii) 1
iii) 2

Why? This is something we’ll explore a little more later, but for now consider this. i) has 4 dots (atoms, ions or molecules) in its shape. Each of these four dots can participate in a total of four unit cells.

Look at the red dot. It is shared between four unit cells and so only contributes 1/4 to the contents of that cell. The same can be said for every other dot at the corner of a cell: each one contributes only 1/4.

So we know that i) has 4 corners, each worth 1/4. That totals 1!

ii) is just the same, but iii) is different. It has 4 corners each worth 1/4 BUT it also has a single dot in the middle that is not shared with any other cell…meaning it is worth a whole 1. So (4×1/4)+1 = 2.

– Bravais Lattices

Although there may be an infinite number of chemical structures, there are only 14 3D lattice types, known as Bravais lattices.

We’re just going to look at the cubic system to start with. Remember, a=b=c – α=β=γ=90°. In the cubic system there are 3 lattice types, P (primitive), I (body centered), and F (face centered).

  • Primitive cubic cells contain only corner dots, which can be shared between 8 different unit cells, meaning each dot is worth 1/8. 8*(1/8) is 1 lattice point.
  • Body Centered (I) cubic cells have the primitive structure but feature a central dot, worth 1 (as it is exclusive to that cell). This gives 8*(1/8) + 1 = 2 lattice points.
  • Face Centered (F) cubic cells once again follow the primitive structure but this time feature a dot on each of the 6 faces. Each of these dots can be shared between two cells and so each is worth 1/2. This gives 8*(1/8) + 6*(1/2) = 1+3 = 4 lattice points.

So, when counting particles in a unit cell:

  • An atom at a corner counts for 1/8
  • An atom on an edge counts for 1/4
  • An atom on a Face counts for 1/2
  • An atom within the cell counts for 1

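Strictly, the slice of any one atom that lies inside the cell is only exactly 1/8, 1/4 or 1/2 when the cell angles are 90°. The totals still work for all cells though, because the slices of equivalent atoms always add up to the same whole.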
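
The counting rules above are easy to check with a few lines of Python. This is just a sketch with names of my own choosing; it uses exact fractions so the totals come out as whole numbers.

```python
from fractions import Fraction

# Share of an atom that belongs to one 3D cell, by position.
SHARE = {
    "corner": Fraction(1, 8),
    "edge": Fraction(1, 4),
    "face": Fraction(1, 2),
    "body": Fraction(1),
}

def atoms_per_cell(corner=0, edge=0, face=0, body=0):
    """Total atoms in a cell given how many sit at each kind of position."""
    counts = {"corner": corner, "edge": edge, "face": face, "body": body}
    return sum(SHARE[position] * n for position, n in counts.items())

print(atoms_per_cell(corner=8))          # primitive (P): 1
print(atoms_per_cell(corner=8, body=1))  # body centered (I): 2
print(atoms_per_cell(corner=8, face=6))  # face centered (F): 4
```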

– Projection Drawings:

Although we could draw the 3D shape of a cell every time, it is easier to draw a projection – a top down sketch of the cell.

Each atom’s height within the cell is indicated as a fraction of the cell height. Note the cell heights in the drawing below.

Here are the projection drawings for each cubic lattice:

– Packing Densities:

To find the structure offering the maximum density of atoms, we can calculate the atomic volume and packing density per unit cell. This involves some fairly basic maths which I have recapped here. We’ll start with the primitive cubic cell.

Volume of cubes = a³ (a*a*a)

Volume of a Sphere/Atom = 4/3 x π x r³
(r = atomic radius)

Packing Density = n x p / a³
(n = number of lattice points per cell, p = Volume of Atom)

Primitive:

There is 1 lattice point per cell, and the atomic radius is half the distance between the closest lattice points. Here the closest atoms touch along the cell edge, so r = a/2.

Therefore:

  • Volume of atom = 4/3 x 3.142 x (a/2)³ = 0.524a³

The Packing Density of these atoms is then simply the volume of one atom (which we have found to be 0.524 a³) divided by the volume of the cube (which is a³):

  • Packing Density = (0.524a³) / a³ = 0.524

Body Centered:

There are 2 lattice points per cell, and in this case the radius is not simply a/2, as the central atom is the closest point. More trigonometry!

The distance between two opposite corners of the cube (the body diagonal) is √3a, as shown in the first image. The central atom sits halfway along it, so the distance between the two closest atoms is half of this: √3a/2. This means the atomic radius must be √3a/4.

Therefore:

  • Volume of Atom = 4/3 x 3.142 x (√3a/4)³ = 0.34a³
  • Packing Density = (2 x 0.34a³) / a³ = 0.68

(Note the doubling of the atomic volume in the packing density calculation – this is because Body Centered cubes contain 2 lattice points.)

Face Centered:

4 lattice points per cell, with more trigonometry.

The distance between a corner atom and the corner atom diagonally across the same face is √2a. The face atom sits halfway along this diagonal, so the distance between a corner atom and a face atom is half of this, √2a/2. This means the atomic radius must be √2a/4.

Therefore:

  • Volume of Atom = 4/3 x 3.142 x (√2a/4)³ = 0.185a³
  • Packing Density = (4 x 0.185a³) / a³ = 0.74

Summary:

Primitive Density (P) = 0.524 = 52.4%

Body Density (I) = 0.68 = 68%

Face Density (F) = 0.74 = 74%

You can see the density increasing as you move down the list.
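
As a cross-check on the three results, the same calculation can be done in a few lines of Python. This is a sketch with my own names; the radii are written as fractions of a, so a cancels and never needs a value.

```python
import math

# (lattice points per cell, atomic radius as a fraction of a)
CUBIC_LATTICES = {
    "P": (1, 1 / 2),
    "I": (2, math.sqrt(3) / 4),
    "F": (4, math.sqrt(2) / 4),
}

def packing_density(n, r_over_a):
    """n spheres of radius r = r_over_a * a, divided by the cell volume a**3."""
    return n * (4 / 3) * math.pi * r_over_a**3

for name, (n, r_over_a) in CUBIC_LATTICES.items():
    print(name, round(packing_density(n, r_over_a), 3))
```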

– Using this knowledge

  1. Finding the metallic radius of α-tungsten. We know the unit cell is cubic (so we can use what’s above), a = 3.15Å, Atomic Mass = 183.85g mol^-1 and its measured density is 19.25g cm^-3 near room temperature and pressure. Step by step:
    1. We need to calculate the number of atoms in the unit cell (the number of lattice points). We know:
      Density = (Mass of 1 atom x Number of Atoms in Cell) / Volume of Unit Cell
      Therefore:
      19.25 = ((183.85/6.022×10^23)* x N) / (3.15×10^-8)**^3
      Rearranging gives N = 1.97, so Number of Atoms = 2 (rounded to the nearest integer).

      We now know that there are 2 lattice points, or atoms, per unit cell. Looking above we know that this unit cell is Body Centered.

      (* Finding the mass of 1 atom is done by taking the molar mass of an element and dividing it by Avogadro’s number, AKA the number of atoms per mole.)
      (** This value has been converted into centimetres from Angstroms, to match the density units. 1Å = 1×10^-8 cm.)

    2. Calculate the metallic radius – we have learnt that the unit cell is body centered, so we already know the distances involved.
      The atomic radius in this lattice is √3a/4, and we know that a for α-tungsten = 3.15Å. Therefore the metallic radius is (√3 x 3.15)/4 = 1.36Å.
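
The two steps above can be replayed in Python as a sanity check. This is a sketch using the numbers quoted in the example; the variable names are mine, and Avogadro's number is taken as 6.022×10^23.

```python
import math

AVOGADRO = 6.022e23        # atoms per mole
a_angstrom = 3.15          # cell edge in Å
a_cm = a_angstrom * 1e-8   # 1 Å = 1e-8 cm, to match g cm^-3
molar_mass = 183.85        # g mol^-1
density = 19.25            # g cm^-3

# Step 1: density = (mass of 1 atom * N) / cell volume, rearranged for N.
mass_of_one_atom = molar_mass / AVOGADRO
atoms_exact = density * a_cm**3 / mass_of_one_atom
atoms = round(atoms_exact)

# Step 2: 2 atoms per cell means body centered, so r = sqrt(3) * a / 4.
radius = math.sqrt(3) * a_angstrom / 4

print(round(atoms_exact, 2), atoms, round(radius, 2))
```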

Some Basic Trigonometry

It seems that doing maths for as long as I can remember didn’t make it stick in my mind so I’m having a little refresh as I go. I’m starting with some basic trig.

– Pythagoras’ Theorem – Find the length of the third side of a right angled triangle.

Very simple. Say we have a right angled triangle with sides a, b and c – where c is the hypotenuse…

Crucially, a² + b² = c²

AKA the sum of the areas of the two squares on the legs (a and b) equals the area of the square on the hypotenuse (c).

So how about this triangle?

5² + 9² = c²
25 + 81 = c²
c² = 106

Now square root 106…

c = ~10.30 (4sf) or √106 in surd form.
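
The same sum in Python, using the standard library's math.hypot. As a sketch of why this matters for the unit cell work, it also rebuilds the face and body diagonals of a cube with edge a = 1 (my own choice of value).

```python
import math

# Hypotenuse of the 5, 9 triangle above.
c = math.hypot(5, 9)
print(round(c, 2))

# Cube with edge a: the face diagonal comes from a and a,
# the body diagonal from the face diagonal and a.
a = 1.0
face_diagonal = math.hypot(a, a)              # √2 a
body_diagonal = math.hypot(face_diagonal, a)  # √3 a
print(face_diagonal, body_diagonal)
```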

Protein Purification

– Protein Identification

Protein purification begins with the need to identify the protein we want to purify! There are several methods that can be used to rapidly identify the protein:

  • Enzyme Assay (by catalytic activity) – with certain enzymes we can use colorimetry to detect a product as a reaction progresses. The more enzyme present, the faster the colour or light absorbance will change. An example is testing for Alcohol Dehydrogenase, which will lead to a change in the levels of NADH and NAD+ as Ethanol is converted to Ethanal. This change can be detected by colorimetry at ~340nm, where NADH absorbs but NAD+ does not.
  • SDS-PAGE Electrophoresis (by size) – this method separates protein chains by size using electrophoresis. This method denatures the proteins.
    The sample is run at the same time as a molecular mass marker sample, containing proteins of known mass. The marker sample will provide a scale for the mass of your sample. Once you’ve run the gel you will be able to plot the results as above, draw a best fit line and read off the Molecular Mass of your sample protein.
  • Immuno-Assay (by specific antibodies) – Antibodies that fit specifically to the protein you are looking for are added to the sample. When these bind to the target protein they will instigate a colour change or some other noticeable change. The presence and concentration of the target protein can be assessed by the extent of the changes – if it was a colour change then the darker the colour goes, the more target protein must be present.
  • Western Blotting – A combination of electrophoresis and immuno-assay. The proteins are first separated by electrophoresis, transferred (blotted) onto a membrane, and then probed with the antibody. This will be useful if your sample contains several different proteins and you need to identify the target protein. The band of colour (or change) will show you the correct protein, and then you simply need to calculate the approximate molecular mass using the molecular mass markers.
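
For the SDS-PAGE marker plot described above, the "best fit line" step can be sketched in Python. Two assumptions of mine: the line is fitted to log10(mass) against distance travelled (the usual choice, though the notes don't specify), and the marker readings below are made-up round numbers purely for illustration, not real gel data.

```python
import math

def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_mass(marker_distances, marker_masses, sample_distance):
    """Read a sample's mass off the fitted log10(mass) vs distance line."""
    log_masses = [math.log10(m) for m in marker_masses]
    slope, intercept = fit_line(marker_distances, log_masses)
    return 10 ** (slope * sample_distance + intercept)

# Hypothetical marker readings (distance in cm, mass in Da).
distances = [1.0, 2.0, 3.0]
masses = [100_000, 10_000, 1_000]
print(round(estimate_mass(distances, masses, 2.5)))
```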

– Protein Purification

I’ve broken the purification methods here into 8 different headers, each a physical-chemical property or biological activity.

  1. Stability (Heat). Some proteins are more heat tolerant than others and can survive heating while others denature. If your target protein is heat stable at above 60°C and your contaminants are not, then simply heating your mixture to 60°C for 30 minutes will denature most of the contaminants. This will leave you with a much higher proportion of your target protein in your mixture.
  2. Solubility (Separate by pI). Proteins are least soluble at the pH equal to their isoelectric point. When helped by the addition of salts to the solution this can lead to their precipitation. As the salt concentration increases, different proteins will precipitate.
  3. Size. Proteins can be separated by Gel Permeation Chromatography (Gel filtration). The proteins are run through a buffered, porous, cross linked resin. While small molecules are able to fit into the pores in the resin, the larger proteins cannot and so travel ahead, with the small molecules lagging behind.
    This is due to a larger volume of buffer available to the smaller molecules, meaning more buffer must pass down the column for them to elute, compared to the relatively smaller volume of buffer required to elute the larger, excluded proteins.
  4. Density (Centrifuge). By centrifuging the sample in a test tube containing a sucrose density gradient, the centrifugal forces will force the proteins down the tube until they reach a concentration where the density of the sucrose solution is the same as their own. This level is known as the protein’s isopycnic level.
  5. Charge. There are several different methods of purification by charge:
    1. Gel Electrophoresis – Based on movement of a protein through a cross linked gel called polyacrylamide. This would occur at a pH where the protein has a charge (not at its pI). The size of the pores can be altered by changing the concentration of cross linking reagent, and the speed at which a protein travels depends on its charge:mass ratio. (This method does not tell us anything about the protein’s molecular weight).
    2. SDS PAGE – This cannot really be used for purification because the SDS detergent (Sodium Dodecylsulphate) used denatures the protein. It unfolds the protein and surrounds it with -ve charged sulphate groups, which means all the proteins have a uniform charge:mass ratio. SDS has a 12 carbon hydrocarbon chain, and then a hydrophilic sulphate group. The SDS molecules surrounding the protein form a micelle.
      The sample is now allowed to run on a gel, from -ve to +ve, and as the proteins all have the same charge to mass ratio, their rate is determined only by their size. The smaller protein molecules move faster and the larger molecules move slower through the gel.
    3. Isoelectric Focusing – Very similar to (1) above, but the gel also contains a pH gradient. The proteins move along it in the electric field until they reach the point where they have no net charge (the pH equal to their pI), and stop there.
    4. Ion Exchange Chromatography. Essentially, both columns and proteins become charged at different pH’s, and by altering the pH we can hold on to some proteins while others are eluted.
      Diethylaminoethyl-Cellulose (DEAE-Cellulose) has a +ve charge below pH 9.5 whereas CarboxyMethyl-Cellulose (CM-Cellulose) has a -ve charge above pH 3.0. Therefore:
      – Proteins with a +ve charge at pH7 will bind to a column of CM-Cellulose, while
      – Proteins with a -ve charge at pH7 will bind to a column of DEAE-Cellulose.
      We can then alter the pH of the solution to release certain proteins or to pick up others. Another way of disrupting ionic interactions between the column and proteins is to increase the salt concentration.
  6. Hydrophobicity. Proteins nearly always feature hydrophobic areas or side chains and these allow the proteins to bind to resins with hydrophobic groups attached. This means the proteins can be eluted with a gradient of buffer (eg an organic solvent such as ethanol). The proteins forming the strongest interactions with the resin column will require higher concentrations of ethanol to elute.
  7. Biological Function. If a protein has a high affinity for a substrate (eg. ADH has a high affinity for NAD+) then we can use affinity chromatography. If we immobilise the substrate (eg. NAD+) on the column then the protein will bind to that substrate, immobilising itself – allowing other proteins to run free of the column. By then running free NAD+ through the column, the protein will gradually release the immobilised NAD+ in favour of the free NAD+ and run free of the column.
    This method can purify a protein in one step, and works best if the protein has a high affinity for the bound ligand.
  8. Fusion Proteins. This involves adding a short coding sequence to the gene for the protein, so that the protein is made with a ‘tag’ attached. An example would be a tag containing histidine residues, which would bind to metal ions in the column.
    Here, the imidazole rings on the histidine residues stick to the immobilised metal ions, allowing other proteins to elute from the column. Then, like the method above, add free imidazole to release the fusion proteins and then use a protease to cut the tag away. Run the column again and only the tags will bind, allowing the protein of interest to run free.
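
Going back to ion exchange chromatography (method 5.4 above), the choice of column can be written as a small decision rule. This is a simplified sketch of my own: it assumes a protein is +ve below its pI and -ve above it, uses the pH limits quoted for the two resins, and ignores how strongly the protein is charged. The pI values in the example are hypothetical.

```python
def net_charge_sign(pI, pH):
    """+1 if the protein is net positive at this pH, -1 if negative, 0 at its pI."""
    if pH < pI:
        return 1
    if pH > pI:
        return -1
    return 0

def column_that_binds(pI, pH=7.0):
    """Which of the two resins would be expected to hold the protein at this pH."""
    sign = net_charge_sign(pI, pH)
    if sign > 0 and pH > 3.0:   # CM-Cellulose is -ve above pH 3.0
        return "CM-Cellulose"
    if sign < 0 and pH < 9.5:   # DEAE-Cellulose is +ve below pH 9.5
        return "DEAE-Cellulose"
    return None                 # no net charge, or the resin itself is uncharged

print(column_that_binds(pI=9.0))  # hypothetical basic protein
print(column_that_binds(pI=5.0))  # hypothetical acidic protein
```

Changing the pH argument shows how a protein can be released: once the pH crosses its pI the sign flips and it no longer sticks to the same resin.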

Proteins – Quaternary Structure & Overview

The quaternary structure of a protein involves the association of folded polypeptide chains into a mature, active protein.

  • This can be a single polypeptide chain (monomer), 2 chains (dimer), 3 chains (trimer), 4 chains (tetramer) and so on…
  • The associated chains can be identical or different.

Some quaternary structures require additional polypeptide chains (which are removed during production) in order to achieve a working protein state (eg. Mature insulin). There are also structures which will revert to their original shape once broken, as the order is set in the primary structure of Amino Acids.

With Insulin, a helper amino acid strand is used to ‘hold’ two sequences in place, allowing the formation of disulphide bridges. I’ve tried to illustrate before and after:

S is the signal chain, while B acts as a support structure during disulphide bridge formation between A and C.

– An Overview

  • Primary Structure – The sequence of Amino acids on a chain.
  • Secondary Structure – The 3D relationship between Amino Acids – leading to α helix, β pleated sheet etc.
  • Tertiary Structure – The 3D relationship between parts of the above structure.
  • Quaternary Structure – The number of and relationship between amino acid chains (separate tertiary structures).

Proteins – Stabilising Forces

There are several different types of forces acting on/within a protein molecule. These include:

  1. Covalent Bonds:
    1. Peptide bonds between Amino Acids (C-N). Can be broken down into individual amino acids by hydrolysis with 6M acid/alkali, or by proteases/proteolytic enzymes.
    2. Disulphide bridges form between cysteine residues to form cystine. (Cysteine has -SH which forms disulphide bridge -S-S- with another HS-). Bridges are broken down by reduction with β-mercaptoethanol to form cysteines once again.
  2. Non-Covalent Forces/Bonds:
    1. Hydrogen Bonds – these bonds are throughout the protein. The bonds in the middle of the protein structure contribute most to stability as they are furthest away from water (which would disrupt them). These can also be disrupted by heat.
    2. Van Der Waals forces/interactions – short range dipole-dipole (δ+ & δ-) interactions between close atoms. Easily disrupted by heat or denaturing agents.
    3. π-π overlap – π electron clouds delocalised over rings & bonds. Are disrupted by heat.
    4. Electrostatic bonds, Ionic interactions and Salt bridges between residues. All broken by changes in pH or high ionic strength. (Eg, positive residues include Lys, Arg, His while negative residues include Asp, Glu, Tyr & Cys).

– Zwitterions

Zwitterions are amino acids in free solution that are doubly charged. Their net charge will depend on the pH of the solution. Each amino acid has an isoelectric point at which it has no net charge.

Below the isoelectric point (also known as pI), they have a net positive (+ve) charge and above the pI they have a net negative (-ve) charge.

When amino acids become part of a polypeptide/protein, their amino and carboxyl groups are used up in peptide bonds, so (apart from the two ends of the chain) only the side chains can carry charges.

Proteins themselves can have isoelectric points – and this will depend on the number and type of different amino acid residues.

– Hydrophobic Interactions

This is the prime driving force for protein folding (AKA hydrophobic collapse).

Essentially the protein chain will fold in such a way as to minimise the exposure of hydrophobic residues to water, burying them inside the molecule. This leads to the residues with hydrophilic (polar) side chains being situated on the outside of the molecule.

Proteins – Tertiary Structures

There are two notable regular structures that the tertiary structure is built from (strictly, these are secondary structure elements) – α (ALPHA) helix and β (BETA) pleated sheet.

α Helix

  • Right handed helix much like that of a DNA helix.
  • Each amino acid side chain (R group) is 100 degrees relative to the last side chain, outside of the helix. This means there are 3.6 residues per turn and 5.4 angstroms per turn/level. On the sketch below, each R stands for a different amino acid side chain.
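
The helix numbers above all follow from two figures, 3.6 residues per turn and 5.4 Å per turn. A short Python sketch (names are my own) makes the links explicit and also gives the angle of each side chain when looking down the helix axis.

```python
RESIDUES_PER_TURN = 3.6
PITCH = 5.4  # Å climbed per full turn

ANGLE_PER_RESIDUE = 360 / RESIDUES_PER_TURN  # degrees between successive side chains
RISE_PER_RESIDUE = PITCH / RESIDUES_PER_TURN  # Å climbed per residue

def helix_length(n_residues):
    """Approximate length in Å of an ideal α helix with this many residues."""
    return n_residues * RISE_PER_RESIDUE

def wheel_angle(residue_number):
    """Angle (degrees) of side chain R1, R2, ... viewed down the helix axis."""
    return ((residue_number - 1) * ANGLE_PER_RESIDUE) % 360

print(ANGLE_PER_RESIDUE, RISE_PER_RESIDUE)
print([round(wheel_angle(i)) for i in range(1, 9)])
```

Side chains whose wheel angles are close together end up on the same face of the helix.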

A couple of alterations:

  • Glycine residues will disrupt the α helix. Glycine’s side chain is just a hydrogen atom, so it has no chiral carbon and is very flexible.
  • Proline has a cyclic side chain which restricts the rotation of phi to roughly -60°. There is also no H atom on the N of the amino acid once it is in the chain, so it cannot donate a Hydrogen bond to other residues.

Amphipathic Helices:

  • Helices can end up with hydrophobic residues on one side and polar (hydrophilic) on the other – essentially giving the helix two faces. The image below illustrates R1, R4, R7 and R8 as hydrophobic, and R2, R3, R5, and R6 as hydrophilic.
  • This means helices can be constructed to generate lipid (hydrophobic) or water (hydrophilic) soluble proteins.

– β Pleated Sheet

There are two types of pleated sheet – Parallel and Anti-Parallel.

  • Parallel sheet has successive polypeptide strands in the same direction.
  • Anti-Parallel sheet has successive polypeptide strands in opposite directions.

These strands are typically 5-10 amino acids long, and the pleated sheet is formed when a continuous chain folds back on itself so that these strands lie side by side.

It has been suggested that the anti-parallel configuration is more stable.

Proteins – Primary & Secondary Structures

As mentioned a couple of posts ago:

  • Proteins are polypeptides made from 20 different monomers.
  • On average contain 100-400 monomers.
  • Each monomer has an approximate molecular mass of 110.

– Monomers –> Polymers. The Primary Structure.

  • Amino Acids form peptide bonds (from the carboxylic acid group on one to the amine group on another). This releases water in a condensation reaction. The location of the peptide bond (C-N) is shown below outlined in RED.
  • When reading a sequence of Amino Acids in a protein, start at the Amino terminus (NH2 end) and read to the Carboxyl terminus at the other (COOH).
  • The sequence of amino acids is known as the primary structure of a protein.

The amino acids in chains and proteins can be post-translationally modified – eg, disulphide bridges can form between cysteine residues.

– The Secondary Structure

Assuming the following:

  1. No rotation occurs round the peptide bond (as it is partly double bonded in nature).
  2. The chain of amino acids form a rhythmical structure – forming a repeating pattern.
  3. That the maximum number of interactions from Hydrogen bonding possible are occurring, independent of the type of residue (amino acid).

Now to explain these points:

  1. As mentioned, the C-N bond is partly double bonded and so does not rotate. The bond length of a normal C-N bond is 1.49Å (angstroms), while the length of a normal C=N bond is 1.28Å. The length of the peptide bond is between these, at about 1.32Å.
    This is due to the C-N bond resonating between single and double bonded forms, as shown above.
  2. Two different rotation points exist, one on either side of each α carbon. These are called phi (Φ, rotation about the N-Cα bond) and psi (Ψ, rotation about the Cα-C bond). A perfect helix structure (covered later) needs phi to be at an angle of about -60 degrees and psi at about -45 degrees.
  3. Hydrogen bonds occur between the C=O and H-N of other amino acids. In α helices, the C=O would form a hydrogen bond to the N-H 4 residues ahead in the spiral (directly above).

The attachment of Amino Acids to tRNA – Aminoacylation

  1. First the Amino Acid must be activated. This involves the addition of ATP (adenosine triphosphate), forming Aminoacyl Adenylate.
  2. Once the amino acid has been activated it can be attached to the tRNA. This follows the following scheme:
    Aminoacyl Adenylate + tRNA –> Aminoacyl-tRNA + AMP.
    See the following image from wiley.com, showing the structure of a tRNA molecule with amino acid attached. Note at the bottom is the mRNA strand.

– tRNA

As we know, tRNA is an adapter molecule that carries amino acids in an activated form to ribosomes for protein synthesis.

There is at least 1 tRNA molecule for each of the 20 amino acids.

It adopts a folding structure with internal base pairing and is about 75 nucleotides long.

Translation – RNA –> Proteins

Proteins are polymers (polypeptides – aka monomers joined by peptide bonds) of amino acids, of which there are 20 which occur naturally.

They are synthesised in the cytoplasm on ribosomes which decode the mRNA in the 5′–>3′ direction.

Most proteins contain between 100 and 400 amino acids, and as any of the 20 amino acids can sit at each position there are 20^100 to 20^400 possible sequences.

The average amino acid has a molecular mass of 110, so using this we can estimate the mass of different proteins by multiplying the average mass by the number of amino acids in the protein – eg. a 400 amino acid protein has an estimated molecular weight of 44000.
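
Both estimates are one-liners in Python. A small sketch, with function names of my own, using the average residue mass of 110 and the 20 amino acids from the text:

```python
AVERAGE_RESIDUE_MASS = 110  # average molecular mass per amino acid
AMINO_ACID_TYPES = 20

def estimated_mass(n_residues):
    """Rough molecular mass of a protein from its length."""
    return n_residues * AVERAGE_RESIDUE_MASS

def possible_sequences(n_residues):
    """Number of different chains of this length (Python handles the huge integer)."""
    return AMINO_ACID_TYPES ** n_residues

print(estimated_mass(400))                 # 44000
print(len(str(possible_sequences(100))))   # 20^100 has 131 digits
```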

It is estimated that there are 10^7 or 10^8 different proteins in nature.

– Amino Acids

The 20 different amino acids are:

Single Letter Code Short Name Name
A Ala Alanine
C Cys Cysteine
D Asp Aspartic Acid
E Glu Glutamic Acid
F Phe Phenylalanine
G Gly Glycine
H His Histidine
I Ile Isoleucine
K Lys Lysine
L Leu Leucine
M Met Methionine (START*)
N Asn Asparagine
P Pro Proline
Q Gln Glutamine
R Arg Arginine
S Ser Serine
T Thr Threonine
V Val Valine
W Trp Tryptophan
Y Tyr Tyrosine

*Met (Methionine) is also the start signal in translation. When the codon for Met is read (AUG), translation begins. Met is often removed or altered once translation has been completed. Prokaryotes usually start at AUG too (read as a modified, formylated Met), although GUG (normally valine) is occasionally used instead.

Each amino acid is coded for by 3 bases – called a triplet (or codon). Since there are 4 bases in total, and each of the 3 positions can hold any one of them, there are 4^3 possible combinations – 64.
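
The 64 can be confirmed by simply listing every triplet. A minimal Python sketch using the RNA bases; the ordering U, C, A, G is just the one most codon tables use.

```python
from itertools import product

BASES = "UCAG"

# Every way of choosing one of 4 bases for each of 3 positions.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))   # 64
print(codons[:4])    # ['UUU', 'UUC', 'UUA', 'UUG']
```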

Here are the triplet codes for each Amino Acid in most cells:

If you’d like the file this screenshot came from: Amino Acid Codes

These codes are almost universal, with the exception of a few types of cell. These include Human Mitochondria, where there are several triplet changes – such as UGA coding for Trp rather than STOP and AUA coding for Met instead of Ile.

– tRNA and Codon Triplets

  • Amino acids are linked to an adapter molecule of tRNA. Each tRNA carries an anticodon which will match a codon on the mRNA.
  • The amino acid is bonded to the 3′ end of the matching tRNA strand.
  • Essentially, tRNAs come in when their anticodon matches the codon on the mRNA strand and are then released, leaving behind the amino acid that corresponds to the codon.
  • This is repeated over and over to form a chain of amino acids until a stop codon is reached (UAA, UAG or UGA) and the completed polypeptide chain is released.
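
The reading process in the list above can be mimicked in Python. This is a simplified sketch, not how a ribosome really finds its start site: it assumes the standard genetic code, starts at the first AUG it finds, and reads triplets until a stop codon. The function name and the example mRNA are my own.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order, one letter per codon, '*' = STOP.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate from the first AUG until a stop codon (UAA, UAG or UGA)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGGCCUGGUAAGG"))  # MAW
```

Here AUG, GCC and UGG are read as Met, Ala and Trp, and UAA ends the chain.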

To explain this better I’ve found an animation of translation in bacterial cells. This is not my work, rather that of the American Society for Microbiology; I found it on their page.

– Mutations caused by Errors

  • Wild Type = Normal Sequence
  • Missense = One base changed, resulting in the sequence coding for a different Amino Acid.
  • Nonsense = One or more bases changed, producing a stop codon and resulting in early termination of the chain.
  • Silent = One or more base changes but the same amino acid coded for.
  • Frameshift Base Deletion = One base removed, resulting in the change of most of the following amino acids.
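
To make the categories concrete, here is a small Python sketch that labels a change within a single codon. It assumes the standard genetic code and that the original codon codes for an amino acid; the function name is my own. The last lines show why deleting one base is so disruptive: every later triplet changes.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_codon_change(wild_codon, mutant_codon):
    """Label a substitution within one codon ('*' in the table means STOP)."""
    if wild_codon == mutant_codon:
        return "wild type"
    wild, mutant = CODON_TABLE[wild_codon], CODON_TABLE[mutant_codon]
    if mutant == wild:
        return "silent"
    if mutant == "*":
        return "nonsense"
    return "missense"

print(classify_codon_change("GAA", "GAG"))  # Glu -> Glu
print(classify_codon_change("GAA", "GUA"))  # Glu -> Val
print(classify_codon_change("UAC", "UAA"))  # Tyr -> STOP

# Frameshift: remove one base and split into triplets again.
sequence = "AUGGCCUGGAAA"
shifted = sequence[:3] + sequence[4:]
print([sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)])
print([shifted[i:i + 3] for i in range(0, len(shifted) - 2, 3)])
```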

Transcription – DNA –> RNA

RNA is much the same as DNA, except for a few points:

  1. The sugar is ribose rather than deoxyribose – deoxyribose has one fewer OH group – on C2:
  2. The DNA base Thymine is replaced with Uracil (same but without methyl group):
  3. RNA is single stranded rather than double stranded like DNA. Instead, it folds into well defined structures (rather than combining two separate strands that can be broken apart by denaturing).

There are several different types of RNA:

  • mRNA – Messenger RNA – template for protein synthesis.
  • rRNA – Ribosomal RNA – major component of ribosomes.
  • tRNA – Transfer RNA – carries activated amino acids to ribosomes.
  • snRNA – Small Nuclear RNA – participates in RNA splicing.
  • miRNA – Micro RNA – binds to mRNA and inhibits translation.
  • siRNA – Small Interfering RNA – binds to mRNA and promotes degradation.

– The Process of Transcription

A strand of RNA is produced from a strand of DNA – much the same as during DNA replication but in this case it is catalysed by RNA polymerase using rNTPs (ribonucleotide triphosphates). No primer is required. The synthesis occurs in the same direction as for DNA replication (5′->3′) and pyrophosphates are still released when the ribonucleotide triphosphates are added to the growing chain.

  • ~17 base pairs of DNA duplex are unwound at a time as the DNA is transcribed into RNA. Of those ~17 base pairs, only 9 are paired with RNA at any one time.
  • The transcription ‘bubble’ moves down the DNA strand 3′–>5′ at a rate of ~50 bases/sec until it reaches a termination sequence.
  • In prokaryotes, transcription AND translation occur at the same time.
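
The base pairing behind transcription can be sketched in Python. This only models the end result, not the polymerase: the function name is mine, and it assumes the template strand is handed over written 5′->3′, so it has to be read backwards (3′->5′) to build the RNA 5′->3′.

```python
# DNA template base -> RNA base that pairs with it (U replaces T in RNA).
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5_to_3):
    """Return the mRNA (written 5'->3') made from a DNA template strand."""
    return "".join(COMPLEMENT[base] for base in reversed(template_5_to_3))

print(transcribe("TTAGGCCAT"))  # AUGGCCUAA
```

The result is the same as the other (coding) DNA strand with every T swapped for U.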

– The Control of Transcription

The interactions between RNA polymerase and its promoter can be enhanced by activators or blocked by repressors.

A good example is the lac operon in prokaryotes – in Eukaryotes this is much more complex and may require chromatin remodelling to allow access to genes for transcription.

The Lac Operon controls expression of genes involved in the metabolism of lactose. A regulatory gene leads to the production of a repressor protein, which (in the absence of lactose) will bind to the operator, blocking expression of the later genes. When lactose appears, it binds to the repressor protein, changing its shape so that it can no longer bind to the operator. This allows expression of the genes further along the strand.

The above diagram shows the events when (a) no lactose is present, and (b) when lactose is present. The diagram below shows what occurs in the Tryptophan operon – you’ll see it is very similar.

– RNA Splicing

Splicing removes non-coding RNA sections from the newly synthesised strand. I mentioned non-coding DNA previously as DNA that has a purely structural role and does not code for any proteins etc. When it is copied into RNA during transcription it has no further use and so is removed by splicing.

  • A non-coding segment is called an INTRON (for intragenic regions). These sites start with GU and end with AG.
  • A coding segment is called an EXON (for regions that will be expressed).
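
As a toy illustration of the two definitions above, this Python sketch cuts introns out of a pre-mRNA and joins the exons. It is heavily simplified: the intron positions are supplied by hand rather than found by the cell's machinery, the GU...AG rule is only used as a check, and the sequence and function name are made up for the example.

```python
def splice(pre_mrna, introns):
    """Remove introns given as (start, end) index pairs and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {intron!r} does not run from GU to AG")
        exons.append(pre_mrna[position:start])  # keep the exon before it
        position = end
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)

# exon 1 + intron + exon 2 (hypothetical sequence)
pre_mrna = "AUGGCC" + "GUAAGUCCAG" + "UUUUAA"
print(splice(pre_mrna, [(6, 16)]))  # AUGGCCUUUUAA
```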

By joining the coding segments together in different combinations (alternative splicing), several proteins can be synthesised from just one gene.

Incorrect splicing is a real risk though, and up to 15% of all genetic diseases are thought to be caused by errors and mutations affecting splicing.