Monday 19 December 2016

Chain Termination Sequencing

The environmental DNA has been amplified, the clone library has been built, but there's still one more step before we can start working out what was in our sample. Now we need to sequence the genes which are safely tucked away in the plasmids of the clone library.

Although chain termination sequencing has been largely superseded by Next Generation Sequencing techniques like pyrosequencing and the Illumina platform, it's a good place to start. It shares some characteristics with a lot of the more modern platforms and a lot of older material about microbial communities is based on these techniques.

Dideoxy Chain Termination (Sanger) Sequencing

Frederick Sanger developed this technique in 1977, and it remained the dominant method for about 30 years. In its original form it was a very labour-intensive process, and many a PhD student spent long hours poring over electrophoresis gels to sequence viral genomes. However, once it was automated, it provided the basis for the Human Genome Project.

The method is very similar to PCR. You need template DNA, DNA polymerase, a primer and some nucleotides, but there are a few differences. Your initial DNA template is single stranded and you need to mix in a few dideoxynucleotides (ddNTPs) with your normal deoxynucleotides (dNTPs). A ddNTP lacks the hydroxyl group on the 3' carbon, so no further nucleotide can be attached to it. Once a ddNTP has been added to a growing DNA strand, that's it: the strand can't be extended any further. It has been terminated.

Figure 1 - The structure of dNTPs (left) and ddNTPs (right). ddNTPs lack the 3' hydroxyl group which is needed to attach further nucleotides.

Let's imagine that we stick the following template into the reaction:

ATCTGGATGCTGGATGGCCATATAGT

We add a mixture of normal dNTPs and some ddTTP to terminate the chain wherever there's a T nucleotide. We would end up with a mixture of the following fragments at the end:

AT
ATCT
ATCTGGAT
ATCTGGATGCT
ATCTGGATGCTGGAT
ATCTGGATGCTGGATGGCCAT
ATCTGGATGCTGGATGGCCATAT
ATCTGGATGCTGGATGGCCATATAGT
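
The fragments for the sequence above can be worked out with a few lines of Python. This is only a minimal sketch of the idea, not a model of the real chemistry: it assumes termination happens exactly once at every position of the chosen base, and the function name `termination_fragments` is made up for this illustration.

```python
def termination_fragments(strand: str, dd_base: str) -> list[str]:
    """Return every prefix of `strand` that ends in `dd_base`.

    Each prefix stands for a copy that stopped growing when a ddNTP
    of that base was incorporated instead of the normal dNTP.
    """
    return [strand[: i + 1] for i, base in enumerate(strand) if base == dd_base]


strand = "ATCTGGATGCTGGATGGCCATATAGT"

# Simulate the ddTTP reaction: one fragment for every T in the sequence.
for fragment in termination_fragments(strand, "T"):
    print(len(fragment), fragment)
```

Swapping "T" for "A", "C" or "G" gives the fragments you would expect from the other three reactions.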

Now we need to visualise what's in our sample. We know we can reliably separate different lengths of DNA using gel electrophoresis. Also, if we put a radioactive label on the ddNTP, we can develop the gel and see where the fragments are. Run 4 different reactions, one for each ddNTP, load them in adjacent lanes of the same gel, and you can read off the sequence (Figure 2).

Figure 2 - Gel electrophoresis of the products of 4 reactions with ddATP, ddTTP, ddCTP and ddGTP allows for the sequencing of the initial DNA fragment.
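
Reading the gel comes down to sorting the bands from all four lanes by fragment length and noting which lane each band sits in. The sketch below illustrates that logic under idealised assumptions: every position produces exactly one clean band, and fragment length stands in for how far a band travels. The helper names `fragment_lengths` and `read_gel` are hypothetical.

```python
def fragment_lengths(strand: str, dd_base: str) -> list[int]:
    """Lengths of the fragments produced by the reaction for one ddNTP."""
    return [i + 1 for i, base in enumerate(strand) if base == dd_base]


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Reconstruct the sequence from four lanes of fragment lengths.

    `lanes` maps each base to the fragment lengths seen in its lane.
    The shortest fragments travel furthest, so reading the bands from
    shortest to longest spells out the sequence one base at a time.
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


strand = "ATCTGGATGCTGGATGGCCATATAGT"

# One reaction (lane) per ddNTP: ddATP, ddCTP, ddGTP and ddTTP.
lanes = {base: fragment_lengths(strand, base) for base in "ACGT"}

print(read_gel(lanes))
```

Because every position in the strand ends in exactly one of the four bases, the four lanes together contain one band for every fragment length, which is why the sequence can be read straight off the gel.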

This original method was very time consuming, both in calibrating the reaction and electrophoresis and in manually reading the sequence off the gels. Luckily, it lends itself nicely to automation. Replace the radioactive tag with a fluorescent one, using a different colour for each ddNTP, and you can run 4 reactions in one and still differentiate between the 4 different nucleotides that were added. Teach a computer that green = A, red = T, yellow = G, blue = C and you can multiply the productivity by many orders of magnitude.
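
The "teach a computer" step can be pictured as a simple lookup. The sketch below assumes the detector gives one set of four fluorescence intensities per fragment, shortest fragment first, and simply picks the brightest channel. The trace values are invented for illustration, and real base callers do far more (peak detection, spacing and quality scores).

```python
# Colour-to-base mapping as described in the text above.
DYE_TO_BASE = {"green": "A", "red": "T", "yellow": "G", "blue": "C"}


def call_bases(trace: list[dict[str, float]]) -> str:
    """Call one base per peak by choosing the brightest dye channel."""
    called = []
    for peak in trace:
        brightest = max(peak, key=peak.get)  # channel with the highest intensity
        called.append(DYE_TO_BASE[brightest])
    return "".join(called)


# Hypothetical intensities for four consecutive peaks (made-up numbers).
example_trace = [
    {"green": 0.92, "red": 0.03, "yellow": 0.02, "blue": 0.03},
    {"green": 0.05, "red": 0.88, "yellow": 0.04, "blue": 0.03},
    {"green": 0.02, "red": 0.06, "yellow": 0.03, "blue": 0.89},
    {"green": 0.04, "red": 0.90, "yellow": 0.03, "blue": 0.03},
]

print(call_bases(example_trace))  # ATCT
```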

This technology was eventually miniaturised so it could run in a capillary tube and be fully automated in a machine. The machines pictured below can run 96 capillaries at the same time.

Flickr user jurvetson, DNA-Sequencers from Flickr 57080968, CC BY 2.0

Sanger sequencing remains relevant, and if anything is more accurate, but it can't provide the depth that NGS techniques can. While each machine pictured above can sequence 96 pieces of DNA at a time, at a total rate of about 6Mb (Megabase, 1,000,000 bases) per day (1), a modern Illumina machine can sequence millions of DNA fragments.

Why don't we use Sanger Sequencing anymore?

The error rate for Sanger sequencing tends to be pretty low, with one error for every 10,000 to 100,000 nucleotides sequenced (1). As with other sequencing platforms, the error rate increases with longer DNA fragments. NGS techniques have higher error rates than Sanger sequencing. So why has Sanger sequencing been replaced? The quick answer is cost and throughput. Microbiome research, especially metagenomics, involves sequencing all the DNA in a sample. That's a massive amount of data and would take much longer using chain termination sequencing machines. Sanger sequencing costs about $500/Mb and produces 6Mb of data per day. Pyrosequencing gives you 750Mb per day for $20/Mb. Illumina sequencing provides 5000Mb per day and only costs $0.50/Mb (1). Sanger sequencing has its place, but that place is no longer in microbial community research.
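
To make the gap concrete, here is a rough back-of-the-envelope calculation using the throughput and cost figures quoted above from (1). The 1,000Mb target is an arbitrary number picked for illustration, and the sketch assumes a single machine running continuously with no failed runs.

```python
# Daily output and cost per megabase, as quoted in the text from reference (1).
PLATFORMS = {
    "Sanger": {"mb_per_day": 6, "usd_per_mb": 500.0},
    "Pyrosequencing": {"mb_per_day": 750, "usd_per_mb": 20.0},
    "Illumina": {"mb_per_day": 5000, "usd_per_mb": 0.50},
}


def run_estimate(target_mb: float, platform: str) -> tuple[float, float]:
    """Return (days, cost in USD) to sequence `target_mb` on one machine."""
    figures = PLATFORMS[platform]
    days = target_mb / figures["mb_per_day"]
    cost = target_mb * figures["usd_per_mb"]
    return days, cost


target_mb = 1000  # hypothetical amount of sequence data wanted

for name in PLATFORMS:
    days, cost = run_estimate(target_mb, name)
    print(f"{name:15s} {days:8.1f} days  ${cost:,.0f}")
```

With these figures, 1,000Mb works out at roughly 167 days and $500,000 on a Sanger machine, against a fraction of a day and $500 on an Illumina machine.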

However, if you're reading a paper about clone library analysis which has used chain termination sequencing, then you can be pretty sure that the sequences they've produced are fairly accurate. You'll still have to bear in mind the errors and biases introduced by lysis, extraction, PCR and clone library preparation, but sequencing is unlikely to have skewed the data very much.

Of course, the key to clone libraries is analysing the sequence data which is produced: drawing phylogenetic trees and interpreting diversity indices, amongst other things. This will be the focus of the next post.

Reference
1. High-throughput DNA sequencing - Concepts and limitations. Kircher, M., Kelso, J. Bioessays. 2010; 32(6):524-536.
