Next Generation Sequencing

Overview
Sanger sequencing (the chain-termination method), developed in the mid-1970s by Frederick Sanger and co-workers, has been the workhorse of molecular biology for many years. The technique is based on size separation of extended DNA fragments whose synthesis has been halted by the incorporation of di-deoxynucleotides. These nucleotides differ from standard deoxynucleotides in that they lack a 3’-OH group, so no further nucleotide can be added after one is incorporated and the chain terminates. Improvements over the years, such as fluorescently tagged di-deoxynucleotides, read lengths of up to 1000 bases, capillary electrophoresis, increased automation and higher throughput, have allowed the technique to be used in genome projects and as a common molecular tool. With these improvements, however, the technique has reached saturation.

However, there has always been a desire to sequence more bases at a lower cost. For this reason, new approaches were developed in the late 1990s. These have come to fruition in the last 5 years or so with advances in chemical and physical technology, and we have now entered the next generation era of sequencing.

Methods
Three main companies occupy the bulk of the next generation sequencing market.

These are

  • 454/Pyrosequencing (Roche)
  • SOLiD (Applied Biosystems)
  • Solexa (Illumina)

454
454 was one of the first companies to take advantage of the new technology. A graphical illustration of the process is shown below. Essentially, DNA is fragmented and adapters are joined to both ends of each fragment. The fragments are then amplified by emulsion PCR, in which each droplet contains a 1 μm agarose bead carrying adaptors complementary to those on the fragmented DNA, giving up to 1 million identical copies of a fragment around one bead. Finally, the beads are dropped into the wells of a PicoTiterPlate (PTP). It is here that the light-emitting reaction occurs as each type of nucleotide is added in turn. The signal intensity is proportional to the number of bases incorporated, so a homopolymeric run of identical bases gives a correspondingly stronger signal.
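The relationship between signal intensity and homopolymer length can be illustrated with a minimal Python sketch. It is not the vendor's base-calling software: the flow order "TACG", the function name flowgram_to_sequence and the example intensities are all assumptions chosen for illustration, and real signals need noise correction that is ignored here.

```python
# Assumed cyclic order in which nucleotides are flowed over the plate
# (illustrative; not taken from the text above).
FLOW_ORDER = "TACG"


def flowgram_to_sequence(intensities, flow_order=FLOW_ORDER):
    """Convert normalised flow intensities into a base sequence.

    Each intensity is rounded to the nearest whole number, which is taken
    as the number of identical bases incorporated during that flow.
    """
    bases = []
    for i, signal in enumerate(intensities):
        nucleotide = flow_order[i % len(flow_order)]
        # Round to the nearest integer; negative noise counts as zero bases.
        run_length = max(0, int(signal + 0.5))
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Hypothetical intensities for two cycles of T, A, C, G flows.
flows = [1.02, 0.05, 2.10, 0.97, 0.03, 3.05, 0.00, 1.10]
print(flowgram_to_sequence(flows))  # TCCGAAAG
```

Because the run length is inferred from intensity alone, a signal of 3.05 is read as three bases; small errors in intensity for long runs are one reason homopolymers are harder to call with this kind of chemistry.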

SOLiD
Sequencing by Oligonucleotide Ligation and Detection (SOLiD) is a method from Applied Biosystems. It is similar in principle to pyrosequencing in that fragmented DNA is again amplified on a bead. From here on, the procedure differs. A ligase and a set of universal oligonucleotide probes are used so that all possible di-nucleotides are accounted for. Starting from a primer, successive ligations give nucleotide readings at regular intervals. The extended strand is then removed and the process is repeated with a primer offset by one base (n-1), which allows further bases to be identified. This procedure is repeated a number of times to deduce the sequences of the fragmented DNA.
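One way to picture the di-nucleotide readout is as a two-base encoding, in which each adjacent pair of bases is reported as one of four colors. The Python sketch below assumes the commonly described four-color scheme (bases coded A=0, C=1, G=2, T=3 and the color taken as the XOR of the two codes). The function names are hypothetical and this is an illustration, not the instrument's own software.

```python
BASES = "ACGT"  # assumed 2-bit coding: A=0, C=1, G=2, T=3


def encode_colors(sequence):
    """Encode each overlapping di-nucleotide as a color from 0 to 3."""
    codes = [BASES.index(base) for base in sequence]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colors(first_base, colors):
    """Recover the bases from a known first base and a list of colors."""
    code = BASES.index(first_base)
    decoded = [first_base]
    for color in colors:
        code ^= color  # the color tells us how to step to the next base
        decoded.append(BASES[code])
    return "".join(decoded)


colors = encode_colors("ATGGC")
print(colors)                      # [3, 1, 0, 3]
print(decode_colors("A", colors))  # ATGGC
```

Since every base contributes to two neighbouring colors, decoding needs one known starting base, and a single wrong color changes every base decoded after it. In this scheme a true single-base difference alters two adjacent colors, which can help separate real variants from measurement errors.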

Illumina
The Illumina/Solexa methodology works in a slightly different way. Again, however, the principle is to amplify a fragment of DNA to allow efficient reading. The DNA is broken up and adaptors are added, but this time the fragments are attached not to a bead but to a slide. Fold-back PCR is then used to amplify each fragment into a cluster. Nucleotides are then added sequentially using a polymerase.

Applications

  • Genomics
  • Transcriptomics
  • Epigenomics
  • Interactomics

The applications of next-generation sequencing are not limited to typical sequencing as we know it. The sheer volume of data generated means that genomes can now be sequenced within days, so data storage and analysis become key issues. De novo sequencing is now possible on a greater scale. Genome sequences are typically not completed, and gaps are left open (or are often closed with Sanger sequencing). The focus is instead on gene discovery (organism biology) or SNPs (epidemiology or evolution). The sheer volume of data allows more extensive comparative genomics to be performed, e.g. between methicillin-sensitive and methicillin-resistant strains of Staphylococcus aureus (Francois et al, 2007, Future Microbiology). Re-sequencing allows specific areas to be checked and is a cost-effective way of discovering SNPs, e.g. Bacillus subtilis re-sequencing identified new mutations and suppressor mutations (Srivatsan et al, 2008, PLOS Genetics). Sequencing can be used for typing, e.g. SNPs and small (1-2 bp) indels in Caenorhabditis elegans (Hillier et al, 2008, Nature Methods), and sequencing of HIV clinical isolates identified rare members of the viral population (Hoffmann et al, 2007, Nucleic Acids Res). Next generation sequencing can also be used for metagenomics, to identify the bacteria present and their abundance within uncultured, unpurified and/or viral populations, e.g. a microbial census of the human intestine (Gill et al, 2006, Science) and the RNA viral community in human faeces (Zhang et al, 2006, PLOS Biol).
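The idea behind SNP discovery by re-sequencing can be sketched in a few lines of Python. The sketch assumes the reads have already been aligned to the reference without gaps. The function name call_snps, the thresholds and the toy data are all hypothetical, and real variant callers also model base quality and indels, which are ignored here.

```python
from collections import Counter, defaultdict


def call_snps(reference, aligned_reads, min_depth=3, min_fraction=0.75):
    """Report positions where the read consensus differs from the reference.

    aligned_reads is a list of (start, sequence) pairs with 0-based,
    gap-free alignment starts. Returns (position, ref_base, alt_base, depth).
    """
    pileup = defaultdict(Counter)
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            position = start + offset
            if 0 <= position < len(reference):
                pileup[position][base] += 1

    snps = []
    for position in sorted(pileup):
        counts = pileup[position]
        depth = sum(counts.values())
        base, support = counts.most_common(1)[0]
        if (depth >= min_depth
                and base != reference[position]
                and support / depth >= min_fraction):
            snps.append((position, reference[position], base, depth))
    return snps


# Toy data: four of the five reads covering position 5 carry T instead of C.
reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTATGT"), (3, "TATGTA"), (4, "ATGTAC"), (5, "TGTAC")]
print(call_snps(reference, reads))  # [(5, 'C', 'T', 5)]
```

The depth and fraction thresholds show why high coverage matters: a rare variant, such as a minor member of a viral population, would need a lower min_fraction and therefore many more reads to be distinguished from sequencing error.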

In addition to classical genomics applications, next generation sequencing has been applied to transcriptomics, epigenomics and interactomics. Methods such as SAGE (Serial Analysis of Gene Expression) have been combined with next generation sequencing, e.g. 5’ end SAGE (Hashimoto et al, 2009, PLOS One). In addition, RNA-Seq has been developed, in which RNA is converted to cDNA, adaptors are added and the cDNA is sequenced to deliver a digital readout of expressed components and their abundance (Wang et al, 2009, Nature Reviews Genetics). Next generation sequencing has been applied to interactomics, e.g. ChIP-Seq of STAT1 (Robertson et al, 2007, Nature Methods) and of histone binding (Barski et al, 2007, Cell). It has also been applied to epigenomics through methylation analysis, e.g. Lister & Ecker, 2009, Genome Research.
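The "digital" nature of RNA-Seq output can be illustrated by counting the reads assigned to each gene and normalising for gene length and sequencing depth. The Python sketch below uses reads per kilobase per million mapped reads (RPKM) as one possible normalisation. It assumes the reads have already been mapped to genes, and the function name, gene names and numbers are hypothetical.

```python
from collections import Counter


def rpkm(read_assignments, gene_lengths):
    """Compute reads per kilobase per million mapped reads for each gene.

    read_assignments: one gene name per mapped read.
    gene_lengths: gene name -> transcript length in bases.
    """
    counts = Counter(read_assignments)
    total_reads = sum(counts.values())
    if total_reads == 0:
        return {gene: 0.0 for gene in gene_lengths}
    return {
        gene: counts[gene] * 1e9 / (length * total_reads)
        for gene, length in gene_lengths.items()
    }


# Toy data: 10 mapped reads over three hypothetical genes.
reads = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
lengths = {"geneA": 2000, "geneB": 1000, "geneC": 500}
for gene, value in rpkm(reads, lengths).items():
    print(gene, value)
```

In this toy data geneA has twice as many reads as geneB but is also twice as long, so the two receive the same normalised value; raw counts alone would overstate the abundance of longer transcripts.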

The use of next-generation sequencing has been increasing exponentially in the last few years.
There are many challenges ahead for next generation sequencing. Increased read length, lower costs and fewer errors are always on the radar. Analysis techniques are continually being improved, with new algorithms developed all the time. In addition, newer sequencing approaches are now being developed, the so-called next-next generation sequencing.

References
454
SOLiD
Illumina