Wyatt Morgan

Sequencing

Wyatt Morgan · March 2026
Last edited March 29, 2026

A Detailed Technical Review of DNA Sequencing Platforms

Illumina (SBS) · Element Biosciences / AVITI (Avidity) · Ultima Genomics (Flow SBS) · Sanger (Chain Termination) · Oxford Nanopore (Nanopore) · PacBio (SMRT) · Roche (Sequencing by Expansion)

Library Architecture, Cluster/Polony/Bead Generation, Sequencing Chemistry, Error Profiles, and Practical Considerations

Illumina Sequencing-by-Synthesis (SBS)

Illumina's platform is the most widely deployed short-read sequencer in the world. It uses a cyclic reversible termination chemistry in which all four fluorescently labeled, 3’-blocked nucleotides compete for incorporation simultaneously. After imaging, the fluorophore and blocking group are cleaved, enabling the next cycle. Below, every architectural and chemical detail is laid out.

Illumina's core SBS chemistry traces back to work by Shankar Balasubramanian and David Klenerman at the University of Cambridge in the late 1990s. The two chemists conceived the idea of sequencing DNA on a surface using fluorescent reversible terminators while brainstorming over pints at the Panton Arms pub. They founded Solexa in 1998, which was acquired by Illumina in 2007 for $600 million. As of 2025, Illumina instruments have generated more than 85% of all sequencing data ever produced worldwide. The cost of sequencing a human genome has fallen from ~$2.7 billion (Sanger-based Human Genome Project, 2003) to under $200 on modern Illumina instruments — a reduction of over seven orders of magnitude in two decades, outpacing Moore's Law.

Library Structure (Adapter Architecture)

A completed Illumina library has a strict 5’→3’ linear architecture. The canonical form is:

5’---P5---i5---Read 1 primer site---[INSERT]---Read 2 primer site---i7---P7---3’

More precisely, reading from the P5 (left) end:

Figure 1. Canonical Illumina library architecture. The linear molecule contains flow cell binding sequences (P5/P7) at each end, sample indexes (i5/i7), sequencing primer binding sites flanking the target insert, and the insert DNA itself. This architecture is shared by all Illumina platforms; TruSeq and Nextera differ only in the sequencing primer binding regions.

TruSeq vs. Nextera Adapter Systems

These are the two dominant Illumina adapter families. They differ in their sequencing primer binding regions but share the same P5/P7 flow cell binding sequences.

Can R2 Go Next to P5/i5? (Orientation Constraints)

No — the orientation is fixed. P5 is always on the same end as i5 and the Read 1 primer site. P7 is always on the same end as i7 and the Read 2 primer site. This is because the physical flow cell lawn has two distinct oligo species (P5 and P7) grafted at fixed positions, and the sequencing workflow (Read 1 → Index 1 (i7) → Index 2 (i5) → Read 2) is hard-wired into the instrument's fluidics. Swapping R2 next to P5 would break cluster generation and all downstream read priming.

Can TruSeq and Nextera Be Combined in a Pool?

Yes, with caveats. Illumina's sequencing reagent cartridges actually contain a mixture of sequencing primers from multiple adapter families (TruSeq, Nextera, and even legacy kits). Therefore, libraries built with TruSeq adapters and libraries built with Nextera adapters can be pooled and sequenced together on the same flow cell lane. You can even mix-and-match within a single library molecule (e.g., TruSeq Read 1 site on one end and Nextera Read 2 site on the other), though this is not recommended for beginners because demultiplexing and trimming parameters differ between the two systems. The key constraint is that the P5 and P7 sequences must be full-length and intact on both ends.

Full-Length vs. Stubby-Y Adapters

There are two major physical adapter designs:

Unique Dual Indexing (UDI) vs. Combinatorial Dual Indexing (CDI)

In CDI, a small set of i5 and i7 indexes are used in all possible pairwise combinations. Any index hopping event can produce a valid (but incorrect) index pair, causing sample cross-contamination. In UDI, each sample receives a globally unique pair of i5+i7, so any hopped combination produces an invalid pair that can be computationally filtered. Illumina now strongly recommends UDI for all patterned flow cell instruments (NovaSeq, NextSeq 1000/2000, NovaSeq X) because exclusion amplification chemistry on patterned flow cells has a measurably higher rate of index hopping than bridge amplification on random flow cells.
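A minimal sketch of why UDI makes hopped reads detectable while CDI does not. The index names, sample names, and the `classify_pair` helper are hypothetical and chosen only for illustration; real demultiplexers also tolerate mismatches within each index.

```python
from itertools import product


def classify_pair(i7, i5, sample_sheet):
    """Return the sample assigned to an observed (i7, i5) pair, or None if unassigned."""
    return sample_sheet.get((i7, i5))


# Hypothetical index names, for illustration only.
i7_set = ["A", "B"]
i5_set = ["X", "Y"]

# CDI: every pairwise combination of the small index sets is assigned to a sample.
cdi_sheet = {
    pair: f"sample_{n}"
    for n, pair in enumerate(product(i7_set, i5_set), start=1)
}

# UDI: each i7 and each i5 is used exactly once across the pool.
udi_sheet = {("A", "X"): "sample_1", ("B", "Y"): "sample_2"}

# A hop: a molecule that started as (A, X) picks up the i5 of another library (Y).
hopped = ("A", "Y")
print("CDI:", classify_pair(*hopped, cdi_sheet))  # resolves to a real but wrong sample
print("UDI:", classify_pair(*hopped, udi_sheet))  # unassigned, so it can be discarded
```

Under CDI the hopped pair is indistinguishable from a legitimate read; under UDI it falls outside the sample sheet and is filtered.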

Cluster Generation

Bridge Amplification (Random Flow Cells)

Used on MiSeq, HiSeq 2500 (rapid run mode), and older instruments. Single-stranded library molecules are loaded onto the flow cell and hybridize to the P5 or P7 oligo lawn. The free end of each molecule folds over and hybridizes to the complementary oligo nearby, forming a "bridge." Polymerase extends, creating a double-stranded bridge. Denaturation yields two surface-tethered single strands. This cycle repeats approximately 35 times, generating a cluster of roughly 1,000 identical copies of the original template in a random physical location. Clusters are roughly 1 µm in diameter.
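The two figures above (about 35 cycles, about 1,000 copies) imply that bridge amplification is far from a perfect doubling per cycle. A back-of-envelope sketch, assuming a constant per-cycle efficiency (a simplification; the function name is illustrative):

```python
def per_cycle_efficiency(final_copies: float, cycles: int) -> float:
    """Solve (1 + e) ** cycles == final_copies for the per-cycle efficiency e."""
    return final_copies ** (1.0 / cycles) - 1.0


e = per_cycle_efficiency(1_000, 35)
print(f"implied efficiency: {e:.1%} per cycle")
print(f"perfect doubling over 35 cycles would give {2 ** 35:,} copies")
```

Under that assumption, each cycle adds only a little over 20% more strands, which fits the picture of surface-tethered molecules that do not all find a partner oligo in every round.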

Exclusion Amplification (ExAmp, Patterned Flow Cells)

Used on NovaSeq 6000, NovaSeq X, NextSeq 1000/2000. Patterned flow cells have pre-etched nanowells at defined positions. Library, polymerase, and recombinase are mixed and loaded simultaneously. The first molecule to seed a nanowell is amplified so rapidly that it excludes other molecules from occupying that well (kinetic exclusion). This produces monoclonal clusters at uniform spacing, dramatically increasing cluster density and data output. However, ExAmp is more susceptible to index hopping because free library molecules are in contact with surface-bound molecules and recombinase for an extended period during amplification.

Post-Amplification Linearization

After cluster generation, reverse strands are cleaved and washed away, leaving only forward strands for Read 1 sequencing. After Read 1 and Index 1 are complete, the reverse strands are resynthesized by a round of bridge amplification that uses the surviving forward strands as templates; the forward strands are then cleaved and washed away, leaving linearized reverse strands for Read 2. This resynthesis step is one reason Read 2 quality is typically slightly lower than Read 1: regeneration is imperfect, and the clusters have already been through a full read of chemistry and imaging.

Sequencing Chemistry (Cyclic Reversible Termination)

The Incorporation Cycle

Each cycle consists of four steps:

4-Channel, 2-Channel, and 1-Channel Chemistry

Illumina has used three different optical encoding schemes:

XLEAP-SBS Chemistry

Introduced with the NovaSeq X series and later extended to the NextSeq 1000/2000, XLEAP-SBS uses new nucleotide analogues and polymerases that dramatically reduce signal decay (photobleaching) over the course of a run. Older chemistry showed ~50% signal intensity loss over 150 cycles; XLEAP maintains essentially flat signal. This enables longer reads (up to 2×300 on NextSeq 2000) with higher quality at the ends. Phasing remains the primary read-length limitation under XLEAP.

Error Profiles and Quality Decay

Phasing and Pre-Phasing

The dominant error mechanism in Illumina sequencing. Within each cluster, ~1,000 molecules should be in perfect synchrony. However:

Typical phasing/pre-phasing rates are 0.1–0.2% per cycle. This seems small, but it compounds: after 250 cycles, approximately 50% of molecules are out of phase if uncorrected. Illumina's Real-Time Analysis (RTA) software applies computational phasing correction using the known rates estimated from early cycles (or empirically per-cycle on newer instruments), rescuing much of the signal. But there is a hard limit beyond which correction fails, which is why quality drops toward read ends.
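The compounding described above can be checked with a one-line model. This sketch assumes each strand falls out of sync independently with a fixed probability per cycle and never recovers, and it lumps phasing and pre-phasing into one combined rate. Both are simplifications, and the function name is illustrative.

```python
def in_phase_fraction(rate_per_cycle: float, cycles: int) -> float:
    """Fraction of strands still in sync after `cycles` cycles.

    Each strand independently drops out of phase with probability
    `rate_per_cycle` in every cycle (no correction, no recovery).
    """
    return (1.0 - rate_per_cycle) ** cycles


# Combined phasing + pre-phasing rates, e.g. 0.1-0.2% for each mechanism.
for combined_rate in (0.002, 0.003, 0.004):
    out_of_phase = 1.0 - in_phase_fraction(combined_rate, 250)
    print(f"{combined_rate:.1%} per cycle: {out_of_phase:.0%} out of phase after 250 cycles")
```

At a combined rate of 0.3% per cycle (0.15% each for phasing and pre-phasing), roughly half of the strands are out of phase after 250 cycles, consistent with the estimate above.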

Signal Decay (Photobleaching)

Repeated laser excitation damages fluorophores on the growing strand or causes photodamage to the DNA itself. This manifests as a progressive drop in signal intensity across cycles, compounding the phasing problem. Pre-XLEAP chemistries saw roughly 50% signal loss by cycle 150. The combined effect of phasing + bleaching means the signal-to-noise ratio degrades exponentially, ultimately making base calls unreliable. New chemistry (XLEAP) addresses bleaching but phasing remains.

Practical Quality Characteristics

The Chastity Filter

Before reads enter analysis, each cluster is assessed for signal purity. The "chastity" score is the ratio of the brightest signal to the sum of the two brightest signals, so it ranges from 0.5 (two equally bright signals, as in a mixed cluster) to 1.0 (a perfectly pure, monoclonal cluster). Clusters with more than one base call scoring <0.6 in the first 25 cycles are filtered out as "non-passing filter" (non-PF). Typical good runs achieve >80% PF.
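A small sketch of the chastity calculation and filter. The intensity tuples are made-up values, the function names are illustrative, and the allowance of one sub-threshold cycle (`max_failures=1`) follows the commonly documented form of the rule; treat it as an assumption rather than the instrument's exact implementation.

```python
def chastity(intensities):
    """Brightest channel intensity divided by the sum of the two brightest."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_filter(cycle_intensities, threshold=0.6, window=25, max_failures=1):
    """True if at most `max_failures` of the first `window` cycles fall below `threshold`."""
    failures = sum(1 for c in cycle_intensities[:window] if chastity(c) < threshold)
    return failures <= max_failures


# Made-up four-channel intensities per cycle.
pure = [(900, 40, 30, 20)] * 25    # one dominant signal: monoclonal cluster
mixed = [(500, 450, 30, 20)] * 25  # two competing signals: polyclonal cluster

print(passes_filter(pure), passes_filter(mixed))
```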

Sequencing Read Order

The instrument sequences in a fixed order:

This fixed order means you must always collect at least some Read 1 data (it sets spatial coordinates and phasing parameters), even if your biological interest is entirely in Read 2.

Element Biosciences AVITI (Avidity Sequencing)

Element Biosciences launched the AVITI in 2022, introducing a fundamentally different chemistry called "Avidity Sequencing" or "Avidite Base Chemistry" (ABC). The key innovation is the separation of nucleotide identification from nucleotide incorporation, using multivalent molecular complexes called "avidites." The instrument uses three different engineered polymerases, rolling circle amplification instead of bridge PCR, and a low-binding surface chemistry.

Element was founded in 2017 by Molly He, who previously led engineering teams at Illumina and was a co-inventor on multiple Illumina sequencing patents. The company raised over $400 million before launching its first instrument and explicitly designed the AVITI to be Illumina-library-compatible from day one — a strategic choice that dramatically lowered the switching cost for labs already invested in Illumina workflows. The "avidity" approach — using multivalent binding to amplify signal without modifying the DNA — was inspired by the immune system, where antibodies achieve high-avidity target recognition through multiple simultaneous weak interactions.

Library Structure and Compatibility

Circular Library Requirement

Unlike Illumina, AVITI requires circular library molecules as templates for rolling circle amplification. There are three routes to get there:

Adapter Compatibility Details

The AVITI is compatible with standard Illumina TruSeq and Nextera adapter sequences. The sequencing primers used are essentially the same Illumina standard sequences, meaning the vast majority of existing Illumina libraries can run on the AVITI without modification beyond circularization. Key practical notes:

Polony Generation (Rolling Circle Amplification)

This is where AVITI diverges most dramatically from Illumina.

The RCA Process

The flow cell surface is coated with a low-binding chemistry studded with capture oligos complementary to the adapter sequences. When a circular library molecule hybridizes to a capture oligo, rolling circle amplification begins:

Each polony contains many copies of the same sequence, all in close physical proximity, analogous to an Illumina cluster but generated without PCR.

Advantages Over Bridge Amplification

Throughput

A high-output AVITI flow cell contains approximately 1 billion polonies, each generating one read pair. The AVITI runs two independent flow cells simultaneously, yielding ~2 billion read pairs per run. Read lengths of 2×75 through 2×300 are supported with different kit configurations.

Sequencing Chemistry (Sequencing by Binding)

This is the most novel aspect of the platform. Unlike Illumina, where base identification and base incorporation happen in the same chemical step (a labeled nucleotide is incorporated), AVITI splits these into two distinct phases per cycle.

Phase 1: Detection (Avidite Binding)

After washing away any reagents from the previous cycle, the flow cell is flooded with a mixture of:

Each avidite is a multivalent molecular complex with the following structure:

The ABP sits at the primer-template junction on each copy in the polony and attempts to recruit a complementary nucleotide. Since each copy in the polony is at the same position (they are synchronized), they all recruit the same avidite type. Because a single avidite molecule has multiple nucleotide arms, it simultaneously engages multiple ABP sites across the polony. This multivalent interaction creates an extremely stable complex through avidity (many weak interactions summing to a strong one), even though each individual nucleotide:polymerase interaction is transient.

The result: bright, stable fluorescent signal at nanomolar avidite concentrations (100-fold lower than the micromolar concentrations needed for labeled nucleotides in Illumina SBS). The fluorophore is also physically distant from the DNA, reducing photodamage.

Phase 2: Incorporation (Strand Extension)

After imaging, the ABPs and avidites are stripped away. The flow cell is then flooded with:

This polymerase incorporates a single unlabeled, 3’-blocked nucleotide at each position, after which the 3’ block is removed. Because incorporation uses unlabeled nucleotides, no fluorescent scars are left on the DNA, and once the block is removed the growing strand is chemically identical to natural DNA.

Why This Matters

Error Profiles and Quality

Sequencing Order

With Cloudbreak chemistry, the AVITI sequences indexes (i7 and i5) before Read 1 and Read 2. This provides real-time QC and demultiplexing feedback before the long insert reads even begin — letting you catch loading or library problems early. Read order: Index 1 → Index 2 → Read 1 → Read 2.

Output and Run Times

| Parameter | AVITI (High Output) | AVITI (Low Output) |
| --- | --- | --- |
| Reads per flow cell | ~1 billion | ~100 million |
| Reads per run (2 FC) | ~2 billion | ~200 million |
| Read lengths | 2×75 to 2×300 | 2×75 to 2×300 |
| 2×150 run time | <40 hours | <40 hours |
| Quality | ≥90% Q40 | ≥90% Q40 |

Ultima Genomics (Flow-Based SBS)

Ultima Genomics launched the UG 100 in February 2024, representing the most radical hardware departure from the standard flow cell paradigm. It uses a spinning silicon wafer, emulsion PCR for clonal amplification, and a non-terminating single-nucleotide flow chemistry. It is designed for ultra-high throughput at extreme cost efficiency (targeting the $100 genome).

Library Structure

Native Ultima Libraries

Ultima's library structure differs from Illumina's:

Ultima provides two library preparation workflows:

Illumina Library Conversion

Illumina libraries can be converted to Ultima format via PCR. Primers anneal to the existing Read 1/Read 2 regions and add Ultima-specific PS-SBC and UBA overhangs. The P5/P7 sequences are effectively replaced. This conversion has been demonstrated for 10x Genomics single-cell libraries, Olink proteomics libraries, and standard WGS/RNA-seq preps.

Single-End Reads (Critical Difference)

Ultima sequencing is inherently single-end. Each bead (and therefore each read) sequences from one end of the library insert only. There is no equivalent of Illumina's paired-end resynthesis. Read lengths follow a distribution (not fixed), with a median of ≥300 bases and post-filtering median of ~250 bases. For applications requiring paired-end information, the single-end reads are computationally split into simulated paired-end format.

Clonal Amplification (Emulsion PCR on Beads)

Ultima uses off-instrument, automated emulsion PCR rather than on-surface amplification:

For ppmSeq (paired plus-minus sequencing), both strands of each original DNA duplex are captured on the same bead. Denaturation occurs within the emulsion droplet after bead ligation, so forward and reverse strand templates are co-amplified on a single bead. This enables downstream computational duplex error correction.

Bead Loading onto the Wafer

The amplified beads are loaded onto a 200 mm silicon wafer (the same diameter as standard semiconductor wafers). The wafer surface is patterned at micron scale with an array of electrostatic landing pads. Beads settle onto these pads, ideally one bead per pad. A high-output wafer holds approximately 10–12 billion beads/reads.

The Spinning Wafer Architecture

This is Ultima's most distinctive hardware feature. Instead of a sealed flow cell with microfluidics and a scanning camera:

Sequencing Chemistry (Single-Nucleotide Flow, Non-Terminating SBS)

Ultima's chemistry is conceptually related to earlier flow-based approaches (the discontinued 454 pyrosequencing platform and Ion Torrent semiconductor sequencing) but with critical innovations.

Flow Order

Nucleotides are introduced one species at a time in a repeating order (e.g., T, G, C, A, T, G, C, A...). Each introduction of one nucleotide species constitutes one "flow." Four flows (one of each base) constitute one "flow cycle." A run of ~300 base median length requires ~444 flows (~111 flow cycles).

Mostly Natural Nucleotides (mnSBS)

The key innovation: each flow delivers a mixture of mostly unmodified, natural nucleotides plus a minority (<20%) of fluorescently-labeled nucleotides. The polymerase remains processively bound to the template and incorporates bases without a terminator — meaning it can incorporate multiple nucleotides per flow if the template has a homopolymer. The labeled fraction provides optical signal; the unlabeled majority keeps polymerase kinetics and fidelity close to natural.

After each flow, the wafer is imaged at steady state. A fluorophore cleavage step removes the labels, leaving natural DNA. Because the dyes are cleaved, no scars accumulate.

Base Calling in Flow Space

Base calling on Ultima operates in "flow space" rather than "sequence space." For each flow, the system must determine how many nucleotides of that species were incorporated (0, 1, 2, 3, 4...). The identity of the base is never in question (you know which nucleotide was flowed), so substitution errors are inherently very rare. The challenge is accurately determining the number of incorporations, especially for homopolymers. Ultima uses a deep convolutional neural network (CNN) trained on large, diverse datasets to convert raw signal intensities into base calls.
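To make "flow space" concrete, here is a toy encoder and decoder. It assumes the T, G, C, A flow order given above and error-free incorporation counts; the function names are illustrative, and the sequence is that of the synthesized strand. This is not Ultima's base caller, only the representation it works in.

```python
FLOW_ORDER = "TGCA"  # repeating flow order from the text


def to_flowgram(seq, flow_order=FLOW_ORDER):
    """Encode a synthesized sequence as the number of bases incorporated per flow."""
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains a base that is never flowed")
    counts, pos, flow = [], 0, 0
    while pos < len(seq):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # Non-terminating chemistry: a homopolymer is consumed in a single flow.
        while pos < len(seq) and seq[pos] == base:
            run += 1
            pos += 1
        counts.append(run)
        flow += 1
    return counts


def from_flowgram(counts, flow_order=FLOW_ORDER):
    """Decode per-flow incorporation counts back into a sequence."""
    return "".join(flow_order[i % len(flow_order)] * n for i, n in enumerate(counts))


flows = to_flowgram("TTGAC")
print(flows)                 # [2, 1, 0, 1, 0, 0, 1]
print(from_flowgram(flows))  # TTGAC
```

Because each flow's base identity is fixed, a miscalled count changes the length of a homopolymer (an indel) rather than swapping one base for another.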

Homopolymer Challenge

Because the chemistry is non-terminating, a homopolymer run (e.g., AAAAAAA) results in all 7 A’s being incorporated in a single flow. The system must count the number of incorporations from the signal intensity. For short homopolymers (≤8–10 bp), accuracy is high. For longer homopolymers (>12 bp), accuracy degrades, and these regions are excluded from Ultima's high-confidence region (HCR). This is the classic limitation of flow-based chemistries, shared historically with 454 and Ion Torrent, though Ultima's ML-based calling substantially outperforms those predecessors.

Error Profile

ppmSeq (Paired Plus-Minus Sequencing)

Ultima's unique accuracy feature. By capturing both strands of a DNA duplex on a single bead during emulsion PCR, then sequencing both, the system can computationally compare forward and reverse strand reads. Any base call disagreement between the two strands likely represents a sequencing artifact or DNA damage and can be filtered. This achieves a raw read accuracy of Q60 (one error per million bases) for SNVs, enabling applications like liquid biopsy and minimal residual disease detection.
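The strand-comparison idea can be sketched as follows. This is a deliberately simplified illustration, not Ultima's pipeline: it assumes the two reads are already aligned, cover exactly the same bases, and contain only substitutions. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(plus_read, minus_read):
    """Keep bases on which both strands agree; mask disagreements with 'N'."""
    minus_as_plus = reverse_complement(minus_read)
    if len(minus_as_plus) != len(plus_read):
        raise ValueError("toy example expects reads covering the same bases")
    return "".join(p if p == m else "N" for p, m in zip(plus_read, minus_as_plus))


# The plus-strand read carries an artifact at position 3 (A instead of T).
print(duplex_consensus("ACGATG", "CAACGT"))  # ACGNTG
```

A true variant is present on both strands and survives the comparison; a single-strand artifact does not.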

Output and Run Times

| Parameter | UG 100 |
| --- | --- |
| Reads per wafer | 10–12 billion |
| Data per wafer | ≥2.5–3.0 terabases |
| Run time (≥300 bp) | ~20 hours per wafer |
| Read type | Single-end (variable length) |
| Read length median | ≥300 bp raw; ~250 bp post-filter |
| Genomes/year (30X) | ~20,000 |
| Cost per genome (30X) | ~$100 |

Sanger Sequencing (Chain Termination)

Sanger sequencing, developed by Frederick Sanger in 1977, is the original DNA sequencing method and remains the gold standard for validation experiments, small-scale sequencing, and clinical confirmatory testing. It uses dideoxynucleotide chain termination to produce a ladder of fragments whose lengths encode the template sequence. Despite being largely supplanted by massively parallel platforms for discovery work, it persists in essentially every molecular biology laboratory in the world.

Frederick Sanger is one of only five people to have won two Nobel Prizes, and one of only two (with K. Barry Sharpless) to have won the Chemistry Nobel twice. His first (1958) was for determining the amino acid sequence of insulin, proving that proteins have defined primary structures. His second (1980) was for the dideoxy chain-termination method described here, shared with Walter Gilbert and Paul Berg. Sanger's method went on to power the Human Genome Project, which took 13 years and roughly $2.7 billion to produce the first human reference genome. Sanger famously described himself as "just a chap who messed about in his lab," and upon retirement he declined a knighthood, reportedly saying he did not wish to be called "Sir."

Library Structure (Template Preparation)

Sanger sequencing does not use "libraries" in the NGS sense. Instead, it sequences a single template molecule (or PCR amplicon) per reaction. The input is one of:

There is no adapter ligation, no indexing, and no clonal amplification step. Each sequencing reaction interrogates one template with one primer, producing a single read.

The Chain Termination Reaction

Each Sanger reaction contains:

The dNTP:ddNTP ratio (typically ~100:1 to 300:1) is calibrated so that, on average, every possible position in the template has a statistical population of fragments terminating there. During thermal cycling (cycle sequencing), the primer extends along the template, incorporating dNTPs normally until a ddNTP is stochastically incorporated instead, at which point that copy terminates. After 25–30 thermal cycles, the reaction contains a nested set of fragments ranging from the primer to every position in the readable region, each terminated by a fluorescently labeled ddNTP whose color encodes the terminal base.
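The effect of the dNTP:ddNTP ratio on the fragment ladder can be sketched with a geometric model. It assumes the polymerase incorporates ddNTPs as readily as dNTPs and that the ratio is the same for all four bases, so the termination probability at each position is 1/(ratio + 1). Real polymerases discriminate between the two, so treat the numbers as qualitative; the function names are illustrative.

```python
def termination_probability(dntp_to_ddntp_ratio: float) -> float:
    """Chance that a ddNTP, rather than a dNTP, is incorporated at one position."""
    return 1.0 / (dntp_to_ddntp_ratio + 1.0)


def fraction_terminating_at(position: int, ratio: float) -> float:
    """Fraction of extension products that end exactly at `position` (1-based)."""
    q = termination_probability(ratio)
    return q * (1.0 - q) ** (position - 1)


def fraction_reaching(position: int, ratio: float) -> float:
    """Fraction of extension products that pass `position` without terminating."""
    return (1.0 - termination_probability(ratio)) ** position


for ratio in (100, 300):
    for pos in (100, 500, 900):
        print(f"ratio {ratio}:1, position {pos}: "
              f"{fraction_terminating_at(pos, ratio):.2e} of products end here")
```

A lower ratio concentrates signal in short fragments; a higher ratio pushes more product toward long fragments at the cost of weaker peaks near the primer.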

Capillary Electrophoresis and Detection

Modern Sanger sequencing uses capillary electrophoresis (CE), not slab gels. The standard instruments are:

The terminated fragments are electrokinetically injected into fused-silica capillaries filled with POP-7 (Performance Optimized Polymer), a linear polyacrylamide sieving matrix. Fragments separate by size as they migrate through the polymer under an applied electric field (typically 15 kV). As each fragment passes the detection window, an argon-ion laser (or LED on newer instruments) excites the terminal ddNTP fluorophore. A CCD camera or photodiode array records the emission through a set of spectral filters, resolving the four dye colors. The resulting raw data is a four-color electropherogram (chromatogram) where each peak corresponds to one base position.

Base Calling and Quality Scores

The standard base-calling software is KB Basecaller (Applied Biosystems), which assigns Phred quality scores to each called base. Phred scores were originally developed specifically for Sanger sequencing trace files by Brice Ewing and Phil Green (1998). A Phred score of Q20 means 1% error probability; Q30 means 0.1%; Q40 means 0.01%. A typical good Sanger read produces 700–900 bases of Q20+ sequence, with the first ~30–50 bases being unreliable (primer peak artifacts and unresolved short fragments) and quality degrading beyond ~800–900 bases due to decreasing fragment resolution.
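The Phred relationship quoted above is Q = -10 · log10(P), where P is the estimated error probability. A short conversion sketch (function names are illustrative):

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score for a given error probability (0 < p <= 1)."""
    return -10 * math.log10(p)


for q in (20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4%}")
```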

Figure 2. Stylized Sanger sequencing electropherogram (chromatogram). Each peak represents one base position, with the peak color encoding the terminal ddNTP (A, T, C, or G). The left end shows noisy, overlapping peaks typical of the primer region. The central window shows clean, well-resolved peaks where accuracy is highest (Q20–Q40). The right end shows broadening peaks as fragment resolution degrades, limiting practical read length to ~700–1,000 bases.

Error Profiles

Read Length and Throughput

| Parameter | Typical Value |
| --- | --- |
| Read length | 700–1,000 bases (Q20+) |
| Maximum read length | ~1,200 bases under optimal conditions |
| Reads per run (3730xl) | 96 (one per capillary) |
| Run time | ~2–3 hours per plate |
| Daily throughput (3730xl) | ~1,500 reads (~1.2 Mb/day) |
| Cost per read | ~$3–$8 (reagents + instrument time) |

Practical Considerations

Oxford Nanopore Technologies (Nanopore Sequencing)

Oxford Nanopore Technologies (ONT) uses protein nanopores embedded in synthetic membranes to sequence individual DNA or RNA molecules in real time. An ionic current flows through the pore, and as a nucleic acid strand translocates through the narrowest constriction, the current is modulated by the identity of the bases occupying the sensing region. This is the only major sequencing platform that reads native DNA (or RNA) directly — no amplification, no synthesis, no labeling. The technology is deployed across instruments ranging from the USB-powered MinION to the production-scale PromethION.

In August 2016, NASA astronaut Kate Rubins used a MinION aboard the International Space Station to perform the first-ever DNA sequencing in microgravity — sequencing samples of bacteriophage lambda, E. coli, and mouse mitochondrial DNA. The harmonica-sized device required no more than a laptop and a USB port. Nine sequencing runs were conducted aboard the ISS over a six-month period, yielding 276,882 reads with performance comparable to ground-based controls (Castro-Wallace et al. 2017, Scientific Reports). Subsequent ISS experiments extended this to direct RNA sequencing in orbit. No other sequencing platform has been demonstrated outside of a terrestrial laboratory, and the MinION remains the only sequencer that can be carried in a coat pocket.

Library Structure

ONT libraries are remarkably simple compared to other platforms. The essential component is a motor protein (a helicase or translocase) ligated to the end of the DNA molecule, which controls the translocation speed through the pore. The two major library preparation approaches are:

No Indexing Constraints

ONT supports barcoding (multiplexing) via native barcodes (24-plex) or PCR barcodes (96-plex). Barcodes are short adapter-adjacent sequences that are basecalled and demultiplexed computationally. Unlike Illumina, there are no index-hopping concerns because there is no amplification on the flow cell.

The Nanopore and Translocation Mechanism

Figure 3. Oxford Nanopore sequencing mechanism. A single DNA strand is ratcheted through the CsgG nanopore by a motor protein (helicase) at ~400 bases/second. The R10.4.1 pore has two constrictions that each read the passing k-mer, providing a dual measurement that improves accuracy. The resulting ionic current trace (right) encodes the base sequence as a series of current level changes.

The biological nanopore is a modified CsgG protein (from E. coli curli secretion system), designated R10.4.1 in the current chemistry. The pore is inserted into a synthetic lipid membrane stretched across a microwell on a CMOS sensor array. Key architectural features:

Base Calling

Raw ionic current signals are converted to base sequences by deep neural networks. The current production basecaller is Dorado, which uses a transformer-based architecture. Three speed/accuracy models are available:

Native Base Modification Detection

Because the nanopore reads unmodified DNA directly, any chemical modification to a base (5-methylcytosine, 6-methyladenine, 5-hydroxymethylcytosine, etc.) produces a characteristic current perturbation. Dorado includes modification-aware models that call base modifications simultaneously with primary sequence, with no bisulfite conversion, antibody enrichment, or enzymatic treatment required. Among the platforms reviewed here, only PacBio offers a comparable native readout (through polymerase kinetics), and this capability is the primary reason many epigenetics laboratories have adopted ONT.

Read Length

ONT has no inherent upper limit on read length — it is determined entirely by the input DNA fragment size. Ultra-long read protocols using gentle DNA extraction (e.g., agarose plug-based methods, Circulomics/Short Read Eliminator) routinely produce reads >100 kb, with individual reads exceeding 4 Mb reported. The current Guinness World Record for longest nanopore read is >4.2 Mb. Typical read length distributions depend on the library prep method:

Instruments and Throughput

| Instrument | Flow Cell Pores | Output per Flow Cell | Run Time |
| --- | --- | --- | --- |
| Flongle | 126 | ~2.8 Gb | Up to 24 hours |
| MinION / Mk1C | 2,048 | ~50 Gb | Up to 72 hours |
| P2 Solo | 2,675 per FC × 2 FC | ~290 Gb per FC | Up to 72 hours |
| PromethION (P24/P48) | 2,675 per FC × 24 or 48 FC | ~290 Gb per FC; up to 14 Tb per run | Up to 72 hours |

Error Profiles

Real-Time and Adaptive Sampling

A unique ONT capability: the instrument can eject a DNA molecule from the pore mid-read if the initial sequence does not match a target of interest ("Read Until" / adaptive sampling). The pore is then free to capture the next molecule. This enables real-time target enrichment without prior capture or amplification — for example, sequencing only a set of clinically relevant genes from a whole-genome library, achieving up to 5–10x enrichment of on-target reads. Adaptive sampling can also be used in reverse, depleting unwanted sequences (e.g., host DNA in a metagenomic sample).

Pacific Biosciences (PacBio) SMRT Sequencing

PacBio's Single Molecule, Real-Time (SMRT) sequencing observes a single polymerase molecule incorporating fluorescently labeled nucleotides into a growing complementary strand in real time. The polymerase is immobilized at the bottom of a zero-mode waveguide (ZMW), a nanophotonic structure that confines the observation volume to zeptoliters, enabling single-molecule fluorescence detection against a background of freely diffusing labeled nucleotides. The current production instruments are the Revio (high-throughput HiFi) and the Vega (benchtop, long-read focused).

The zero-mode waveguide concept was invented by Jonas Korlach, Stephen Turner, and colleagues at Cornell University, drawing on nanophotonics principles first described by Harold Craighead's lab. The key insight was that a metal aperture smaller than the wavelength of light creates an evanescent field rather than a propagating wave, confining illumination to a volume so small (~20 zeptoliters) that single fluorescent molecules become detectable against a micromolar background. PacBio was founded in 2004 and delivered its first commercial instrument (the RS) in 2011. The 2019 introduction of HiFi sequencing, which computationally combines multiple noisy passes around a circular template into one highly accurate consensus read, transformed PacBio from a niche long-read platform into a serious contender for population-scale genome sequencing (Wenger et al. 2019, Nature Biotechnology). The T2T Consortium's 2022 completion of the first truly complete, telomere-to-telomere human genome assembly (T2T-CHM13) relied heavily on PacBio HiFi reads for base-level accuracy and Oxford Nanopore ultra-long reads for spanning centromeric repeats (Nurk et al. 2022, Science).

Library Structure (SMRTbell)

PacBio libraries are circular molecules called SMRTbells. The architecture is:

This circular topology is critical because the sequencing polymerase can traverse the entire SMRTbell multiple times (rolling-circle fashion around the dumbbell), generating multiple passes over the same insert. Each complete traversal of both strands constitutes one "pass." The multiple passes enable intramolecular error correction to produce high-fidelity (HiFi) consensus reads.
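The value of multiple passes can be illustrated with a toy per-position majority vote. This is only a sketch of the principle: it assumes the passes are already aligned, equal in length, and contain substitution errors only. PacBio's actual consensus algorithm works on unaligned, indel-rich subreads with a probabilistic model, and the function name here is hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Per-column majority vote over pre-aligned, equal-length passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("toy example expects equal-length, pre-aligned passes")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Three noisy passes over the same insert, each with a different random error.
subreads = ["ACGT", "ACGA", "TCGT"]
print(majority_consensus(subreads))  # ACGT
```

Because errors in individual passes are largely random, they rarely coincide at the same position, so the vote converges on the true base as the number of passes grows.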

SMRTbell construction:

Zero-Mode Waveguides (ZMWs)

Figure 4. PacBio library and detection architecture. Left: The SMRTbell library is a circular dumbbell-shaped molecule — the target dsDNA insert is capped at each end by single-stranded hairpin adapters, forming a topologically closed circle. The polymerase traverses this circle multiple times, enabling intramolecular consensus (HiFi). Right: Cross-section of a zero-mode waveguide (ZMW). The polymerase is immobilized at the bottom of a ~70 nm diameter well in an aluminum film. The well diameter is below the wavelength of light, so only an evanescent field illuminates the bottom ~30 nm, creating a zeptoliter detection volume that makes single-molecule fluorescence visible.

The SMRT Cell is a silicon chip containing millions of ZMWs: cylindrical holes approximately 70–100 nm in diameter and ~100 nm deep, fabricated in an aluminum film on a glass substrate. The diameter is smaller than the wavelength of excitation light (~532 nm), so light cannot propagate through the hole. Instead, an evanescent field decays exponentially from the bottom of the well, illuminating only the bottom ~30 nm, for a detection volume of ~20 zeptoliters (20 × 10⁻²¹ L).

The sequencing polymerase is chemically tethered to the bottom of each ZMW. Fluorescently labeled nucleotides diffuse freely in solution above, but only become visible when they enter the ZMW observation volume and are bound by the polymerase (residence time ~10–100 ms). The bulk solution concentration (~µM) ensures labeled nucleotides are diffusing in and out of the ZMW rapidly, but only the one being incorporated is immobilized long enough to generate a pulse of fluorescence.
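A quick back-of-the-envelope calculation shows why a zeptoliter observation volume matters at micromolar concentrations. The sketch below multiplies concentration, Avogadro's number, and volume to get the average number of free labeled nucleotides inside the observation volume at any instant. The 1 µM concentration is an assumed round number standing in for the "~µM" figure; the 20 zL volume is the value quoted above.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def mean_occupancy(concentration_molar, volume_liters):
    """Average number of molecules present in a volume at a given molarity."""
    return concentration_molar * AVOGADRO * volume_liters


zmw_volume_l = 20e-21  # ~20 zeptoliters, as quoted in the text
conc_m = 1e-6          # 1 µM, an assumed representative concentration

n = mean_occupancy(conc_m, zmw_volume_l)
print(f"{n:.4f} free labeled nucleotides in the observation volume on average")
```

Under these assumptions the average occupancy is roughly 0.012 molecules, so the observation volume is empty of freely diffusing dye almost all of the time, and a nucleotide held by the polymerase stands out against that background.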

The Revio SMRT Cell 25M contains approximately 25 million ZMWs, of which 8–12 million typically yield productive sequencing reads (the remainder are empty or contain multiple polymerases).
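One common way to reason about productive ZMW fractions is to model loading as a Poisson process, where each well receives a random number of polymerase complexes with mean lambda. That model is an assumption introduced here for illustration, not something the loading protocol is stated to follow exactly. The sketch computes the empty, single, and multiple occupancy fractions for a few values of lambda.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def loading_fractions(lam):
    """Return (empty, single, multi) well fractions under Poisson loading."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multi = 1.0 - empty - single
    return empty, single, multi


total_zmws = 25_000_000  # Revio SMRT Cell 25M, from the text

for lam in (0.5, 1.0, 1.5):  # hypothetical mean complexes per well
    empty, single, multi = loading_fractions(lam)
    productive_millions = single * total_zmws / 1e6
    print(
        f"lambda={lam}: single={single:.3f} (~{productive_millions:.1f}M ZMWs), "
        f"empty={empty:.3f}, multi={multi:.3f}"
    )
```

In this simple model single occupancy peaks at about 37% when lambda is 1, or roughly 9.2 million of 25 million wells, which sits inside the 8–12 million range quoted above. The upper end of that range would require better-than-Poisson loading, which this sketch does not capture.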

Sequencing Chemistry

PacBio uses phospholinked nucleotides: the fluorescent dye is attached to the terminal phosphate of the nucleotide triphosphate, not to the base. During incorporation:

This "label-then-cleave" design means no chemical scars accumulate on the synthesized strand, and the polymerase processes a natural DNA template, preserving its ability to detect kinetic signatures of base modifications.

SPRQ Chemistry

SPRQ (Sequencing Plate-Ready Q-chemistry) is PacBio's latest HiFi chemistry for Revio. Key improvements over previous chemistries include a longer-lived polymerase (enabling more passes per SMRTbell), improved reagent stability, reduced input requirement (500 ng of native DNA), and ~33% higher HiFi yield per SMRT Cell. SPRQ enables two 30x human genomes per SMRT Cell at ≥Q30 accuracy.

HiFi (Circular Consensus Sequencing) vs. CLR

PacBio operates in two primary modes:

Kinetic Base Modification Detection

The sequencing polymerase pauses or slows at modified bases (e.g., m6A, m4C) because the modified template base alters the enzyme kinetics. By analyzing the interpulse duration (IPD) — the time between successive fluorescent pulses — PacBio can detect base modifications directly from the sequencing data, without any chemical treatment. HiFi mode enables detection of m6A and CpG methylation (5mC) at single-molecule resolution with high confidence. This requires the kinetic information from multiple passes to distinguish modification-induced slowdowns from stochastic variation.
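The role of multiple passes can be sketched with a toy IPD-ratio calculation: average the observed interpulse duration at each template position across passes, divide by an expected unmodified value, and flag positions where the ratio is high. The fixed threshold of 2.0, the function names, and the numbers are all hypothetical; real modification callers use statistical or learned models rather than a single cutoff.

```python
from statistics import mean


def ipd_ratios(observed_ipds_per_pass, expected_ipds):
    """Mean observed IPD per position (across passes) divided by expected IPD."""
    ratios = []
    for pos, expected in enumerate(expected_ipds):
        avg_observed = mean(p[pos] for p in observed_ipds_per_pass)
        ratios.append(avg_observed / expected)
    return ratios


def flag_modified(ratios, threshold=2.0):
    """Indices whose IPD ratio meets or exceeds a (hypothetical) threshold."""
    return [i for i, r in enumerate(ratios) if r >= threshold]


# Hypothetical IPDs in seconds for a 4-base window, three passes.
expected = [0.10, 0.10, 0.10, 0.10]
observed = [
    [0.11, 0.09, 0.35, 0.10],
    [0.09, 0.12, 0.28, 0.11],
    [0.10, 0.10, 0.30, 0.08],
]

ratios = ipd_ratios(observed, expected)
flagged = flag_modified(ratios)
print([round(r, 2) for r in ratios], flagged)
```

Averaging over passes is what suppresses stochastic variation in this sketch: a single slow pulse in one pass barely moves the mean, while a position that is consistently slow in every pass is flagged.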

Error Profiles

Output and Run Times

Parameter | Revio (SPRQ) | Vega
SMRT Cells per run | Up to 4 (simultaneously) | 1
ZMWs per SMRT Cell | 25 million | 8 million
HiFi output per SMRT Cell | ~100–120 Gb | ~25 Gb
HiFi output per day (Revio) | ~480 Gb | -
HiFi read length | 10–25 kb (mean ~15 kb) | 10–25 kb
HiFi accuracy | ≥Q30 (≥99.9%) | ≥Q30
Run time | ~24 hours per SMRT Cell | ~24 hours
30x genomes/year (Revio) | ~2,500 | -
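The Q-values in this table follow the standard Phred convention (Ewing & Green 1998), where Q = -10 · log10(p) and p is the probability that a base call is wrong. A small sketch of the conversion in both directions, with helper names chosen here for illustration:

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Convert a per-base error probability to a Phred quality score."""
    return -10 * math.log10(p)


for q in (20, 30, 40):
    p = phred_to_error(q)
    print(f"Q{q}: error={p:.4%}, accuracy={1 - p:.4%}")
```

Each step of 10 in Q is a tenfold drop in error rate, which is why Q30 corresponds to 99.9% accuracy and Q20 to 99%.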

Roche Sequencing by Expansion (SBX)

Roche's Sequencing by Expansion (SBX), publicly unveiled in February 2025, represents an entirely new category of sequencing technology. Rather than reading DNA directly through a nanopore or detecting fluorescent nucleotides during synthesis, SBX first converts the DNA sequence into an expanded surrogate polymer called an Xpandomer, then reads that Xpandomer through a nanopore. The expansion step amplifies the physical spacing between base-encoded reporter elements by approximately 50-fold, overcoming the fundamental spatial resolution limitations of direct nanopore sequencing. The technology is currently in late-stage development and not yet commercially available.

In October 2025, Roche, Broad Clinical Labs, and Boston Children's Hospital announced a Guinness World Record for the fastest DNA sequencing technique: a complete human genome was sequenced and analyzed (blood sample to annotated VCF) in under 4 hours using SBX, beating the previous record of 5 hours and 2 minutes. The team subsequently demonstrated a same-day workflow from neonatal ICU blood draw to actionable clinical report in under 8 hours, fast enough to keep pace with a high-volume NICU. The work was described in the New England Journal of Medicine. At the ASHG conference where the record was announced, Roche also demonstrated 15 billion reads generated in a single hour of sequencing, underscoring SBX's raw throughput ambitions.

Library Structure

SBX supports two distinct library preparation modes:

The Expansion Chemistry

Roche SBX sequencing workflow: template DNA to X-NTP incorporation to acid expansion to nanopore readout
Figure 5. Roche SBX sequencing-by-expansion workflow. Template DNA is copied by the engineered XP Synthase polymerase using massive (~20 kDa) X-NTP monomers, producing a condensed Xpandomer. Acid treatment cleaves scissile bonds, expanding the polymer ~50-fold and physically separating the reporter codes. The expanded Xpandomer is then threaded through a biological nanopore on an 8-million-well CMOS sensor array, producing four discrete, easily resolved current states.

This is the defining innovation of SBX. The process converts a standard DNA molecule into an Xpandomer through the following steps:

The expansion chemistry takes approximately 2 hours on a benchtop unit with simple fluidics before the expanded molecules are loaded onto the sequencing instrument.

Nanopore Readout (Genia CMOS Array)

The Xpandomer is threaded through biological nanopores embedded in a CMOS sensor array — technology derived from Roche's 2014 acquisition of Genia Technologies. Key specifications:

Base Calling and Accuracy

Errors in SBX arise approximately equally from two sources: Xpandomer synthesis errors (XP Synthase misincorporation or slippage, base error rate ~0.7%) and data collection errors (nanopore signal misclassification). Quality scores are binned into three levels (high, medium, low quality).
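Binning quality scores into a few levels is straightforward to sketch. The cut points below (Q15 and Q30) are hypothetical placeholders chosen only to make the example concrete; the actual SBX bin boundaries are not given here.

```python
from bisect import bisect_right

# Hypothetical cut points: below Q15 is "low", Q15 up to Q29 is "medium",
# Q30 and above is "high". These are NOT the published SBX thresholds.
BIN_EDGES = (15, 30)
BIN_LABELS = ("low", "medium", "high")


def bin_quality(q):
    """Map a Phred-style quality score to one of three coarse bins."""
    return BIN_LABELS[bisect_right(BIN_EDGES, q)]


scores = [5, 15, 29, 30, 40]
binned = [bin_quality(q) for q in scores]
print(list(zip(scores, binned)))
```

Coarse binning of this kind trades per-base resolution for smaller files and simpler downstream filtering, a common design choice on high-throughput platforms.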

Throughput and Speed

SBX is designed for extreme throughput and speed. A demonstration run produced seven human genomes at 30x coverage in 1 hour of sequencing time. Total sample-to-VCF turnaround has been demonstrated at 6 hours 25 minutes using simplex chemistry. The data throughput rate is approximately 500 megabases per second per sensor module. The system supports flexible "run until done" sequencing — runs terminate when a target data accumulation threshold is reached, rather than running for a fixed duration.
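The "run until done" behavior amounts to a simple stopping rule: accumulate yield and terminate once a target is reached. The sketch below models that rule with made-up numbers. The per-minute yield, the 3.1 Gb genome size used to define a 30x target, and the function name are all assumptions for illustration, not instrument parameters.

```python
def run_until_done(yield_per_interval, target_bases):
    """Accumulate yield interval by interval and stop once the target is met.

    Returns (intervals_used, total_bases). If the target is never reached,
    all intervals are consumed and the shortfall is visible in total_bases.
    """
    total = 0
    intervals_used = 0
    for bases in yield_per_interval:
        intervals_used += 1
        total += bases
        if total >= target_bases:
            break
    return intervals_used, total


GENOME_SIZE = 3_100_000_000          # approximate human genome size (assumed)
target = 30 * GENOME_SIZE            # 30x coverage target, 93 Gb
yields = [10_000_000_000] * 60       # hypothetical: 10 Gb per minute for an hour

intervals, total = run_until_done(yields, target)
print(f"stopped after {intervals} intervals with {total / 1e9:.0f} Gb")
```

With these placeholder numbers the run stops at the tenth interval, the first point at which cumulative yield crosses the 93 Gb target, rather than continuing for the full hour.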

Parameter | SBX (Demonstrated)
Read modes | Simplex (<200–1,500 bp) and Duplex (200–350 bp)
Simplex accuracy | Q20+ (≥99%)
Duplex accuracy | Q30+ (high Q30s)
Sensor array | ~8 million nanopore microwells (CMOS)
Throughput demonstration | 7 × 30x genomes in 1 hour sequencing
Sample-to-VCF (simplex) | ~6.5 hours
Data rate | ~500 Mb/s per sensor module

Instrument Architecture

SBX uses a two-instrument workflow: a benchtop expansion unit (handles the 2-hour chemical conversion of DNA to Xpandomers) and a floor-standing sequencer equipped with a large GPU-based compute server for real-time basecalling. The expansion unit has simple fluidics and is designed to be user-friendly. The sequencer accepts the reusable CMOS sensor modules and handles reagent delivery and data acquisition.

Current Limitations

Platform Comparison Summary

Scatter plot comparing read length versus per-read accuracy across all seven sequencing platforms
Figure 6. Read length versus per-read accuracy across all seven platforms (log-scale x-axis). Short-read platforms (Illumina, Element, Ultima, Roche SBX) cluster at the left with read lengths under ~1.5 kb. Long-read platforms (ONT, PacBio) span 10 kb to megabases. PacBio HiFi occupies a unique position: long reads (10–25 kb) at short-read-equivalent accuracy (≥Q30). ONT duplex and Roche SBX duplex modes achieve higher accuracy than their simplex counterparts at the cost of throughput or read length. Sanger sits alone in the middle — longer than short-read NGS but far shorter than long-read platforms.
Feature | Illumina | Element AVITI | Ultima UG 100 | Sanger | Oxford Nanopore | PacBio (HiFi) | Roche SBX
Chemistry | Cyclic reversible termination (SBS) | Sequencing by binding (avidity) + separate incorporation | Non-terminating flow SBS (mnSBS) | Dideoxy chain termination + capillary electrophoresis | Ionic current through protein nanopore | Real-time fluorescent nucleotide incorporation in ZMWs | Xpandomer expansion + nanopore readout (CMOS)
Amplification | Bridge amp (random FC) or ExAmp (patterned FC) | Rolling circle amplification (polonies) | Emulsion PCR on beads | Cycle sequencing (linear amplification) | None (single molecule) | None (single molecule) | None (Xpandomer synthesis from single molecule)
Surface | Glass flow cell | Low-binding coated flow cell | 200 mm silicon wafer | Fused-silica capillary (POP-7 polymer) | Lipid membrane over CMOS array | SMRT Cell (ZMW nanowell chip) | Lipid membrane over 8M-well CMOS array
Read type | Paired-end (fixed length) | Paired-end (fixed length) | Single-end (variable length) | Single-end (fixed primer) | Single-end (native strand) | Circular consensus (HiFi) or CLR | Simplex or duplex
Max read length | 2×300 (MiSeq/NextSeq) | 2×300 | Median ≥300 (single end) | ~1,000–1,200 bp | No upper limit (>4 Mb demonstrated) | 10–25 kb (HiFi); >100 kb (CLR) | ~1,500 bp (simplex); ~350 bp insert (duplex)
Typical quality | >85% ≥Q30 | >90% ≥Q40 | >85% ≥Q30 | Q20–Q40 (700–900 bp window) | Q20–Q25 simplex; Q30+ duplex | ≥Q30 (HiFi); Q10–Q15 (CLR) | Q20+ simplex; high Q30s duplex
Index hopping | Yes (esp. ExAmp) | None (RCA) | None (emPCR) | N/A (no multiplexing) | None (single molecule) | None (single molecule) | None (single molecule)
Dominant error | Substitutions (phasing) | Very low; all types rare | Indels in homopolymers | Miscalls at compressed peaks | Indels in homopolymers | Balanced sub/ins/del (HiFi); insertions (CLR) | ~Equal synthesis + readout errors; homopolymer slippage
Homopolymer perf. | Good (1 base/cycle) | Excellent (no error spike) | Good to ≤8–10 bp; degrades beyond | Degrades >8–10 bp (peak merging) | Improved (R10.4.1); still challenging >8–10 bp | Good (HiFi consensus); poor (CLR) | >99% F1 <15 bp (duplex); degrades beyond
Library compat. | Native (TruSeq, Nextera) | Illumina-compatible + native Elevate | Requires UG adapters (conversion available) | Any template + primer pair | Native ligation, rapid, or PCR kits | SMRTbell (hairpin adapter ligation) | SBX-specific library prep
Epigenetic detection | Requires bisulfite/EM-seq conversion | Requires bisulfite/EM-seq conversion | Requires bisulfite/EM-seq conversion | N/A | Native (direct modification calling) | Native (kinetic IPD analysis) | Not supported (reads Xpandomer, not native DNA)
Throughput/run | Up to 16 Tb (NovaSeq X) | ~600 Gb (2 FC) | ≥2.5–3.0 Tb per wafer | ~1.2 Mb/day (3730xl) | Up to 14 Tb (PromethION P48) | ~480 Gb/day (Revio, 4 SMRT Cells) | ~7 × 30x genomes/hour (demonstrated)
Cost per Gb | $2–$6 | $5–$7 | ~$1 | ~$500–$2,000 (not meaningful at scale) | $3–$20 (instrument-dependent) | $8–$15 | TBD (targeting <$2 at scale)

Practical Decision Guide

When to Choose Each Platform

Key Technical Caveats

Document generated March 2026. Technical details sourced from manufacturer documentation, peer-reviewed publications, and core facility protocols. Key references: Sanger et al. 1977 PNAS (chain termination); Ewing & Green 1998 Genome Res (Phred scores); Eid et al. 2009 Science (ZMW single-molecule sequencing); Wenger et al. 2019 Nat Biotechnol (HiFi CCS); Arslan et al. 2023 Nat Biotechnol (Element avidity sequencing); Almogy et al. 2022 (Ultima flow SBS); Jain et al. 2018 Nat Biotechnol (nanopore R9.4); Castro-Wallace et al. 2017 Sci Rep (ISS nanopore sequencing); Nurk et al. 2022 Science (T2T-CHM13 assembly); Kokoris et al. 2025 bioRxiv (Roche SBX); Broad Clinical Labs / Roche 2025 NEJM (SBX Guinness record).
