Sequencing - Work in Progress
A Detailed Technical Review of DNA Sequencing Platforms
Illumina (SBS) · Element Biosciences / AVITI (Avidity) · Ultima Genomics (Flow SBS) · MGI / DNBSEQ (DNA Nanoballs) · Sanger (Chain Termination) · Oxford Nanopore (Nanopore) · PacBio (SMRT) · Ion Torrent (Semiconductor) · Roche (Sequencing by Expansion)
Library Architecture, Cluster/Polony/Bead Generation, Sequencing Chemistry, Error Profiles, and Practical Considerations
The Sequencing Landscape
DNA sequencing has undergone a transformation few technologies can match. Sanger's chain termination method, introduced in 1977[1], underpinned the Human Genome Project, which took thirteen years and roughly $2.7 billion to produce the first human genome; today's massively parallel platforms sequence a human genome in hours for under $200[3]. That span represents a cost reduction of about seven orders of magnitude and a roughly ten-billionfold increase in throughput. The platforms covered here fall into three broad families: short-read sequencing-by-synthesis on clonally amplified templates (Illumina, Element AVITI, and MGI using reversible terminators; Ultima and Ion Torrent using non-terminating single-nucleotide flows), single-molecule technologies (Oxford Nanopore, PacBio, Roche SBX), and the historical capillary-based chain termination method (Sanger). Each embodies fundamentally different design choices (how to amplify signal without amplifying error, how to encode base identity, how to trade speed against accuracy), and these choices determine which applications each platform serves best.
The ecosystem reflects economic pressure as much as technological diversity. Illumina instruments have generated more than 85% of all sequencing data ever produced[3], a dominance built on manufacturing scale as much as on chemistry. That dominance has in turn pushed competitors to innovate: Element Biosciences, founded by former Illumina engineers, designed its entire platform around Illumina library compatibility to lower switching costs. Ultima Genomics replaced the traditional flow cell with a spinning silicon wafer borrowed from semiconductor manufacturing. Oxford Nanopore eliminated amplification altogether, reading native DNA. Each represents a different bet on what matters most: cost, accuracy, speed, or read length. This review breaks down the structural and chemical mechanisms, error profiles, and practical trade-offs that distinguish them.
Illumina Sequencing-by-Synthesis (SBS)
Illumina sequencing is the most widely deployed short-read platform globally. It uses cyclic reversible termination: all four labeled, 3'-blocked nucleotides compete simultaneously for incorporation, and after imaging, the fluorophore and blocking group are cleaved, enabling the next cycle. The chemistry is simple and the manufacturing operates at enormous scale; since next-generation platforms arrived in the late 2000s, the cost per genome has fallen faster than Moore's Law would predict.
The underlying chemistry traces to work by Shankar Balasubramanian and David Klenerman at the University of Cambridge in the late 1990s[2]. The two chemists founded Solexa in 1998 on the concept of sequencing DNA on a surface using fluorescent reversible terminators, and Illumina acquired Solexa in 2007 for about $600 million. By 2025, Illumina instruments had generated more than 85% of all sequencing data ever produced worldwide, a share that reflects not just the platform's technical merits but its cost trajectory and massive installed base in clinical and research laboratories. The cost of a human genome has fallen from ~$2.7 billion (the Sanger-based Human Genome Project, completed in 2003) to under $200 on modern instruments, a reduction driven by chemical simplicity and manufacturing volume.
Library Structure (Adapter Architecture)
A completed Illumina library has a strict 5'→3' linear architecture. The canonical form is:
5'---P5---i5---Read 1 primer site---[INSERT]---Read 2 primer site---i7---P7---3'
More precisely, reading from the P5 (left) end:
- P5 flow cell binding sequence (29 nt):
5'-AATGATACGGCGACCACCGAGATCTACAC-3'
- i5 index (typically 8–10 nt, sample barcode)
- Read 1 sequencing primer binding site (~33 nt for TruSeq, ~33 nt for Nextera)
- [INSERT] — the target DNA fragment
- Read 2 sequencing primer binding site (~34 nt for TruSeq, ~34 nt for Nextera)
- i7 index (typically 8–10 nt, sample barcode)
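- P7 flow cell binding sequence (24 nt), shown as the P7 oligo is conventionally written; the top strand of the library carries its reverse complement at the 3' end:
5'-CAAGCAGAAGACGGCATACGAGAT-3'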
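The segment order above can be checked mechanically. The sketch below is a minimal Python illustration, assuming each segment is supplied as it reads on the library's top strand (5'→3'); `assemble_top_strand` and `revcomp` are hypothetical helpers, and the index and primer-site strings are placeholders rather than real Illumina sequences. It also encodes one detail that is easy to miss: the P7 sequence is the oligo as usually written, so the top strand ends with its reverse complement.

```python
# Hypothetical helpers for illustration; not part of any Illumina tool.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

P5 = "AATGATACGGCGACCACCGAGATCTACAC"  # 29 nt, as quoted above
P7 = "CAAGCAGAAGACGGCATACGAGAT"       # 24 nt, the P7 oligo as usually written


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def assemble_top_strand(i5: str, read1_site: str, insert: str,
                        read2_site: str, i7: str) -> str:
    """Join segments 5'->3' in P5-i5-R1-insert-R2-i7-P7 order.

    All arguments are assumed to be given in top-strand orientation.
    The 3' end is revcomp(P7) because the P7 flow cell oligo anneals there.
    """
    return P5 + i5 + read1_site + insert + read2_site + i7 + revcomp(P7)


# Placeholder segments; real index and primer-site sequences differ.
library = assemble_top_strand("ACGTACGT", "AAAA", "GATTACA", "CCCC", "TTGGCCAA")
print(len(P5), len(P7), revcomp(P7))  # 29 24 ATCTCGTATGCCGTCTTCTGCTTG
```

Keeping every segment in one orientation and applying the reverse complement in exactly one place makes the P7 convention explicit.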
TruSeq vs. Nextera Adapter Systems
These are the two dominant Illumina adapter families. They differ in their sequencing primer binding regions but share the same P5/P7 flow cell binding sequences.
- TruSeq: Uses ligation-based library prep. Adapter oligos are "forked" (Y-shaped) with a 12 nt complementary overlap. Ligation requires A-tailing of fragment ends. The top adapter strand has a phosphorothioate-protected 3' T overhang.
- Nextera: Uses tagmentation (Tn5 transposase). The transposome inserts adapter sequences at both ends of the fragment simultaneously, which are then extended and indexed by PCR. The Read 1 and Read 2 primer binding sequences differ from TruSeq.
Can R2 Go Next to P5/i5? (Orientation Constraints)
No, the orientation is fixed. P5 is always on the same end as i5 and the Read 1 primer site; P7 is always on the same end as i7 and the Read 2 primer site. The flow cell lawn carries two distinct oligo species (P5 and P7), the linearization chemistry cleaves one of them to decide which strand is retained for each read, and the run recipe (Read 1 → Index 1 (i7) → Index 2 (i5) → Read 2) assumes each sequencing primer will find its binding site on the retained strand. A molecule with the Read 2 site next to P5 would still form clusters, since cluster generation depends only on P5 and P7, but the standard sequencing primers would find no binding site on the retained strands, so the reads would fail.
Can TruSeq and Nextera Be Combined in a Pool?
Yes, with caveats. Illumina's sequencing reagent cartridges actually contain a mixture of sequencing primers from multiple adapter families (TruSeq, Nextera, and legacy kits). Therefore, libraries built with TruSeq adapters and libraries built with Nextera adapters can be pooled and sequenced together on the same flow cell lane. You can even mix-and-match within a single library molecule (e.g., TruSeq Read 1 site on one end and Nextera Read 2 site on the other), though this is not recommended because demultiplexing and trimming parameters differ between the two systems. The key constraint is that the P5 and P7 sequences must be full-length and intact on both ends.
Full-Length vs. Stubby-Y Adapters
There are two major physical adapter designs:
- Full-length adapters: Already contain P5/P7, index sequences, and sequencing primer sites. Used in PCR-free workflows where no further amplification is needed to complete the adapter.
- Stubby-Y (truncated) adapters: Contain only the sequencing primer binding site core but lack P5/P7 and indexes. An indexing PCR step is required after ligation to add the remaining sequences. This design offers higher ligation efficiency due to shorter oligo length, but mandates PCR amplification.
Unique Dual Indexing (UDI) vs. Combinatorial Dual Indexing (CDI)
In CDI, a small set of i5 and i7 indexes is used in all possible pairwise combinations, so a hopping event that swaps one index can produce a valid (but incorrect) index pair and assign the read to the wrong sample. In UDI, each sample receives an i5 and an i7 that appear in no other pair, so any hopped combination produces an invalid pair that is discarded during demultiplexing. Illumina strongly recommends UDI for patterned flow cell instruments (e.g., NovaSeq 6000, NextSeq 1000/2000, NovaSeq X) because index hopping rates on patterned flow cells can reach 0.1–2%[5], driven by the exclusion amplification (ExAmp) chemistry.
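The difference between CDI and UDI can be seen with a toy sample sheet. The Python sketch below is illustrative only: the four-base indexes and sample names are hypothetical, and real demultiplexers also tolerate index mismatches, which this ignores. It assumes the sample sheet maps an exact (i7, i5) pair to a sample.

```python
from itertools import product


def demultiplex(pair, sample_sheet):
    """Return the sample for an exact (i7, i5) pair, or None if unassigned."""
    return sample_sheet.get(pair)


i7s = ["AAAA", "CCCC"]  # hypothetical indexes
i5s = ["GGGG", "TTTT"]

# CDI: every i7 combined with every i5 -> 4 samples from 4 indexes.
cdi_sheet = {pair: f"S{n}" for n, pair in enumerate(product(i7s, i5s), start=1)}

# UDI: each sample has its own i7 and i5, never reused in another pair.
udi_sheet = {("AAAA", "GGGG"): "S1", ("CCCC", "TTTT"): "S2"}

# A molecule from S1 (AAAA, GGGG) picks up the i5 belonging to another sample.
hopped = ("AAAA", "TTTT")
print(demultiplex(hopped, cdi_sheet))  # S2: silently assigned to the wrong sample
print(demultiplex(hopped, udi_sheet))  # None: invalid pair, discarded
```

The price of UDI is index economy: CDI gets four samples from four indexes, while UDI needs two new indexes per sample.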
Cluster Generation
#### Bridge Amplification (Random Flow Cells)
Used on MiSeq, NextSeq 500/550, HiSeq 2500, and older instruments. Single-stranded library molecules are loaded onto the flow cell and hybridize to the P5 or P7 oligo lawn. The free end of each molecule folds over and hybridizes to a complementary oligo nearby, forming a "bridge." Polymerase extends it, creating a double-stranded bridge, and denaturation yields two surface-tethered single strands. This cycle repeats ~35 times, generating a cluster of roughly 1,000 identical copies at a random physical location. Clusters are roughly 1 µm in diameter.
#### Exclusion Amplification (ExAmp, Patterned Flow Cells)
Used on NovaSeq 6000, NovaSeq X, NextSeq 1000/2000. Patterned flow cells have pre-etched nanowells at defined positions. Library, polymerase, and recombinase are mixed and loaded simultaneously. The first molecule to seed a nanowell is amplified so rapidly that it excludes other molecules from occupying that well (kinetic exclusion). This produces monoclonal clusters at uniform spacing, dramatically increasing cluster density and data output. However, ExAmp is more susceptible to index hopping: residual free adapters or index primers in the pooled library can anneal to library molecules and be extended during the amplification reaction, transferring one sample's index onto another sample's insert.
#### Post-Amplification Linearization
After cluster generation, reverse strands are cleaved and washed away, leaving only forward strands for Read 1 sequencing. After Read 1 and Index 1, the forward strands fold over and serve as templates for one more round of bridge amplification that resynthesizes the reverse strands (the paired-end turnaround); the forward strands are then cleaved, leaving reverse strands for Read 2. Read 2 quality is therefore typically slightly lower than Read 1: the template has already been through Read 1 chemistry, and the imperfect resynthesis step lowers signal and introduces additional stochastic error.
Sequencing Chemistry (Cyclic Reversible Termination)
#### The Incorporation Cycle
Each cycle consists of four steps:
- Incorporation: All four dNTPs (dATP, dCTP, dGTP, dTTP), each carrying a fluorescent label (in 2-channel chemistry, G is left unlabeled) and a 3'-O-azidomethyl reversible terminator, are flowed across the clusters along with DNA polymerase. Because all four compete at once, the polymerase selects the correct base with minimal bias.
- Wash: Unincorporated nucleotides and polymerase are washed away.
- Imaging: Laser excitation (two wavelengths) induces fluorescence. The flow cell is imaged tile-by-tile. Each cluster emits a color corresponding to the incorporated base.
- Cleavage: Chemical treatment removes the fluorophore and the 3' blocking group, regenerating a free 3'-OH ready for the next cycle.
#### 4-Channel, 2-Channel, and 1-Channel Chemistry
Illumina has used three different optical encoding schemes:
- 4-channel (HiSeq 2500, HiSeq 4000): Four distinct dyes, four images per cycle. Each base has a unique emission spectrum.
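- 2-channel (NextSeq, NovaSeq 6000, NovaSeq X): Only two dye colors are used, and each base is encoded by a combination. On NextSeq 500/550 and NovaSeq 6000: A = red + green, C = red only, T = green only, G = no label (dark). This reduces imaging time and optical complexity, but G calls rely on the absence of signal, making them noisier; a cluster that loses signal is read as a run of Gs (poly-G artifacts).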
- 1-channel (iSeq 100): Uses CMOS detection with a single-color scheme across two sequential images per cycle.
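The 2-channel encoding amounts to a lookup from two on/off signals to a base. The sketch below assumes the NextSeq 500/550 and NovaSeq 6000 mapping (A = red + green, C = red only, T = green only, G = dark), intensities already normalized to 0–1, and a single fixed threshold; production base callers model cross-talk, phasing, and per-tile normalization instead. `call_two_channel` and `call_read` are hypothetical names.

```python
TWO_CHANNEL_CODE = {
    # (red on, green on) -> base, assuming the NextSeq 500/550 mapping
    (True, True): "A",
    (True, False): "C",
    (False, True): "T",
    (False, False): "G",  # no signal at all is read as G
}


def call_two_channel(red: float, green: float, threshold: float = 0.5) -> str:
    """Call one base from normalized red and green intensities."""
    return TWO_CHANNEL_CODE[(red >= threshold, green >= threshold)]


def call_read(cycles, threshold: float = 0.5) -> str:
    """Call a read from a list of (red, green) intensity pairs."""
    return "".join(call_two_channel(r, g, threshold) for r, g in cycles)


# Four clean cycles, then a cluster whose signal has faded.
cycles = [(0.9, 0.8), (0.9, 0.1), (0.1, 0.9), (0.1, 0.1), (0.05, 0.02), (0.0, 0.0)]
print(call_read(cycles))  # ACTGGG
```

The last two cycles show why signal loss produces poly-G tails: an absent signal is indistinguishable from a genuine G.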
#### XLEAP-SBS Chemistry
Introduced on the NextSeq 1000/2000 and NovaSeq X Plus, XLEAP-SBS uses new nucleotide analogues and polymerases that dramatically reduce signal decay over the course of a run. Older chemistry showed ~50% signal intensity loss over 150 cycles; XLEAP maintains essentially flat signal. This enables longer reads (up to 2×300 on NextSeq 2000) with higher quality at the ends. Phasing remains the primary read-length limitation under XLEAP.
Error Profiles and Quality Decay
#### Phasing and Pre-Phasing
The dominant error mechanism in Illumina sequencing results from molecular desynchronization within clusters. Within each cluster, ~1,000 molecules should be in perfect synchrony. However:
- Phasing: A molecule fails to incorporate a nucleotide in a given cycle (incomplete 3' deblocking or steric hindrance). It falls one cycle behind the majority.
- Pre-phasing: A molecule incorporates two nucleotides in one cycle (defective terminator cap). It jumps one cycle ahead.
Typical phasing/pre-phasing rates are 0.1–0.2% per cycle. This seems small, but it compounds: after 250 cycles, approximately 50% of molecules are out of phase if uncorrected. Illumina's Real-Time Analysis (RTA) software applies computational phasing correction using the known rates estimated from early cycles (or empirically per-cycle on newer instruments), rescuing much of the signal. But there is a hard limit beyond which correction fails, which is why quality drops toward read ends.
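The compounding can be made concrete with a simple geometric model. The sketch below assumes each strand independently falls out of phase with a fixed combined probability per cycle and never returns; reading the quoted 0.1–0.2% as applying to phasing and pre-phasing separately gives a combined 0.2–0.4% per cycle. Real rates drift over a run, so this is a rough estimate, not a model of RTA's correction.

```python
def fraction_in_phase(rate_per_cycle: float, cycles: int) -> float:
    """Fraction of strands still in phase after `cycles`.

    Assumes each strand independently drops out of phase with probability
    `rate_per_cycle` in every cycle and never rejoins the majority.
    """
    return (1.0 - rate_per_cycle) ** cycles


for combined_rate in (0.002, 0.0028, 0.004):
    out_of_phase = 1.0 - fraction_in_phase(combined_rate, 250)
    print(f"{combined_rate:.2%} per cycle -> {out_of_phase:.0%} out of phase at cycle 250")
```

Under these assumptions the range spans roughly 39–63% out of phase at cycle 250, and a combined rate of about 0.28% per cycle reproduces the ~50% figure.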
#### Signal Decay (Photobleaching)
Repeated laser excitation causes photodamage to the DNA and gradual loss of template strands from each cluster. Because each cycle's dye is cleaved, this cumulative decay is not simply bleaching of one persistent fluorophore, but it shows up the same way: a progressive drop in signal intensity across cycles that compounds the phasing problem. Pre-XLEAP chemistries saw roughly 50% signal loss by cycle 150. The combined effect of phasing and signal decay means the signal-to-noise ratio degrades exponentially, ultimately making base calls unreliable. XLEAP-SBS largely addresses signal decay, but phasing remains.
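A worked estimate shows how decay and phasing multiply. As a simplifying assumption, let intensity fall by the same fraction every cycle, and take the pre-XLEAP figure of ~50% loss by cycle 150:

$$
I_n = I_0\, r^{\,n}, \qquad r = 0.5^{1/150} \approx 0.9954
$$

so roughly 0.46% of the signal is lost per cycle. If a combined fraction $p$ of in-phase strands also drops out of phase each cycle, the useful in-phase signal after $n$ cycles is the product of the two effects:

$$
\frac{S_n}{I_0} = \bigl[(1-p)\, r\bigr]^{n}, \qquad \frac{S_{150}}{I_0} = 0.997^{150} \times 0.5 \approx 0.32 \quad (p = 0.3\%)
$$

Under this model only about a third of the starting in-phase signal remains at cycle 150, before any noise is considered.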
#### Practical Quality Characteristics
- Read 2 is typically lower quality than Read 1 because it follows Read 1 chemistry and requires reverse-strand resynthesis by bridge amplification before sequencing.
- The first few cycles (~1–5) often show slightly erratic base composition due to the sequencing primer binding and initial phasing correction calibration.
- Homopolymer accuracy is generally good (unlike flow-based chemistries) because only one base is incorporated per cycle.
- Substitution errors dominate; insertions/deletions are rare (~0.001% per base).
- Typical error rates: <0.1% at cycle 1, rising to ~1–1.5% by cycle 150. Overall, most instruments produce >85% bases ≥Q30.
- PhiX is spiked in as an internal run control (commonly ~1%) and at higher levels (commonly 5–20%) to raise base diversity for low-diversity libraries.
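The error rates and Q-scores above are two views of the same quantity under the standard Phred definition, Q = -10·log10(p). A minimal Python sketch of the conversion; the function names are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred score: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def fraction_at_least(quals, q_min: float = 30) -> float:
    """Fraction of base qualities at or above q_min (e.g. the %>=Q30 metric)."""
    return sum(q >= q_min for q in quals) / len(quals)


print(phred_to_error(30))               # 0.001, i.e. 1 error in 1,000 bases
print(round(error_to_phred(0.015), 1))  # 18.2: a 1.5% late-cycle error rate
```

So a base at Q30 carries a 0.1% error probability, while the ~1–1.5% late-cycle error rates quoted above correspond to roughly Q18–Q20.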
#### The Chastity Filter
Before reads enter analysis, each cluster is assessed for signal purity. The "chastity" score is the ratio of the brightest intensity to the sum of the two brightest intensities. A score of 1.0 means a perfectly pure, monoclonal cluster. Clusters with more than one base call of chastity <0.6 in the first 25 cycles are filtered out as "non-passing filter" (non-PF). Typical good runs achieve >80% PF.
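The chastity calculation and the pass-filter rule are short enough to write out. The Python sketch below assumes four per-channel intensities per cycle (as on 4-channel instruments) and applies the rule that a cluster fails if more than one of its first 25 base calls has chastity below 0.6. The function names and intensity values are hypothetical.

```python
def chastity(intensities) -> float:
    """Brightest intensity divided by the sum of the two brightest."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_filter(per_cycle_intensities, threshold=0.6, window=25, max_failures=1):
    """True if at most `max_failures` of the first `window` cycles fall below `threshold`."""
    failures = sum(chastity(c) < threshold for c in per_cycle_intensities[:window])
    return failures <= max_failures


clean = [900, 100, 50, 20]  # chastity 0.90
mixed = [500, 400, 10, 0]   # chastity ~0.56, e.g. a polyclonal cluster
print(passes_filter([clean] * 24 + [mixed]))      # True: one failure is tolerated
print(passes_filter([clean] * 23 + [mixed] * 2))  # False: two failures
```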
#### Sequencing Read Order
The instrument sequences in a fixed order:
- Read 1: Sequencing primer hybridizes to the Read 1 site. Extension proceeds into the insert in the 5'→3' direction.
- Index 1 (i7): After Read 1 completes, the Read 1 product is stripped away and the Index 1 primer is annealed to read the i7 index.
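- Index 2 (i5): Instruments differ in which orientation the i5 index is reported. On forward-strand workflow instruments (e.g., MiSeq, HiSeq 2500), the Index 2 read reports the i5 sequence as it appears on the forward strand; on reverse-complement workflow instruments (e.g., NextSeq 500/550, MiniSeq, HiSeq 3000/4000/X), it reports the reverse complement, so the i5 entries in the sample sheet must match the instrument. On some instruments the workflow also depends on the reagent version (e.g., NovaSeq 6000 v1.0 vs. v1.5 reagents).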
- Read 2: Reverse strand is resynthesized and linearized. Read 2 primer hybridizes and extends into the insert from the opposite end.
This fixed order means you must always collect at least some Read 1 data (it sets spatial coordinates and phasing parameters), even if your biological interest is entirely in Read 2.
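Because the Index 2 read reports i5 in different orientations depending on the instrument workflow (forward-strand on, e.g., MiSeq and HiSeq 2500; reverse-complement on, e.g., NextSeq 500/550 and HiSeq 3000/4000), the i5 column of a sample sheet has to match the instrument. A minimal Python sketch, assuming i5 sequences are stored as they appear on the library's forward (top) strand; the workflow labels and function name are hypothetical, and demultiplexing software may apply this conversion itself, so check its documentation before converting by hand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def i5_for_sample_sheet(i5_top_strand: str, workflow: str) -> str:
    """Return the i5 sequence as the Index 2 read will report it."""
    if workflow == "forward":
        return i5_top_strand
    if workflow == "reverse_complement":
        return revcomp(i5_top_strand)
    raise ValueError(f"unknown workflow: {workflow!r}")


print(i5_for_sample_sheet("AGCGCTAG", "forward"))             # AGCGCTAG
print(i5_for_sample_sheet("AGCGCTAG", "reverse_complement"))  # CTAGCGCT
```

A sample sheet written for the wrong workflow typically shows up as near-zero assignment for every sample, since no observed i5 matches.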
Element Biosciences AVITI (Avidity Sequencing)
Element Biosciences launched the AVITI in 2022, introducing a fundamentally different chemistry called "Avidity Sequencing" or "Avidite Base Chemistry" (ABC)[6]. The key innovation separates nucleotide identification from nucleotide incorporation using multivalent molecular complexes called "avidites." The instrument uses three different engineered polymerases, rolling circle amplification instead of bridge PCR, and a low-binding surface chemistry.
Element was founded in 2017 by Molly He, who previously led engineering teams at Illumina and was a co-inventor on multiple Illumina sequencing patents. The company raised over $400 million before launching its first instrument and explicitly designed the AVITI to accept Illumina-format libraries from day one—a strategic choice that dramatically lowered the switching cost for labs already invested in Illumina workflows. The "avidity" approach—using multivalent binding to amplify signal without modifying the DNA—was inspired by the immune system, where antibodies achieve high-avidity target recognition through multiple simultaneous weak interactions.
Library Structure and Compatibility
#### Circular Library Requirement
Unlike Illumina, AVITI requires circular library molecules as templates for rolling circle amplification. There are three routes to get there:
- Adept Workflow: Take any standard Illumina library (TruSeq or Nextera adapters) and circularize it off-instrument using Element's Adept kit. A splint oligo bridges the P5 and P7 adapter ends, and a ligase joins them into a circle. The circularized library is single-stranded. This allows labs to continue using their existing Illumina library prep kits.
- Elevate Workflow: Element's native library prep. Uses Element-specific adapters and indexes (96 UDI pairs, optimized for 4-channel color balance). Produces a linear library that is automatically circularized on-instrument during the sequencing run by the Cloudbreak chemistry.
- Cloudbreak Freestyle: The newest kit allows direct loading of linear Illumina libraries onto the AVITI, with automatic on-instrument circularization. This eliminates the off-instrument Adept conversion step entirely.
#### Adapter Compatibility Details
The AVITI accepts standard Illumina TruSeq and Nextera adapter sequences. The sequencing primers used are essentially the same Illumina standard sequences, meaning the vast majority of existing Illumina libraries can run on the AVITI without modification beyond circularization. Key practical notes:
- Libraries must be amplified with a proofreading polymerase (e.g., KAPA HiFi, NEB Q5). Taq A-overhangs interfere with circularization.
- Very short inserts (shorter than the read length) cause problems because rolling circle amplification of tiny circles is inefficient.
- Small RNA-seq libraries require custom sequencing primers.
- Libraries treated with IDT/SwiftBio Normalase are NOT compatible.
- Higher library concentration is needed compared to Illumina (~5–16 nM vs. typical ~2 nM loading).
Polony Generation (Rolling Circle Amplification)
This is where AVITI diverges most dramatically from Illumina.
#### The RCA Process
The flow cell surface is coated with a low-binding chemistry studded with capture oligos complementary to the adapter sequences. When a circular library molecule hybridizes to a capture oligo, rolling circle amplification begins:
1. An RCA-specific polymerase (Polymerase #1 of three) initiates synthesis from the capture oligo, using the circular library as a template.
2. The polymerase traverses the full circle, then continues, displacing its own previously synthesized strand as it enters the second lap.
3. This continues for many revolutions, producing a long single-stranded concatemer of tandem copies of the complement of the original library molecule.
4. The concatemer collapses into a tight ball on the surface; this is the "polony" (polymerase colony).
Each polony contains many copies of the same sequence, all in close physical proximity, analogous to an Illumina cluster but generated without PCR.
#### Advantages Over Bridge Amplification
- No PCR means no exponential amplification bias and no polymerase error propagation. RCA copies only the original template, over and over.
- Minimizes optical duplicates: each polony arises from a single template binding event.
- Essentially eliminates index hopping, because there is no recombinase-mediated strand invasion (as in ExAmp) and no free library molecules interacting with growing clusters.
- Polony duplication rates on AVITI are extremely low (typically <1%).
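The "no error propagation" point can be quantified with an idealized lineage argument. The sketch below assumes perfect doubling per PCR cycle starting from a single template strand, and a hypothetical polony of `n_copies` RCA copies; an error already present in the circular template itself would appear in every copy under either method.

```python
def pcr_error_fraction(cycle_introduced: int) -> float:
    """Share of final strands inheriting an error made during copying in
    PCR cycle `cycle_introduced` (1-based), assuming one starting strand
    and perfect doubling: the erroneous lineage holds 2**-k of the total."""
    return 0.5 ** cycle_introduced


def rca_error_fraction(n_copies: int) -> float:
    """Share of copies carrying an error made while synthesizing one copy.
    Every RCA copy is read off the original circle, so the error stays in
    that single copy."""
    return 1 / n_copies


print(pcr_error_fraction(1))    # 0.5: half the cluster carries the error
print(rca_error_fraction(500))  # 0.002 for a hypothetical 500-copy polony
```

An early PCR error can therefore dominate a cluster's signal at that position, while an RCA copying error is diluted among the other copies.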
#### Throughput
A high-output AVITI flow cell contains approximately 1 billion polonies, each generating one read pair. The AVITI runs two independent flow cells simultaneously, yielding ~2 billion read pairs per run. Read lengths of 2×75 through 2×300 are supported with different kit configurations.
Sequencing Chemistry (Sequencing by Avidity)
This is the most novel aspect of the platform. Unlike Illumina, where base identification and base incorporation happen in the same chemical step (a labeled nucleotide is incorporated), AVITI splits these into two distinct phases per cycle.
#### Phase 1: Detection (Avidite Binding)
After washing away any reagents from the previous cycle, the flow cell is flooded with a mixture of:
- An engineered "avidite-binding polymerase" (ABP, Polymerase #2). This is a modified polymerase that can bind template DNA and recruit a complementary nucleotide, but CANNOT catalyze incorporation.
- Four fluorescently-labeled avidites (one per base: A, C, G, T).
Each avidite is a multivalent molecular complex with the following structure:
- Core: Fluorophore-labeled streptavidin tetramer. Dyes are conjugated via lysine-NHS chemistry.
- Arms: Biotinylated polymer linkers ending in nucleotide triphosphates. Each core has ~3 nucleotide-bearing arms plus one arm that links to additional cores, forming higher-order multimers.
The ABP sits at the primer-template junction on each copy in the polony and attempts to recruit a complementary nucleotide. Since each copy in the polony is at the same position (they are synchronized), they all recruit the same avidite type. Because a single avidite molecule has multiple nucleotide arms, it simultaneously engages multiple ABP sites across the polony. This multivalent interaction creates an extremely stable complex through avidity (many weak interactions summing to a strong one), even though each individual nucleotide:polymerase interaction is transient.
The result: bright, stable fluorescent signal at nanomolar avidite concentrations (100-fold lower than the micromolar concentrations needed for labeled nucleotides in Illumina SBS). The fluorophore is also physically distant from the DNA, reducing photodamage.
#### Phase 2: Incorporation (Strand Extension)
After imaging, the ABPs and avidites are stripped away. The flow cell is then flooded with:
- An incorporation-optimized polymerase (Polymerase #3).
- Unlabeled, 3'-blocked reversible terminator nucleotides.
This polymerase incorporates a single unlabeled nucleotide at each position, then the 3' block is removed, restoring a natural 3'-OH. Because incorporation uses unlabeled nucleotides, no fluorescent scars are left on the DNA, and the growing strand is chemically identical to natural DNA.
#### Why This Matters
- Scarless DNA: No residual chemical modifications accumulate on the growing strand, which avoids the progressive signal degradation seen with Illumina's dye-labeled nucleotides.
- Optimized separately: The detection polymerase is engineered for specificity and avidite binding. The incorporation polymerase is engineered for speed and fidelity. Neither has to compromise.
- Phasing resistance: Dephased molecules in the polony lack adjacent in-phase neighbors, so wrong avidites cannot form multivalent complexes. They produce only weak, transient, undetectable background. This means phasing noise grows far more slowly than in Illumina.
- Homopolymer performance: Because detection and incorporation are separate and each cycle adds exactly one base, avidite sequencing maintains high accuracy through homopolymer stretches. Published data show essentially no increase in error rate post-homopolymer, whereas Illumina SBS shows a 5-fold error spike.
Error Profiles and Quality
- Most bases score Q40–Q50+ (1 error per 10,000–100,000 bases). With Cloudbreak UltraQ chemistry: ≥70% Q50, ≥90% Q40[7].
- Quality remains high through the end of the read, with minimal drop-off compared to Illumina's steep quality decay.
- Read 1 and Read 2 quality are much more similar than on Illumina, because both reads start from primers on the same polony concatemer without resynthesis.
- Substitution errors, insertion errors, and deletion errors are all very low.
- Essentially zero index hopping (RCA eliminates the mechanism).
- Optical duplicate rate is extremely low (<1%).
- 4-channel imaging system with two excitation lines (~532 and ~635 nm) and four emission channels (~553, 596, 668, 716 nm).
Sequencing Order
With Cloudbreak chemistry, the AVITI sequences indexes (i7 and i5) before Read 1 and Read 2. This provides real-time QC and demultiplexing feedback before the long insert reads even begin—letting you catch loading or library problems early. Read order: Index 1 → Index 2 → Read 1 → Read 2.
Output and Run Times
| Parameter | AVITI (High Output) | AVITI (Low Output) |
|---|---|---|
| Reads per flow cell | ~1 billion | ~100 million |
| Reads per run (2 FC) | ~2 billion | ~200 million |
| Read lengths | 2×75 to 2×300 | 2×75 to 2×300 |
| 2×150 run time | <40 hours | <40 hours |
| Quality | ≥90% Q40 | ≥90% Q40 |
Ultima Genomics (Flow-Based SBS)
Ultima Genomics launched the UG 100 in February 2024[8], representing the most radical hardware departure from the standard flow cell paradigm. It uses a spinning silicon wafer, emulsion PCR for clonal amplification, and a non-terminating single-nucleotide flow chemistry. The instrument is designed for ultra-high throughput at extreme cost efficiency (targeting the $100 genome[9]).
Library Structure
#### Native Ultima Libraries
Ultima's library structure differs from Illumina's:
- Sequencing end: Contains a Primer for Sequencing (PS) site plus a Sample Barcode (PS-SBC).
- Bead capture end: Contains a Unique Bead Adapter (UBA) sequence necessary for hybridization to sequencing beads during emulsion PCR.
Ultima provides two library preparation workflows:
- Solaris Free: A PCR-free library prep. Compatible with many third-party kits. Adds Ultima-specific adapters to fragmented DNA.
- Solaris Flex: Allows adaptation of existing partial or complete libraries (including Illumina libraries) through a simple PCR step that appends Ultima-specific adapter overhangs.
#### Illumina Library Conversion
Illumina libraries can be converted to Ultima format via PCR. Primers anneal to the existing Read 1/Read 2 regions and add Ultima-specific PS-SBC and UBA overhangs. The P5/P7 sequences are effectively replaced. This conversion has been demonstrated for 10x Genomics single-cell libraries, Olink proteomics libraries, and standard WGS/RNA-seq preps.
#### Single-End Reads (Critical Difference)
Ultima sequencing is inherently single-end. Each bead (and therefore each read) sequences from one end of the library insert only; there is no equivalent of Illumina's paired-end resynthesis. Read lengths follow a distribution rather than a fixed value, with a raw median of ≥300 bases and a post-filtering median of ~250 bases. For tools that expect paired-end input, the single-end reads can be computationally split into a simulated paired-end format; this restores file compatibility but adds no information about the far end of inserts longer than the read.
Clonal Amplification (Emulsion PCR on Beads)
Ultima uses off-instrument, automated emulsion PCR rather than on-surface amplification:
1. Library molecules are mixed with sequencing beads bearing capture oligos complementary to the UBA adapter. Each bead captures one (ideally) or a few library molecules.
2. The mixture is compartmentalized into an emulsion: oil droplets encapsulate individual beads with reagents.
3. PCR occurs within each droplet, clonally amplifying the captured library molecule(s) on the surface of the bead.
4. After amplification, the emulsion is broken and beads are recovered.
For ppmSeq (paired plus-minus sequencing), both strands of each original DNA duplex are captured on the same bead during emulsion PCR. Denaturation occurs within the emulsion droplet after bead ligation, so forward and reverse strand templates are co-amplified on a single bead. This enables downstream computational duplex error correction.
Bead Loading onto the Wafer
The amplified beads are loaded onto a 200 mm silicon wafer (a standard semiconductor wafer diameter). The wafer surface is patterned at micron scale with an array of electrostatic landing pads. Beads settle onto these pads, ideally one bead per pad. A high-output wafer holds approximately 10–12 billion beads/reads.
The Spinning Wafer Architecture
This is Ultima's most distinctive hardware feature. Instead of a sealed flow cell with microfluidics and a scanning camera:
- The wafer sits flat and spins like a CD.
- Reagents are dispensed onto the center of the spinning wafer and distributed by centrifugal force (spin coating), producing a uniform thin film. Each nucleotide is delivered through a separate nozzle, eliminating cross-contamination between flows.
- Two fixed-position cameras image the wafer continuously as it rotates beneath them, rather than the camera moving across a stationary surface.
- The instrument can run two wafers simultaneously, alternating between a chemistry station and an imaging station.
- Six wafers can be loaded at once; the instrument runs continuously with hot-swappable reagents and consumables.
Sequencing Chemistry (Single-Nucleotide Flow, Non-Terminating SBS)
Ultima's chemistry is conceptually related to the flow-based chemistries of 454 pyrosequencing (now discontinued) and Ion Torrent semiconductor sequencing, but with critical innovations.
#### Flow Order
Nucleotides are introduced one species at a time in a repeating order (e.g., T, G, C, A, T, G, C, A...). Each introduction of one nucleotide species constitutes one "flow." Four flows (one of each base) constitute one "flow cycle." A run of ~300 base median length requires ~444 flows (~111 flow cycles).
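The ~444-flow figure is close to what a simple expectation gives. Assume a uniformly random, independent base sequence and a fixed four-base cyclic flow order. Homopolymer runs then have geometric length with continuation probability 1/4, and the next (different) base is equally likely to be 1, 2, or 3 flows away:

$$
\mathbb{E}[\text{run length}] = \sum_{k \ge 1} k \left(\tfrac{1}{4}\right)^{k-1} \tfrac{3}{4} = \tfrac{4}{3},
\qquad
\mathbb{E}[\text{flows per run}] = \tfrac{1 + 2 + 3}{3} = 2
$$

$$
\text{bases per flow} \approx \frac{4/3}{2} = \frac{2}{3}
\quad\Longrightarrow\quad
300 \text{ bases} \approx 450 \text{ flows} \approx 112 \text{ flow cycles}
$$

Real genomes are not uniform, so the exact count varies with sequence content, but the estimate agrees closely with the figure quoted above.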
#### Mostly Natural Nucleotides (mnSBS)
The key innovation: each flow delivers a mixture of mostly unmodified, natural nucleotides plus a minority (<20%) of fluorescently-labeled nucleotides. The polymerase remains processively bound to the template and incorporates bases without a terminator—meaning it can incorporate multiple nucleotides per flow if the template has a homopolymer. The labeled fraction provides optical signal; the unlabeled majority keeps polymerase kinetics and fidelity close to natural.
After each flow, the wafer is imaged at steady state. A fluorophore cleavage step removes the labels, leaving natural DNA. Because the dyes are cleaved, no scars accumulate.
#### Base Calling in Flow Space
Base calling on Ultima operates in "flow space" rather than "sequence space." For each flow, the system must determine how many nucleotides of that species were incorporated (0, 1, 2, 3, 4...). The identity of the base is never in question (you know which nucleotide was flowed), so substitution errors are inherently very rare. The challenge is accurately determining the number of incorporations, especially for homopolymers. Ultima uses a deep convolutional neural network (CNN) trained on large, diverse datasets to convert raw signal intensities into base calls.
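Flow space is easy to see in an idealized, noise-free form. The Python sketch below converts a sequence into the number of bases incorporated in each flow for a given cyclic flow order, and back; `TGCA` follows the example order above, and the function names are hypothetical. Real signals are continuous intensities that a base caller (on Ultima, the CNN described above) must turn into these integer counts, and that step is where homopolymer errors arise.

```python
from itertools import groupby


def sequence_to_flowgram(seq: str, flow_order: str = "TGCA") -> list[int]:
    """Ideal flowgram: bases incorporated in each flow until seq is consumed."""
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    runs = [(base, len(list(group))) for base, group in groupby(seq)]
    flows, run_idx, flow_idx = [], 0, 0
    while run_idx < len(runs):
        base, length = runs[run_idx]
        if flow_order[flow_idx % len(flow_order)] == base:
            flows.append(length)  # the whole homopolymer extends in one flow
            run_idx += 1
        else:
            flows.append(0)       # flowed nucleotide is not next on the template
        flow_idx += 1
    return flows


def flowgram_to_sequence(flows: list[int], flow_order: str = "TGCA") -> str:
    """Invert an integer flowgram back into bases."""
    return "".join(flow_order[i % len(flow_order)] * n for i, n in enumerate(flows))


print(sequence_to_flowgram("TTGAC"))    # [2, 1, 0, 1, 0, 0, 1]
print(sequence_to_flowgram("AAAAAAA"))  # [0, 0, 0, 7]
```

Miscounting the final 7 as 6 yields a one-base deletion rather than a substitution, which is why flow chemistries trade substitution errors for homopolymer indels.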
#### Homopolymer Challenge
Because the chemistry is non-terminating, a homopolymer run (e.g., AAAAAAA) results in all 7 A's being incorporated in a single flow. The system must count the number of incorporations from the signal intensity. For short homopolymers (≤8–10 bp), accuracy is high. For longer homopolymers (>12 bp), accuracy degrades, and these regions are excluded from Ultima's high-confidence region (HCR). This is the classic limitation of flow-based chemistries, shared historically with 454 and Ion Torrent, though Ultima's ML-based calling substantially outperforms those predecessors.
Error Profile
- Substitution errors are extremely low because base identity is defined by which nucleotide was flowed.
- Insertion/deletion errors in homopolymer regions are the dominant error type.
- SNV F1 score: 99.8%. INDEL F1 score: 99.4%.
- SNVQ (Single Nucleotide Variant Quality) scores are reported instead of traditional per-base Q scores. SNVQ represents the error probability of a specific base substitution (e.g., A→G) rather than any substitution.
- Read lengths are variable, following a distribution. Median raw ≥300 bp, post-filter ~250 bp.
ppmSeq (Paired Plus-Minus Sequencing)
This is Ultima's distinctive accuracy feature. By capturing both strands of a DNA duplex on a single bead during emulsion PCR, then sequencing both, the system can computationally compare forward and reverse strand reads. Any base call disagreement between the two strands likely represents a sequencing artifact or DNA damage and can be filtered. This achieves an SNV accuracy of Q60 (one error per million bases) on individual reads, enabling applications like liquid biopsy and minimal residual disease detection[9].
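The strand-comparison principle can be sketched in a few lines. This illustrates duplex filtering in general, not Ultima's pipeline: it assumes the plus- and minus-strand reads have already been aligned to the same coordinates and expressed in one orientation, and `duplex_consensus` is a hypothetical name.

```python
def duplex_consensus(plus: str, minus: str, mask: str = "N") -> str:
    """Keep bases where both strands agree; mask positions where they differ.

    A true variant is present on both strands of the original duplex, while
    a sequencing error or single-strand damage usually affects only one.
    """
    if len(plus) != len(minus):
        raise ValueError("reads must cover the same aligned positions")
    return "".join(p if p == m else mask for p, m in zip(plus, minus))


print(duplex_consensus("ACGTAC", "ACGAAC"))  # ACGNAC: position 4 disagrees and is masked
```

If each strand errs independently at a site with probability p, a matching error on both strands requires on the order of p², which is why strand agreement can push SNV accuracy well beyond single-strand Q-scores.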
Output and Run Times
| Parameter | UG 100 |
|---|---|
| Reads per wafer | 10–12 billion |
| Data per wafer | ≥2.5–3.0 terabases |
| Run time (≥300 bp) | ~20 hours per wafer |
| Read type | Single-end (variable length) |
| Read length median | ≥300 bp raw; ~250 bp post-filter |
| Genomes/year (30X) | ~20,000 |
| Cost per genome (30X) | ~$100 |
MGI / DNBSEQ (DNA Nanoball Sequencing)
MGI Tech, a subsidiary of BGI Group, manufactures DNBSEQ platforms that trace their lineage to Complete Genomics' DNA nanoball (DNB) technology. Drmanac et al. published the foundational work in 200910, describing rolling circle amplification with phi29 polymerase to generate nanometer-scale DNA nanoballs containing ~300–500 template copies from circularized library fragments. These nanoballs are then loaded onto a patterned flow cell and interrogated by either cPAS (combinatorial Probe-Anchor Synthesis) chemistry or the CoolMPS unlabeled-nucleotide approach with antibody-dye detection. Today, MGI instruments command ~47% of the sequencing market in China, driven by favorable pricing and regulatory approval, though Illumina retains ~90% share in North America and Western Europe.
Library Architecture
MGI libraries start as linear, double-stranded DNA molecules with MGI-specific adapter sequences at both ends, analogous to but distinct from Illumina's P5/P7 adapters. They are then denatured and circularized into single-stranded circles, which serve as the templates for nanoball formation. Libraries can be generated de novo or converted from Illumina-compatible libraries via adapter-conversion kits, enabling existing Illumina library prep workflows to run on DNBSEQ instruments with minimal changes. This interoperability has been strategically important for MGI's market penetration in labs already invested in Illumina infrastructure.
DNA Nanoball Formation
Rolling circle amplification is the defining feature. A circularized library fragment serves as the template for phi29 polymerase, which synthesizes a long concatemer (many tandem copies of the circular sequence) without dissociating from the template. The resulting ~20–40 kb concatemer then anneals to itself via complementary sequences within it, forming a compact nanoball (~200 nm diameter) with a very high local template concentration (300–500 copies in a volume of ~4 million nm³). This concentration is not exploited as a chemical advantage during sequencing. Rather, the nanoball serves as a stable, dense analogue of Illumina's bridge-amplified or ExAmp clusters. DNA nanoballs are also more photostable than linear DNA, a considerable practical advantage when many optical imaging cycles are required.
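A short calculation converts the nanoball figures into a molar concentration. The sketch treats the nanoball as a uniform 200 nm sphere holding 400 copies (the midpoint of 300–500). Both are simplifying assumptions.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def sphere_volume_nm3(diameter_nm: float) -> float:
    """Volume of a sphere in cubic nanometres."""
    radius = diameter_nm / 2
    return 4 / 3 * math.pi * radius**3


def molar_concentration(copies: float, volume_nm3: float) -> float:
    """Concentration in mol/L of `copies` molecules in `volume_nm3`."""
    litres = volume_nm3 * 1e-24  # 1 nm^3 = 1e-27 m^3 = 1e-24 L
    return copies / AVOGADRO / litres


volume = sphere_volume_nm3(200)  # ~4.19e6 nm^3
concentration = molar_concentration(400, volume)
print(f"{volume:.3g} nm^3, {concentration * 1e6:.0f} uM")
```

The result is roughly 160 µM of template copies. The computed volume agrees with the "~4 million nm³" figure, and the concentration shows how densely a nanoball packs its copies.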
Sequencing Chemistry: cPAS and CoolMPS
MGI offers two readout chemistries. cPAS (combinatorial Probe-Anchor Synthesis) is a polymerase-based sequencing-by-synthesis chemistry. In each cycle, fluorescently labeled, reversibly terminated nucleotides are incorporated into a strand primed from an anchor sequence on the nanoball. The array is then imaged, and the dye and blocking group are cleaved to prepare for the next cycle. The name reflects the chemistry's lineage from Complete Genomics' earlier cPAL (combinatorial Probe-Anchor Ligation) chemistry, which identified bases by ligating fluorescently labeled probes rather than by polymerase incorporation.
CoolMPS uses unlabeled, reversibly terminated nucleotides. After incorporation, each added base is identified by base-specific antibody-dye conjugates that recognize the incorporated nucleotide. This avoids fluorescently modified dNTPs and their potential for misincorporation artifacts, though it requires careful antibody engineering to ensure specificity.
Instruments and Throughput
| Instrument | Mode | Max Output per Run | Run Time | Typical Read Length |
|---|---|---|---|---|
| DNBSEQ-G99 | Benchtop | 8–240 Gb (PE100) | ~20–48 hours | 100 bp paired-end |
| DNBSEQ-G400 | Mid-throughput | Up to 1,080 Gb (PE150) | ~24–72 hours | 150 bp paired-end |
| DNBSEQ-T7 | High-throughput | Up to 6 Tb/day (1–4 flow cells) | ~24 hours | 150 bp paired-end; ~5 billion reads |
| DNBSEQ-T7+ | Ultra high-throughput | Up to 14 Tb/day | ~24 hours | 150 bp paired-end |
Error Profile
DNBSeq error characteristics closely resemble Illumina sequencing: substitution errors dominate (~0.1–0.3% per base), insertions and deletions are rare (~0.01%), and homopolymer performance is excellent due to the base-per-cycle sequencing logic. Quality decay toward read ends is present but generally milder than on older Illumina instruments, likely a consequence of the chemistries' refinement over the past decade and the higher stability of nanoball clusters. Read 2 quality is slightly lower than Read 1, as with Illumina, reflecting the resynthesis step required for reverse-strand reading.
Patent Settlement and Market Dynamics
In July 2022, MGI and Illumina announced the settlement of a multi-year patent dispute, with Illumina agreeing to a one-time $325 million payment. Although the settlement included cross-licensing provisions, neither party obtained exclusive rights to the other's core IP. This de-escalation reflected the reality that both firms had defensible patent portfolios covering complementary aspects of cluster generation and chemistry. The settlement let MGI continue its aggressive commercial expansion without the threat of litigation, while Illumina, for its part, ended litigation with a competitor that had captured significant market share in Asia.
Practical Considerations
For Western labs with established Illumina ecosystems, DNBSEQ adds throughput optionality at modest cost, especially for routine, high-volume applications (WGS, exome, amplicon panels). Compatibility with converted Illumina libraries simplifies adoption. However, the instruments require substantial capital investment (~$500k–$2M depending on model), and reagent supply chains outside China remain nascent. Because the error profiles are near-identical to Illumina's, DNBSEQ offers limited technical advantage in applications where Illumina is already serviceable, and cost becomes the primary decision metric.
Sanger Sequencing (Chain Termination)
Sanger sequencing, developed by Frederick Sanger in 19771, is the original DNA sequencing method and persists as the gold standard for variant validation, small-scale sequencing, and clinical confirmatory work. It uses chain termination — dideoxynucleotides (ddNTPs) lacking the 3'-OH required for phosphodiester bond formation — to generate a stepwise population of fragments whose terminal base encodes the template sequence position-by-position.
Sanger is one of only five individuals to have won two Nobel Prizes (alongside Marie Curie, Linus Pauling, John Bardeen, and Barry Sharpless). He was also the first person to win the Nobel Prize in Chemistry twice, and Sharpless became the second in 2022. Sanger's first prize (1958) honored the determination of insulin's amino acid sequence. His second (1980) recognized the chain-termination method and was shared with Walter Gilbert and Paul Berg. Sanger sequencing powered the Human Genome Project, which consumed thirteen years and ~$2.7 billion to produce the first human reference. Upon retirement, he declined a knighthood, reportedly saying he did not wish to be called "Sir."
Library Structure and Template Preparation
Sanger sequencing interrogates a single template per reaction. Common inputs are a recombinant plasmid with the target insert (sequenced with universal primers like M13 forward/reverse flanking the cloning site), a purified PCR amplicon with known primer binding sites, or genomic DNA with custom sequencing primers at known flanking regions. There is no adapter ligation, no indexing, no clonal amplification. One template + one primer = one read.
The Chain Termination Reaction
Each reaction contains the template, a sequencing primer that anneals to a known region upstream of the target, DNA polymerase (historically the Klenow fragment, now thermostable variants such as Thermo Sequenase), dNTPs at high concentration (~100–300 µM), and ddNTPs, each labeled with a distinct fluorescent dye. The dNTP:ddNTP ratio (~100:1 to 300:1) ensures that at each template position, a small fraction of the growing strands incorporates a ddNTP and terminates while the rest incorporate a normal dNTP and continue. After thermal cycling (25–30 cycles), the reaction contains a nested set of fragments that start at the primer and end at every base position in the readable region, each terminated by a color-coded ddNTP.
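A geometric model of termination illustrates the role of the dNTP:ddNTP ratio. The sketch assumes the polymerase incorporates dNTPs and ddNTPs with equal efficiency, which real enzymes do not do exactly. Under that assumption, a strand terminates at each position with probability 1/(R + 1) for a ratio of R:1, so the position at which a strand stops follows a geometric distribution.

```python
def termination_prob(ratio: float) -> float:
    """Per-position chance of a ddNTP incorporation for dNTP:ddNTP = ratio:1.

    Assumes no discrimination between dNTPs and ddNTPs (a simplification).
    """
    return 1.0 / (ratio + 1.0)


def fraction_terminating_at(k: int, ratio: float) -> float:
    """Fraction of strands that extend through k - 1 positions and stop at k."""
    p = termination_prob(ratio)
    return (1 - p) ** (k - 1) * p


def fraction_extending_past(length: int, ratio: float) -> float:
    """Fraction of strands still growing after `length` positions."""
    return (1 - termination_prob(ratio)) ** length


for ratio in (100, 300):
    print(f"{ratio}:1  stop at 50: {fraction_terminating_at(50, ratio):.2e}  "
          f"stop at 700: {fraction_terminating_at(700, ratio):.2e}  "
          f"past 700: {fraction_extending_past(700, ratio):.3f}")
```

Under these assumptions, about 0.1% of strands extend past 700 positions at 100:1, compared with roughly 10% at 300:1. Higher ratios therefore favor longer reads, at the cost of fewer short terminated fragments.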
Capillary Electrophoresis and Basecalling
Modern Sanger sequencing uses capillary electrophoresis rather than slab gels. The standard instruments are the Applied Biosystems 3730xl (96 capillaries; workhorse of the Human Genome Project and most core facilities), the 3500/3500xL (8–24 capillaries; clinical/low-throughput settings), and the SeqStudio (4–8 capillaries; benchtop). Terminated fragments are loaded into fused-silica capillaries filled with POP-7, a linear polyacrylamide sieving matrix. Fragments separate by size under an applied electric field (~15 kV). As each fragment transits the detection window, an argon-ion laser (or LED on newer instruments) excites the terminal ddNTP fluorophore. A CCD camera or photodiode array records the four-color emission through spectral filters, generating a four-color electropherogram where each peak is one base position.
Base calling assigns Phred quality scores4 to each called base. Q20 = 1% error; Q30 = 0.1%; Q40 = 0.01%. A typical good Sanger read yields 700–900 bases of Q20+ sequence. The first ~30–50 bases are unreliable (primer peak artifacts). Quality degrades beyond ~800–900 bases as fragment resolution decays.
Error Profile
Within the readable window, per-base error rates are ~0.1–1% (Q20–Q30), and after manual trace editing, accuracy approaches 99.99% (Q40). Errors occur predominantly at positions with overlapping or compressed peaks, especially in GC-rich regions where DNA secondary structure causes anomalous fragment migration (betaine or dITP substitution can resolve compressions). Homopolymer runs >8–10 bp cause peak broadening and merging, complicating base counting. Mixed templates (heterozygous SNPs, mixed bacterial populations) produce double peaks, which software flags but which complicate automated calling.
Read Length and Throughput
| Parameter | Typical Value |
|---|---|
| Read length | 700–1,000 bases (Q20+) |
| Maximum read length | ~1,200 bases under optimal conditions |
| Reads per run (3730xl) | 96 (one per capillary) |
| Run time | ~2–3 hours per plate |
| Daily throughput (3730xl) | ~1,500 reads (~1.2 Mb/day) |
| Cost per read | ~$3–$8 (reagents + instrument time) |
Practical Considerations
Sanger remains widely used for regulatory and clinical confirmation of NGS-detected variants, although requirements for mandatory orthogonal Sanger confirmation are relaxing. Some high-complexity CLIA labs now accept confirmation by an independent NGS method for high-confidence calls at sufficient depth. For longer regions, primer walking (successive rounds of sequencing with primers ~500 bp apart) remains standard for bacterial genome finishing or plasmid characterization, though it is labor-intensive. Cost scales linearly with read number, with no economy of scale: 1,000 amplicons cost 1,000× as much as one. Sanger detects a minor allele only at ≥15–20% frequency. Below this threshold, the minor peak is indistinguishable from background noise.
Oxford Nanopore Technologies (Nanopore Sequencing)
Oxford Nanopore Technologies reads individual DNA or RNA molecules in real time using protein nanopores embedded in a synthetic lipid bilayer. An ionic current flows through the pore; as nucleic acid translocates through the constriction, bases produce characteristic current modulations that a neural network converts to sequence. This is the only major sequencing platform reading native DNA directly — no amplification, synthesis, or labeling of the template itself.
In August 2016, NASA astronaut Kate Rubins sequenced DNA aboard the International Space Station using a MinION, the first-ever DNA sequencing in microgravity13. The device required only a laptop and a USB port. Multiple ISS runs over six months yielded sequences from bacteriophage lambda, E. coli, and mouse mitochondrial DNA. No other sequencer has left Earth, and the MinION remains the only sequencer that fits in a coat pocket.
Library Architecture
ONT libraries are simpler than those of competing platforms. The core requirement is an adapter carrying a motor protein (helicase or translocase) attached to the DNA end, which controls translocation speed through the pore. Standard library methods are:
- Ligation-based (LSK kits, e.g., SQK-LSK114): end repair, dA-tailing, and adapter ligation; no PCR required; preserves native base modifications; ~1 µg of high-molecular-weight DNA is optimal.
- Rapid (RAP): a transposase fragments the DNA and appends adapters in a 10-minute single-tube reaction; faster, but yields shorter fragments.
- PCR-based (PCB): amplification for low-input samples; loses modification information.
- Direct RNA (RNA004): poly(A)-selected RNA with oligo(dT)-primed reverse transcription; the RNA strand (not the cDNA) is sequenced, preserving RNA modifications (m6A, pseudouridine, m5C, etc.).
The Nanopore and Translocation
The biological nanopore is a modified CsgG protein (from the E. coli curli secretion system) designated R10.4.1. It sits in a synthetic lipid bilayer spanning a microwell on a CMOS sensor. The R10.4.1 pore has two narrow constrictions spaced ~9 nucleotides apart, providing two measurements of each k-mer and improving accuracy over earlier single-constriction pores (R9.4.1). A motor protein bound at the pore entrance ratchets DNA through one base at a time at ~400–450 bases/second. This is far slower than unassisted translocation (~1 µs/base) and therefore measurable. A voltage (~180 mV) across the membrane drives K+ and Cl− ions through the pore, and DNA in the constriction reduces the current by an amount characteristic of the ~5-mer in the sensing zone. The raw current, sampled at 5 kHz, forms a trace known as a "squiggle."
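The figures in this paragraph imply how much raw signal each base contributes. The sketch below divides the 5 kHz sampling rate by the translocation speed, using 400 bases/s for the motor-controlled case and 1,000,000 bases/s as the rate implied by ~1 µs/base for unassisted translocation. It gives averages only, because real dwell times vary from base to base.

```python
def samples_per_base(sample_rate_hz: float, bases_per_second: float) -> float:
    """Average number of current samples recorded per translocated base."""
    return sample_rate_hz / bases_per_second


def mean_dwell_ms(bases_per_second: float) -> float:
    """Average time each base spends in the sensing region, in milliseconds."""
    return 1000.0 / bases_per_second


motor_controlled = samples_per_base(5_000, 400)   # 12.5 samples per base
unassisted = samples_per_base(5_000, 1_000_000)   # ~1 us/base: 0.005 samples per base
print(motor_controlled, unassisted, mean_dwell_ms(400))
```

With the motor protein, each base is sampled about a dozen times over ~2.5 ms. Without it, the sensor would record roughly one sample per 200 bases, which is why unassisted translocation cannot be read.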
Base Calling and Accuracy
Raw ionic current is converted to sequence by deep neural networks. The current basecaller is Dorado, whose most accurate recent models use a transformer architecture. It offers three speed/accuracy tiers:
- Fast: lowest latency, ~Q15–Q18 median.
- HAC (high-accuracy calling): the default for most applications, ~Q20–Q22 median.
- SUP (super-accurate): highest per-read accuracy, ~Q23–Q25 median.
Duplex mode additionally sequences both DNA strands consecutively through the same pore and merges them computationally to reach Q30+ (99.9%).
Because the nanopore reads native DNA directly, any chemical modification (5-methylcytosine, 6-methyladenine, 5-hydroxymethylcytosine, etc.) produces a characteristic current perturbation. Dorado's modification-aware models call base modifications simultaneously with the primary sequence, with no bisulfite, antibodies, or enzymes required. Among major sequencing platforms, only PacBio (through polymerase kinetics) offers comparable direct detection of modifications from native DNA.
Read Length
ONT has no theoretical upper limit — determined entirely by input DNA fragment size. Ultra-long read protocols (agarose-plug-based DNA extraction, Circulomics Short Read Eliminator) routinely generate reads >100 kb; reads exceeding 4 Mb have been reported14. Library-method read-length distributions: Ligation (standard) = N50 10–30 kb; Ultra-long = N50 50–100+ kb; Rapid = N50 5–15 kb.
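The N50 values above summarize a read-length distribution. N50 is the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal implementation, applied to hypothetical read lengths:

```python
def n50(lengths: list[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no reads supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # crossed the halfway point of total bases
            return length
    raise AssertionError("unreachable: the loop always crosses half the total")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Because N50 is weighted by bases rather than by read count, a small number of very long reads can raise it well above the median read length.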
Instruments and Throughput
| Instrument | Flow Cell Pores | Output per Flow Cell | Run Time |
|---|---|---|---|
| Flongle | 126 | ~2.8 Gb | Up to 24 hours |
| MinION / Mk1C | 2,048 | ~50 Gb | Up to 72 hours |
| P2 Solo | 2,048 | ~50 Gb | Up to 72 hours |
| PromethION (P24/P48) | 2,675 per FC × 24 or 48 FC | ~290 Gb per FC; up to 14 Tb per run | Up to 72 hours |
Error Profile and Accuracy Limits
Simplex raw accuracy (R10.4.1 + SUP model) is Q20–Q25 median (~99.0–99.7%). The remaining errors are dominated by indels in homopolymer stretches, with occasional substitutions. Duplex accuracy is Q30+ (~99.9%). At moderate coverage (≥30×), variant-calling accuracy matches or exceeds short-read platforms for SNVs and substantially outperforms them for structural variants, because long reads span breakpoints. Long homopolymers (>8–10 bp) remain the primary limitation: indels occur because the current-signal difference between, for example, 8 and 9 identical consecutive bases is minimal. The R10.4.1 dual-constriction pore improved homopolymer calling over R9.4.1, but this remains the dominant error mode. Certain sequence contexts (long homopolymers, some tandem repeats) show elevated errors that are partially correlated across reads. Because these errors are systematic rather than random, additional coverage does little to correct them at those motifs.
Adaptive Sampling
A distinctive ONT capability: the instrument can eject a DNA molecule mid-read if initial sequence does not match a target of interest ("Read Until" / adaptive sampling). The pore is freed for the next molecule. This enables real-time target enrichment without prior capture or amplification — e.g., sequencing only a set of clinically relevant genes from a whole-genome library, achieving 5–10× on-target enrichment. Adaptive sampling also works in reverse, depleting unwanted sequences (host DNA in metagenomic samples).
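The eject-or-continue decision at the heart of adaptive sampling can be sketched as a k-mer lookup on the first bases of a read. This is an illustrative toy, not ONT's implementation, which aligns basecalled read prefixes to a reference. The function names, the k-mer length, and the 50% hit threshold are all assumptions. Targets are indexed on both strands because a molecule can enter the pore in either orientation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_list(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(targets: list[str], k: int) -> set[str]:
    """Collect k-mers from every target on both strands."""
    index: set[str] = set()
    for target in targets:
        index.update(kmer_list(target, k))
        index.update(kmer_list(reverse_complement(target), k))
    return index


def decide(prefix: str, index: set[str], k: int,
           min_hit_fraction: float = 0.5, deplete: bool = False) -> str:
    """Return 'continue' or 'eject' for a read, given its first bases.

    In enrichment mode, on-target reads continue. With deplete=True the
    logic is inverted so that matching (e.g. host) reads are ejected.
    """
    kmers = kmer_list(prefix, k)
    if not kmers:
        return "continue"  # too little sequence to judge yet
    on_target = sum(km in index for km in kmers) / len(kmers) >= min_hit_fraction
    return "eject" if on_target == deplete else "continue"
```

The index is built once from the target sequences, and `decide` is then called on each read's prefix as it arrives.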
Pacific Biosciences (PacBio) SMRT Sequencing
PacBio's Single Molecule, Real-Time (SMRT) sequencing observes a polymerase molecule incorporating fluorescently labeled nucleotides into a growing complementary strand in real time. The polymerase is immobilized at the bottom of a zero-mode waveguide (ZMW) — a nanophotonic structure confining the observation volume to zeptoliters, enabling single-molecule fluorescence detection against a background of freely diffusing labeled nucleotides.
The ZMW concept traces to Jonas Korlach, Stephen Turner, and colleagues at Cornell, drawing on nanophotonics from Harold Craighead's lab. A metal aperture smaller than light wavelength creates an evanescent field rather than a propagating wave, confining illumination to ~20 zeptoliters such that single fluorescent molecules become detectable against a micromolar background. PacBio was founded in 2004 and shipped its first commercial instrument (RS) in 201112. The 2019 introduction of HiFi sequencing — computationally merging multiple noisy passes around a circular template into one highly accurate consensus read — transformed PacBio from a niche long-read platform into a contender for population-scale genome sequencing16. The T2T Consortium's 2022 completion of T2T-CHM13, the first truly complete telomere-to-telomere human genome, relied heavily on PacBio HiFi for base-level accuracy and Oxford Nanopore ultra-long reads for spanning centromeric repeats17.
Library Structure (SMRTbell)
PacBio libraries are circular molecules called SMRTbells: a double-stranded target DNA fragment is capped at both ends with single-stranded hairpin loops (ligated via T4 DNA ligase), converting a linear molecule into a topologically closed, dumbbell-shaped circle. Circularity is critical because the polymerase can circle the SMRTbell repeatedly (rolling-circle fashion), generating multiple passes over the same insert. Each traversal of the insert, on either strand, yields one subread. Full-length subreads are counted as passes, so one complete trip around the circle yields two passes, one per strand. Multiple passes enable intramolecular error correction for high-fidelity (HiFi) consensus reads.
Construction workflow: target DNA is sheared (WGS) or left intact (ultra-long), end-repaired, and ligated to hairpin adapters. A sequencing primer anneals to the adapter; the polymerase-bound SMRTbell complex is loaded. Size selection precedes loading: HiFi optimal inserts are 10–20 kb; CLR (continuous long read) uses >40 kb.
Zero-Mode Waveguides (ZMWs)
The SMRT Cell is a silicon chip containing millions of ZMWs: cylindrical holes approximately 70–100 nm in diameter and ~100 nm deep, fabricated in aluminum on glass. Because the diameter is smaller than the wavelength of the excitation light (~532 nm), light cannot propagate through the hole. Instead, an evanescent field decays from the well bottom and illuminates only the lowest ~30 nm, a detection volume of ~20 zeptoliters. The polymerase is chemically tethered at each well bottom. Fluorescently labeled nucleotides diffuse freely in bulk solution but become visible only when they enter the illuminated volume and are bound by the polymerase (residence ~10–100 ms). The micromolar bulk concentration ensures rapid diffusion in and out, and only a nucleotide held in the polymerase active site stays in the detection volume long enough to register as a fluorescence pulse.
The Revio SMRT Cell 25M contains ~25 million ZMWs; 8–12 million typically yield productive reads (remainder empty or containing multiple polymerases).
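A quick calculation shows why confining detection to ~20 zeptoliters makes single molecules visible against a micromolar background. The sketch computes the expected number of freely diffusing labeled nucleotides in a volume. The ~1 femtoliter figure for a conventional diffraction-limited focal volume is an approximate assumption used only for comparison.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
ZEPTOLITRE = 1e-21        # litres
FEMTOLITRE = 1e-15        # litres


def mean_molecules(conc_molar: float, volume_litres: float) -> float:
    """Expected number of molecules present in a volume at a given concentration."""
    return conc_molar * AVOGADRO * volume_litres


zmw = mean_molecules(1e-6, 20 * ZEPTOLITRE)      # ~0.012 molecules
confocal = mean_molecules(1e-6, 1 * FEMTOLITRE)  # ~600 molecules
print(f"ZMW: {zmw:.3f}, diffraction-limited spot: {confocal:.0f}")
```

At 1 µM, the ZMW detection volume contains a labeled nucleotide only about 1% of the time, so a pulse lasting tens of milliseconds stands out as an incorporation event. A femtoliter-scale spot would hold hundreds of labeled molecules at once.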
Sequencing Chemistry
PacBio uses phospholinked nucleotides, in which the fluorescent dye is attached to the terminal phosphate rather than to the base. During incorporation, the labeled nucleotide diffuses into the ZMW and binds the polymerase active site. While it is held (10–100 ms), the fluorophore emits a color-coded light pulse detected by the sensor below. The polymerase then catalyzes phosphodiester bond formation, cleaving the dye-carrying phosphate chain from the nucleoside monophosphate, which is incorporated into the growing strand. The released dye-labeled polyphosphate diffuses away. The incorporated nucleotide retains no fluorescent modification, so the synthesized strand is natural DNA. This "label-then-cleave" design avoids fluorescent scars, lets the polymerase build an unmodified product strand, and preserves the ability to detect kinetic signatures of base modifications.
HiFi vs. CLR
PacBio operates in two primary modes:
- HiFi (CCS): the polymerase makes ≥3 full passes (typically 8–15) over a SMRTbell with a 10–20 kb insert. Subreads from each pass are computationally aligned into a single consensus read with accuracy ≥Q30 (99.9%). The tradeoff is that insert size is capped by polymerase processivity, because several passes must fit within one polymerase read. HiFi read lengths are 10–25 kb at Q30+.
- CLR (Continuous Long Read): long SMRTbells (>40 kb inserts) are sequenced in a single pass. Raw accuracy is ~85–90% (Q10–Q15), with errors dominated by indels, and reads can exceed 100 kb. CLR is useful for scaffolding, structural variant detection, and de novo assembly when combined with HiFi data.
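A majority-vote model illustrates the gain from multiple passes. This is a worst-case toy, not PacBio's CCS algorithm, which builds an alignment-based consensus and handles the indel-dominated errors of individual passes. The model assumes that errors are independent across passes and that all erroneous passes agree on the same wrong base. Odd pass counts avoid ties. With an even count, ties are treated as correct, which is optimistic.

```python
import math


def majority_vote_error(n_passes: int, per_pass_error: float) -> float:
    """Probability that more than half of n_passes are wrong at a position."""
    wrong_needed = n_passes // 2 + 1
    return sum(
        math.comb(n_passes, k)
        * per_pass_error**k
        * (1 - per_pass_error) ** (n_passes - k)
        for k in range(wrong_needed, n_passes + 1)
    )


def phred(error_prob: float) -> float:
    """Convert an error probability to a Phred quality, Q = -10 log10(P)."""
    return -10 * math.log10(error_prob)


for passes in (1, 3, 5, 9, 15):
    err = majority_vote_error(passes, 0.10)  # ~90% single-pass accuracy
    print(f"{passes:2d} passes: error {err:.1e}, Q{phred(err):.1f}")
```

With a 10% per-pass error, nine passes already give roughly Q30 under this model. This is consistent in spirit with HiFi's typical 8–15 passes and ≥Q30 consensus.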
SPRQ (Sequencing Plate-Ready Q-chemistry) is PacBio's latest HiFi chemistry for Revio, offering longer-lived polymerase, improved reagent stability, 500 ng input requirement (down from 1 µg), and ~33% higher HiFi yield per SMRT Cell. SPRQ enables two 30× human genomes per SMRT Cell at ≥Q30 accuracy18.
Kinetic Base Modification Detection
The polymerase pauses or slows at modified bases (e.g., m6A, m4C) because the modified template alters enzyme kinetics. By analyzing interpulse duration (IPD) — the time between successive fluorescent pulses — PacBio detects base modifications directly from sequencing data without chemical treatment. HiFi mode enables m6A and CpG methylation (5mC) detection at single-molecule resolution with high confidence, requiring kinetic information from multiple passes to distinguish modification-induced slowdowns from stochastic variation.
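The interpulse-duration comparison can be sketched as a ratio of observed to expected IPD at one template position, aggregated over passes. The threshold (2.0) and minimum pass count (5) below are hypothetical values chosen for illustration. The expected IPD for unmodified DNA in the same sequence context is assumed to come from some external model or control, and PacBio's own tools use their own statistical models.

```python
from statistics import mean


def ipd_ratio(observed_ipds: list[float], expected_ipd: float) -> float:
    """Mean observed IPD at a position divided by the unmodified expectation."""
    return mean(observed_ipds) / expected_ipd


def flag_modified(observed_ipds: list[float], expected_ipd: float,
                  threshold: float = 2.0, min_passes: int = 5) -> bool:
    """Flag a position whose polymerase kinetics are slowed beyond `threshold`.

    Requiring several passes helps separate a consistent, modification-induced
    slowdown from the stochastic variation of any single pulse interval.
    """
    if len(observed_ipds) < min_passes:
        return False  # not enough kinetic observations to decide
    return ipd_ratio(observed_ipds, expected_ipd) >= threshold


print(flag_modified([0.31, 0.28, 0.35, 0.30, 0.29], expected_ipd=0.10))
```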
Error Profile
HiFi accuracy = ≥Q30 median (99.9%), with many reads ≥Q40. Remaining errors are approximately evenly split among substitutions, insertions, and deletions with no strong sequence-context bias. CLR raw accuracy is ~Q10–Q15 (~85–90%); errors are heavily weighted toward indels. At ≥30× coverage, HiFi variant-calling accuracy rivals or exceeds short-read platforms for SNVs and structural variants.
Instruments and Throughput
The Revio processes SMRT Cells sequentially. Each SMRT Cell 25M yields roughly enough HiFi data for a 30× human genome (two with SPRQ chemistry), sufficient for variant calling and small structural variants. The Vega is a benchtop system optimized for long-read applications. PacBio's market position is strongest in de novo assembly, diploid phasing, isoform sequencing, and structural variant detection, where long reads with high consensus accuracy provide substantial advantages over short-read alternatives.
Ion Torrent (Semiconductor Sequencing)
Ion Torrent sequencing detects hydrogen ions released during nucleotide incorporation using an ion-sensitive field-effect transistor (ISFET), a semiconductor device that requires no optics. Rothberg et al. (2011)19 introduced this platform, describing ISFET-based detection in which natural (unmodified) nucleotides are flowed one species at a time. Each incorporation releases a proton, and the ISFET array measures the resulting pH change in each well. Because only one nucleotide species is present per flow, base identity is known from the flow, and the electrical signal indicates how many bases were incorporated. The elegance of the approach (no dyes, no fluorescence, no lasers) appeals to developers of point-of-care and field-deployable sequencing.
Library and Amplification
Ion Torrent libraries are linear DNA molecules with P1 and A adapters at their ends. These adapters play the same role as Illumina's P5/P7 adapters but have different sequences, so the two library types are not interchangeable without conversion. Library preparation mirrors Illumina workflows: fragmentation, end repair, adapter ligation, and optional indexing. Libraries are amplified on Ion Sphere Particles (ISPs) via emulsion PCR: DNA-coated beads are mixed with oil and aqueous PCR reagents to form a water-in-oil emulsion. Each bead ideally receives a single template molecule, amplification occurs within its emulsion droplet, and after PCR the beads are recovered and loaded onto the instrument's chip. Each bead settles into one well, which sits above an ion-sensitive ISFET sensor.
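Template loading onto beads is a Poisson process, so some beads receive no template and some receive several. The sketch computes the expected fractions of empty, monoclonal, and polyclonal beads for a chosen mean number of templates per bead (λ). The λ values are illustrative assumptions, not Ion Torrent protocol settings.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k templates when the mean per bead is lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def loading_outcomes(lam: float) -> dict[str, float]:
    """Fractions of beads with no template, one template, or more than one."""
    empty = poisson_pmf(0, lam)
    monoclonal = poisson_pmf(1, lam)
    return {"empty": empty, "monoclonal": monoclonal,
            "polyclonal": 1.0 - empty - monoclonal}


for lam in (0.3, 1.0):
    print(lam, {k: round(v, 3) for k, v in loading_outcomes(lam).items()})
```

Even at the monoclonal optimum (λ = 1), only about 37% of beads carry exactly one template. Lowering λ reduces polyclonal beads, whose mixed signals cannot be read cleanly, at the cost of more empty beads.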
ISFET Array and Detection
The original Ion PGM platform contained ~1.2 million wells, each ~14 µm in diameter. Current Ion GeneStudio S5 chips contain up to 16.8 million microwells. Nucleotides are flowed one species at a time (dATP, then dTTP, then dCTP, then dGTP, repeating). When the next template base matches the incoming nucleotide, the polymerase incorporates it and releases H+. The ISFET detects the local pH drop as a voltage pulse proportional to the number of bases incorporated in that flow, so homopolymer runs are read from the magnitude of the pH change: if 4 consecutive As are incorporated, the pH change is ~4× larger than for a single A.
Read Length and Throughput
| Instrument | Typical Read Length | Reads per Run | Output per Run | Run Time |
|---|---|---|---|---|
| Ion GeneStudio S5 | 200–400 bp | 2M–130M (chip-dependent) | ~500 Mb–15 Gb | ~2.5–8 hours |
| Ion Genexus (automated) | 200–400 bp | Chip-dependent | ~500 Mb–5 Gb | ~6 hours (sample to results) |
Error Profile
The dominant error mode is homopolymer-associated indels. At ~1.5 errors/100 bases20, the overall error rate exceeds that of optical short-read platforms, but substitution errors are rare. This is the opposite profile to Illumina and similar to 454 pyrosequencing. The homopolymer problem arises because the ISFET measures the total pH change in a flow. Consecutive identical bases produce a voltage proportional to their count (e.g., AAAA produces ~4× the signal of a single A), but stochastic noise and slight variations in polymerase kinetics make homopolymer lengths unreliable to call. Consequently, Ion Torrent reads of complex, homopolymer-rich regions (e.g., STR loci) are substantially less accurate than Illumina reads of the same region.
Applications and Market Position
Ion Torrent's primary clinical niche is targeted gene panels: AmpliSeq amplicon panels (designed, PCR-amplified templates with uniform coverage) sidestep the homopolymer problem because amplicons are typically <600 bp and gene-specific design avoids long homopolymer stretches. Oncomine cancer panels, inherited disease panels, and resistance-marker panels (for pathogens) are widespread in clinical labs. Ion Genexus — an integrated sample-to-results system combining library prep, emulsion PCR, sequencing, and analysis in one workflow — has captured niche adoption in high-volume, single-assay settings (e.g., large clinical labs running the same ~50-gene panel daily). However, Illumina's much larger ecosystem and dramatically better homopolymer performance have limited Ion Torrent's expansion into WGS or discovery applications.
Roche Sequencing by Expansion (SBX)
Roche's Sequencing by Expansion (SBX), developed by Stratos Genomics before its acquisition by Roche, is a hybrid of nanopore sensing and nucleotide incorporation designed for ultra-fast, high-throughput clinical genomics. Kokoris et al. (2025)21 and subsequent clinical validation papers22 describe the technology. An engineered polymerase copies each DNA template using expandable nucleotide analogs (XNTPs), each carrying a tether with a base-specific reporter. The product is a surrogate polymer, the Xpandomer, whose backbone is then selectively cleaved so that the molecule stretches to roughly 50 times the length of the parent DNA, spacing the reporters far apart. Expanded Xpandomers are read by nanopores on a high-density CMOS sensor array, where each well-separated reporter produces a distinct current signal. Expansion sidesteps a core difficulty of reading native DNA through a nanopore, in which several closely packed bases occupy the sensing region at once.
Key Characteristics
Read length is ~150–350 bp per duplex (insert). The workflow includes a 2-hour upstream expansion chemistry step before sequencing. Duplex insert size is capped at ~350 bp, and Illumina-style paired-end reads are not produced.
Clinical Performance
In early 2025, Roche and the Broad Institute Clinical Laboratories announced a Guinness World Record: blood-to-VCF analysis (sample collection, DNA extraction, library prep, sequencing, variant calling, final report) in under 4 hours (3h59m)22 on a single human exome. This required automated sample handling, streamlined bioinformatics, and parallelized instrument operation. The performance targets suggest SBX will be compelling for rapid diagnostic genomics, population-scale screening, and any setting where turnaround time and throughput per dollar are paramount. However, the platform is not yet commercially available to general labs, and independent benchmarking is limited to early-access users.
Limitations and Caveats
No epigenetic modification detection capability is evident. SBX reads a synthesized surrogate molecule (the Xpandomer) rather than the native template, so modifications on the original DNA are not carried into the signal. Read lengths shorter than those of Oxford Nanopore or PacBio limit applications requiring long-range structural variant detection or de novo assembly. Regulatory pathways (FDA 510(k) clearance, CLIA validation in clinical labs) have not yet been navigated, so deployment timelines and market adoption remain uncertain.
Data Formats and Quality Metrics
FASTQ Format
FASTQ is the universal format for raw sequencing reads. Each record contains four lines: a sequence identifier (header, beginning with @), the DNA sequence, a separator line (beginning with +), and a quality string with one character per base. Each character encodes an integer Phred quality, Q = −10 log₁₀(error probability), as a printable ASCII character. Modern files use an offset of 33 (Phred+33), so '!' encodes Q0 and 'I' encodes Q40. Q20 = 1% error; Q30 = 0.1%; Q40 = 0.01%.
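The sketch below is a minimal FASTQ reader and quality decoder. It assumes unwrapped four-line records and Phred+33 encoding. Older Illumina pipelines used an offset of 64, which this sketch does not detect automatically.

```python
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) from unwrapped 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode ASCII quality characters to integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: int) -> float:
    """Invert Q = -10 log10(P) to get the per-base error probability."""
    return 10 ** (-q / 10)


example = io.StringIO("@read1\nACGT\n+\nI5+!\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, phred_scores(qual))  # read1 ACGT [40, 20, 10, 0]
```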
Phred Scores and Quality Assessment
Phred scoring (Ewing & Green 1998)4 is the standard across all sequencing platforms. Quality scores decay across the read due to phasing (Illumina, MGI), homopolymer uncertainties (ONT, Ion), or stochastic noise (PacBio). Tools like FastQC assess per-position quality distributions and flag reads with low-quality tails or unusual adapter contamination. A typical Illumina genome shows median quality Q30–Q35 across the first 100 bp, declining to Q25–Q30 by cycle 150.
BAM/CRAM (Aligned Reads)
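After basecalling, reads are mapped to a reference genome and stored in BAM (Binary Alignment/Map) or CRAM (Compressed Reference-oriented Alignment Map) format. BAM stores the read sequence, quality, alignment position, and a CIGAR string. The CIGAR string is a compact run-length description of how the read aligns, covering matched blocks, insertions, deletions, skipped regions, and clipping. The basic M operator does not distinguish matches from mismatches, so mismatches are typically recorded in auxiliary tags such as MD. CRAM achieves higher compression than BAM, chiefly by storing only the bases that differ from the reference, at the cost of needing that reference to decode the file. BAM/CRAM files are indexed (.bai or .crai files) for rapid random access to a genomic region.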
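A small parser makes the CIGAR semantics concrete. The sketch follows the SAM specification's operator set: M, D, N, = and X consume reference bases; M, I, S, = and X consume read bases; H and P consume neither. The function names are illustrative.

```python
import re

CIGAR_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REFERENCE = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    if not CIGAR_RE.fullmatch(cigar):  # also rejects '*' (CIGAR unavailable)
        raise ValueError(f"invalid or unavailable CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in OP_RE.findall(cigar)]


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REFERENCE)


def query_length(cigar: str) -> int:
    """Number of bases in the stored read sequence (hard clips excluded)."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)


cigar = "5S10M2I3M1D20M"
print(parse_cigar(cigar), reference_span(cigar), query_length(cigar))
```

A mismatch inside `10M` is invisible to this parser. Recovering it requires the MD tag or the reference sequence itself.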
FAST5 and POD5 (Oxford Nanopore Raw Data)
FAST5 stores raw nanopore ionic current traces (squiggles), basecalled sequences, and metadata. POD5 is a newer, columnar format that is ~10× more space-efficient than FAST5 while preserving all information. Both are Oxford Nanopore-specific formats and require basecaller software (e.g., Dorado) to produce FASTQ.
Flow-Space BAM (Ion Torrent)
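Ion Torrent basecalling operates in "flow space": nucleotides (dATP, dCTP, dGTP, dTTP) are flowed over the chip one at a time in a fixed order, and each well records a pH-derived signal per flow that scales with the number of bases incorporated. Ion Torrent BAM files carry these per-flow signals alongside the called bases, so flow-aware tools can re-evaluate homopolymer lengths. Software such as Torrent Suite or FreIbis converts flow-space signals into base calls, allowing downstream analysis with standard variant-calling pipelines.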
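The rounding step at the heart of flow-space basecalling, and the reason homopolymer lengths are the weak point, can be shown with a toy decoder. This sketch assumes a hypothetical repeating flow order `TACG` and already-normalized signals; real basecallers such as Torrent Suite also model phasing effects (incomplete extension and carry-forward), which are omitted here.

```python
from typing import Sequence


def flows_to_bases(flow_signals: Sequence[float], flow_order: str) -> str:
    """Decode normalized per-flow signals into a base sequence.

    Each flow delivers one nucleotide; a normalized signal near k means k
    identical bases were incorporated (a homopolymer of length k).
    """
    bases = []
    for i, signal in enumerate(flow_signals):
        nucleotide = flow_order[i % len(flow_order)]  # cyclic flow order
        n = max(0, round(signal))  # rounding is where homopolymer errors arise
        bases.append(nucleotide * n)
    return "".join(bases)


order = "TACG"  # hypothetical flow order
signals = [1.02, 0.05, 2.10, 0.97, 0.03, 3.45, 0.10, 1.05]
print(flows_to_bases(signals, order))  # TCCGAAAG
```

The signal of 3.45 in the sixth flow is decoded as AAA, but it sits almost midway between three and four bases. That ambiguity grows with homopolymer length, which is where flow-based platforms make most of their indel errors.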
Input Requirements and Sample Quality
DNA Input
Typical input requirements range from about 10 ng (Sanger) to ~1 µg (Illumina, Element, Ultima, PacBio HiFi, ONT ligation). Some platforms tolerate lower inputs at the cost of reduced yield: Element can work with 50 ng; ONT PCR-based kits accept <100 ng; Ultima low-input workflows use ~1 ng but rely on PCR amplification, which makes them less suited to unbiased, PCR-free WGS. High-molecular-weight DNA (ideally >10 kb) improves yield, especially for ONT and PacBio. Fragmented DNA (<2 kb) is acceptable for Illumina, Element, Ultima, and Ion Torrent (amplicon-based) but limits long-read platforms.
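Input masses translate into very different numbers of molecules depending on fragment length, which is one reason long-read protocols ask for more DNA. A minimal sketch of the standard mass-to-moles conversion, assuming the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names are illustrative.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per base pair of dsDNA (common approximation)


def dsdna_pmol(mass_ng: float, length_bp: float) -> float:
    """Picomoles of double-stranded DNA fragments for a given mass and length."""
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * AVG_MASS_PER_BP)
    return moles * 1e12


def dsdna_nM(conc_ng_per_ul: float, length_bp: float) -> float:
    """Molar concentration (nM) from mass concentration (ng/µL).

    1 ng/µL = 1e-3 g/L, so nM = conc * 1e6 / (length_bp * 660).
    """
    return conc_ng_per_ul * 1e6 / (length_bp * AVG_MASS_PER_BP)


# The same 100 ng of input, very different molecule counts:
print(f"{dsdna_pmol(100, 400):.4f} pmol")     # 0.3788 pmol (400 bp fragments)
print(f"{dsdna_pmol(100, 20_000):.4f} pmol")  # 0.0076 pmol (20 kb fragments)
```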
DNA quality is measured by spectrophotometry (260/280 and 260/230 ratios; ideally 1.8 and 2.0–2.2 respectively) and gel electrophoresis (checking for degradation). A 260/280 <1.6 indicates protein/phenol contamination; 260/230 <1.5 indicates salt or carbohydrate carryover. Most library prep kits are robust to modest contamination, but salt/carbohydrate can inhibit ligation or amplification enzymes.
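The ratio thresholds above amount to a simple decision rule. A sketch with hypothetical absorbance readings and an illustrative function name; the cutoffs are the ones quoted in this section.

```python
from typing import List


def purity_flags(a260: float, a280: float, a230: float) -> List[str]:
    """Flag likely contaminants from absorbance ratios (empty list = no flags)."""
    r280 = a260 / a280
    r230 = a260 / a230
    flags = []
    if r280 < 1.6:
        flags.append(f"260/280={r280:.2f}: possible protein/phenol contamination")
    if r230 < 1.5:
        flags.append(f"260/230={r230:.2f}: possible salt/carbohydrate carryover")
    return flags


print(purity_flags(1.00, 0.54, 0.48))  # [] (ratios ~1.85 and ~2.08)
print(purity_flags(1.00, 0.70, 0.80))  # two flags (ratios ~1.43 and 1.25)
```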
RNA Input and Quality
RNA-seq typically calls for ≥1–2 µg total RNA with an RNA Integrity Number (RIN) of ≥7–8 (RIN 10 = fully intact; RIN <5 = degraded). Illumina, Element, and Ultima RNA-seq kits use either poly(A) selection (mRNA-enriched; requires an intact 3' poly(A) tail) or random priming with ribosomal RNA depletion (total RNA; does not require a poly(A) tail and captures non-polyadenylated species). ONT direct RNA sequencing requires 0.5–2 µg poly(A)-selected RNA: an adapter with an oligo(dT) overhang is ligated to the poly(A) tail, and an optional reverse-transcription step adds a complementary cDNA strand that relaxes secondary structure, but only the native RNA strand is read. Its main advantage over cDNA-based approaches is that native RNA modifications are preserved and reverse-transcription and PCR biases are avoided, typically at the cost of lower throughput.
Multiplexing and Index Design
Illumina, Element, and MGI support dual indexing (i5 and i7 barcodes). UDI (Unique Dual Indexing) gives each sample an i7 and an i5 index that no other sample in the pool uses, so an index-hopping event produces an unexpected pair that is computationally filtered out. CDI (Combinatorial Dual Indexing) reuses a small index set in all pairwise combinations; hopped indexes then produce valid but incorrect pairs, requiring conservative filtering. UDI is strongly recommended for high-value or clinical samples. Index-hopping rates on Illumina patterned flow cells (NovaSeq, NextSeq 1000/2000) are ~0.5–2%[5]; on non-patterned flow cells (MiSeq) they are ~0.01–0.1%. ONT barcoding (24-plex or 96-plex, via native ligation or PCR barcoding kits) is not subject to index hopping because there is no on-chip amplification. PacBio does not use separate index reads; barcodes are embedded in the adapters or primers, sequenced as part of each read, and demultiplexed after the run (e.g., with lima).
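The difference between UDI and CDI comes down to whether a hopped index pair is still present in the sample sheet. A toy sketch with hypothetical 4-base indexes (real indexes are typically 8 to 10 bases, and demultiplexers also tolerate small mismatches, which is omitted here):

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]  # (i7, i5)


def assign(pair: Pair, sample_sheet: Dict[Pair, str]) -> Optional[str]:
    """Return the sample for an observed index pair, or None if the pair is unexpected."""
    return sample_sheet.get(pair)


# UDI: every i7 and every i5 is used by exactly one sample.
udi = {("AAAA", "CCCC"): "S1", ("GGGG", "TTTT"): "S2"}
# CDI: two i7 and two i5 indexes reused in all four combinations.
cdi = {("AAAA", "CCCC"): "S1", ("AAAA", "TTTT"): "S2",
       ("GGGG", "CCCC"): "S3", ("GGGG", "TTTT"): "S4"}

# A molecule from S1 whose i5 was swapped for a free adapter carrying TTTT:
hopped = ("AAAA", "TTTT")
print(assign(hopped, udi))  # None -> read is discarded
print(assign(hopped, cdi))  # S2   -> read is silently misassigned
```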
Minimum Sample Size and Library Complexity
Illumina, Element, Ultima, and MGI benefit from high library complexity, meaning a large number of distinct template molecules produced by random fragmentation. Amplicon-based libraries (Ion Torrent AmpliSeq, targeted panels) contain few distinct sequences. For WGS, a library should contain many more distinct molecules than the number of reads to be sequenced; otherwise the duplicate rate climbs and additional sequencing yields little new information. On Illumina instruments, low-diversity libraries such as amplicons are typically spiked with PhiX to balance base composition for cluster detection and run calibration; PhiX does not add unique molecules to the sample library. ONT and PacBio do not amplify on the instrument and sequence whatever is loaded, but low-complexity samples (e.g., single-amplicon PCR products) will still produce biased coverage.
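Library complexity is usually estimated from duplicate rates rather than measured directly. Picard's duplicate metrics report an estimated library size using the Lander-Waterman relation C/X = 1 − exp(−N/X), where N is the number of reads (or pairs), C the number of distinct reads, and X the number of distinct molecules. A minimal sketch that solves this by bisection; the function names are illustrative and this is not Picard's code.

```python
import math


def estimate_library_size(total_reads: int, unique_reads: int) -> float:
    """Solve C/X = 1 - exp(-N/X) for X, the number of distinct molecules."""
    n, c = float(total_reads), float(unique_reads)
    if not 0 < c < n:
        raise ValueError("need 0 < unique_reads < total_reads")

    def f(x: float) -> float:
        # Positive when x is too small, negative when x is too large
        return c / x - (1.0 - math.exp(-n / x))

    lo, hi = c, 2.0 * c      # f(c) = exp(-n/c) > 0
    while f(hi) > 0:         # grow the upper bound until the sign flips
        hi *= 2.0
    for _ in range(100):     # bisection on the bracketed root
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def expected_unique(library_size: float, reads: float) -> float:
    """Distinct reads expected after sequencing `reads` reads from the library."""
    return library_size * (1.0 - math.exp(-reads / library_size))


# Hypothetical run: 20M read pairs, 18M distinct after duplicate marking
x = estimate_library_size(20_000_000, 18_000_000)
print(f"estimated distinct molecules: {x:,.0f}")
print(f"distinct pairs expected at 100M pairs: {expected_unique(x, 100e6):,.0f}")
```

The second function inverts the question: given an estimated library size, it predicts how many distinct reads deeper sequencing would yield, which is a quick check on whether a library can support a target WGS depth.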
Platform Comparison Summary
| Platform | Read Length | Accuracy (Median) | Throughput | Cost per Gb | Key Strength | Key Limitation |
|---|---|---|---|---|---|---|
| Illumina | 50–300 bp | Q30–Q35 | 10–5,000 Gb/run | ~$5–15 | Massive ecosystem, mature | Index hopping (patterned) |
| Element AVITI | 100–300 bp | Q40–Q50+ | 50–1,500 Gb/run | ~$10–20 | Raw data quality, no hopping | Requires circularization |
| Ultima UG 100 | ~250–300 bp (SE, variable) | Q30–Q35 | >10,000 Gb/run | ~$2–5 | Lowest cost/genome WGS | Single-end only, large batch |
| MGI DNBSEQ | 100–150 bp | Q30–Q35 | 240–14,000 Gb/run | ~$5–15 | Cost-competitive, library conversion | Nascent reagent supply (non-China) |
| Sanger | 700–1,000 bp | Q20–Q40 (post-edit) | ~1.2 Mb/day | ~$3–8/read | Gold-standard validation, regulatory | Linear cost scaling, no bulk advantage |
| Oxford Nanopore | 5–100+ kb | Q20–Q25 (simplex), Q30+ (duplex) | 50 Gb–14 Tb/run | ~$20–50 (long-read apps) | Ultra-long reads, mods, portable | Homopolymer indels, pore lifetime |
| PacBio HiFi | 10–25 kb | Q30+ | ~300 Gb/run (Revio) | ~$30–60 (long-read apps) | Length + accuracy combo, methylation | Higher cost, complex preprocessing |
| Ion Torrent | 200–400 bp | Q15–Q25 (homopolymer-dependent) | 500 Mb–15 Gb/run | ~$10–40 | Clinical amplicon panels, fast | Homopolymer errors, niche apps |
| Roche SBX | ~150–350 bp | Q30+ (early data) | TBD (clinical focus) | TBD | Ultra-fast sample-to-answer | Not yet commercial, limited validation |
Platform Trade-offs
No single platform dominates all applications. Illumina and Element excel at standardized short-read applications (WGS, exome, RNA-seq, ChIP-seq); Ultima excels at cost-optimized population-scale WGS. Sanger remains the standard orthogonal method for confirming individual variants. Oxford Nanopore and PacBio are complementary: ONT for ultra-long reads and real-time applications, PacBio for the best accuracy at long read lengths. Ion Torrent remains entrenched in clinical amplicon sequencing. MGI offers cost parity with Illumina where its supply chain is mature (China, Southeast Asia), but reagent and service ecosystems are less developed in Western markets. Roche SBX could disrupt clinical whole-exome and whole-genome sequencing if it delivers on its speed benchmarks and gains regulatory approval.
Practical Decision Guide
When to Choose Each Platform
Illumina: Broadest application compatibility, largest ecosystem (10x Genomics, Parse Bio, sci-seq, thousands of assay kits). Default choice for RNA-seq, ChIP-seq, ATAC-seq, exome, WGS in labs needing flexibility. Best for projects without specific long-read or modification-profiling needs. Cost-competitive. Watch for index hopping on patterned instruments; use UDI.
Element AVITI: When raw data quality (Q40–Q50+) is the priority and cost is secondary. Excellent for single-cell (10x Genomics compatible), deep sequencing of low-abundance targets, and research where data quality directly impacts discovery. No index hopping. Higher reagent cost than Illumina. Requires library circularization (though Cloudbreak Freestyle automates this). Libraries are Illumina-compatible, enabling reuse of existing prep infrastructure.
Ultima UG 100: Lowest cost per genome for ultra-high-throughput WGS. Purpose-built for population-scale studies, large biobanks, liquid biopsy, and MRD detection. Single-end only; variable read length complicates some variant-calling pipelines. Large batch size (~10 billion reads/wafer) limits flexibility for small projects. Homopolymer indels require adapted callers (GATK/DeepVariant with flow-space models).
MGI DNBSEQ: Cost-competitive alternative to Illumina in regions with mature reagent/service supply (China, Southeast Asia). Library conversion kits enable reuse of Illumina-prepped libraries. Error profiles are broadly similar to Illumina's. Western labs should factor in supply-chain maturity and technical-support availability. Not a strong choice where Illumina supply is reliable and well established.
Sanger: The long-standing gold standard for orthogonal confirmation of NGS-called pathogenic variants. Use it for confirmatory testing, small-scale sequencing (<50 amplicons), genotyping known mutations, and plasmid sequencing. Universally accepted by FDA, EMA, and clinical standards bodies. Not practical for discovery or high-throughput work. Cost scales linearly; no bulk advantage. Cannot detect variants <15–20% frequency.
Oxford Nanopore: Ultra-long reads (structural variant detection, de novo assembly, gap filling), full-length transcript isoforms, native epigenetic modification profiling (5mC, 6mA, RNA modifications), real-time adaptive sampling, rapid pathogen identification, field-deployable sequencing. MinION's portability is unmatched. Choose ONT when read length, modification detection, or time-to-answer matters more than per-base accuracy. Homopolymer accuracy is the primary limitation; simplex Q20–Q25 is lower than that of short-read platforms and PacBio HiFi. High coverage (30–50×) or duplex mode is needed for confident variant calls. Pore lifetime (≤72 hours per flow cell) limits per-run output.
PacBio HiFi: Best combination of length and accuracy for de novo genome assembly, diploid phased assemblies, full-length isoform sequencing (Iso-Seq), structural variant detection, CpG methylation calling, and resolving repetitive regions. Revio enables population-scale long-read WGS. Higher cost per Gb than short-read platforms. HiFi reads capped at ~25 kb by polymerase processivity. CLR mode raw accuracy (~85–90%, roughly Q8–Q10) requires specialized assembly algorithms. SMRT Cell loading optimization is critical.
Ion Torrent: Clinical amplicon panels (AmpliSeq, Oncomine, inherited disease panels), high-throughput single-assay settings (Ion Genexus). Homopolymer errors are the dominant limitation; unsuitable for unbiased discovery or applications sensitive to indels. Fast turnaround (Ion Genexus ~6 hours sample-to-results). Not competitive with Illumina for WGS or general-purpose sequencing.
Roche SBX: Ultra-fast clinical WGS (target: sample-to-VCF in <4 hours). If performance targets hold and regulatory approval is achieved, SBX will be compelling for rapid diagnostics, population-scale screening, and any setting where turnaround time and throughput per dollar are paramount. Currently in development or early access only. Independent performance benchmarking is limited. Not suitable for applications requiring long reads or native base-modification detection.
Technical Caveats and Trade-offs
Illumina: Watch for index hopping on patterned flow cells; use UDI. Quality decays toward read ends. Adapter dimer contamination is problematic on ExAmp instruments. Phasing and photobleaching are fundamental limits on read length.
Element AVITI: Requires library circularization (automated with Cloudbreak Freestyle, but still an extra preprocessing step). Higher loading concentrations needed. Not yet compatible with every niche library type. Reagent cost per run is higher than Illumina's, even though higher output per run can lower the per-Gb cost.
Ultima UG 100: Single-end only; applications requiring paired-end information (structural variants, mate-pair scaffolding) are limited. Variable read length (a fixed number of nucleotide flows reads a sequence-dependent number of bases) complicates some pipelines. Large minimum batch size (~10 billion reads/wafer) limits flexibility for small projects. Homopolymer indels are pronounced; specialized variant callers are required.
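The variable read length of flow-based chemistry follows directly from how flows consume a template: a fixed number of flows reads many bases from a sequence that happens to match the flow order and few from one that does not. A sketch under a simplified model (no phasing errors), using a hypothetical cyclic flow order `TGCA` and an illustrative function name:

```python
def bases_sequenced(template: str, flow_order: str, n_flows: int) -> int:
    """Number of template bases read in n_flows flows of a cyclic flow order.

    A flow extends through the whole run of the flowed nucleotide if it
    matches the next template base (a homopolymer is read in one flow);
    otherwise nothing is incorporated.
    """
    pos = 0
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        while pos < len(template) and template[pos] == nucleotide:
            pos += 1
        if pos == len(template):
            break
    return pos


order = "TGCA"  # hypothetical flow order
print(bases_sequenced("TGCATGCA" * 4, order, 8))  # 8: every flow incorporates
print(bases_sequenced("ACGTACGT" * 4, order, 8))  # 2: most flows find no match
print(bases_sequenced("TTTTGG", order, 2))        # 6: homopolymers read in one flow
```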
Sanger: Cost scales linearly with number of targets — no batching economy. Cannot detect low-frequency variants (<15–20%). Not practical for anything beyond a few hundred reads. Requires per-target primer design and optimization.
Oxford Nanopore: Homopolymer accuracy remains the primary limitation, especially for indel-sensitive variant calling. Simplex accuracy (Q20–Q25) is lower than that of short-read platforms and PacBio HiFi; duplex or high coverage is needed for confident variant calls. Flow cell pore lifetime (≤72 hours) limits per-run output. Systematic context-dependent errors limit the consensus accuracy ceiling for some motifs. The basecaller (Dorado) requires GPU compute for rapid turnaround; CPU-only basecalling is slow.
PacBio: HiFi read length is capped at ~25 kb because the polymerase must complete multiple passes around the circular template within its processive lifetime. Higher cost per Gb than short-read platforms. CLR mode has high raw error rates (~10–15%) and requires specialized assembly algorithms. SMRT Cell loading optimization is critical: underloading wastes ZMWs, while overloading produces multi-molecule wells. Consensus accuracy is lower in some low-complexity sequence (e.g., long homopolymers and certain tandem repeats), where errors are correlated across passes and do not average out.
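Why multiple passes raise accuracy, and why correlated errors limit that gain, can be illustrated with a toy majority-vote model. This is not PacBio's CCS algorithm, which uses a probabilistic model of the full set of subreads; the sketch assumes independent errors at a hypothetical 10% per pass and treats ties as failures.

```python
from math import comb, log10


def majority_consensus_error(p: float, passes: int) -> float:
    """Toy model: probability the consensus base is wrong given `passes` reads.

    Each pass is independently wrong with probability p. The consensus counts
    as correct only if a strict majority of passes call the true base; ties
    and wrong majorities are counted as errors (a pessimistic simplification).
    """
    return sum(
        comb(passes, k) * (1 - p) ** k * p ** (passes - k)
        for k in range(passes + 1)
        if 2 * k <= passes  # k = number of passes calling the true base
    )


def to_phred(err: float) -> float:
    return -10 * log10(err)


for n in (1, 3, 5, 9, 15):
    e = majority_consensus_error(0.10, n)
    print(f"{n:>2} passes: error {e:.1e} (~Q{to_phred(e):.0f})")
```

Under this model the error falls steeply as passes are added. If instead every pass shares the same systematic error, as in some homopolymer and tandem-repeat contexts, the consensus simply inherits it, which is the limitation described above.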
MGI DNBSEQ: Supply chains for reagents and support are still maturing outside China. The patent settlement with Illumina creates a risk of future licensing restrictions or performance changes. Technical documentation is occasionally less comprehensive than Illumina equivalents.
Ion Torrent: Homopolymer-associated indels are pronounced (~1.5 errors/100 bases). Failure rates are high in long homopolymers and in low-complexity repeats (STRs, triplet repeats). Not competitive with other short-read platforms for unbiased sequencing. Niche clinical applications only.
Roche SBX: Not yet commercially available. No epigenetic modification detection. Read lengths shorter than ONT or PacBio limit long-range variant detection. Duplex insert size capped at ~350 bp. Requires 2-hour upstream expansion chemistry. Independent performance validation is still pending.
Bioinformatics Pipeline Overview
Each platform has a primary basecalling and alignment workflow:
Illumina: bcl2fastq (now superseded by BCL Convert) converts raw BCL (binary base call) files to FASTQ. DRAGEN (Illumina's proprietary accelerated pipeline, available on Illumina instruments or via AWS) performs alignment, sorting, duplicate marking, and variant calling in ~30 minutes for a whole human genome, versus many hours for a standard CPU-based BWA/GATK workflow. Open-source alternatives: BWA-MEM2 (alignment), samtools (BAM manipulation, sorting, duplicate marking), GATK HaplotypeCaller (variant calling).
Element AVITI: Basecalling runs on the instrument, and Element's Bases2Fastq software demultiplexes the output into FASTQ files (Cloudbreak is the name of Element's sequencing chemistry, not an analysis platform). Because libraries use an Illumina-compatible format, alignment is identical to Illumina workflows. Variant calling follows standard GATK or DeepVariant pipelines.
Ultima Genomics: CRAM-flow (proprietary basecaller and alignment tool) converts raw instrument data to CRAM. Homopolymer-aware variant calling requires DeepVariant with flow-space models or GATK with custom parameters. Output is standard BAM/VCF.
MGI DNBSEQ: zebra (MGI proprietary basecaller) performs primary analysis. The resulting FASTQ files can be processed by standard alignment and variant-calling tools (bwa, GATK, DeepVariant) or by DRAGEN, which accepts FASTQ input from any platform.
Sanger: Capillary instruments call bases on-instrument; trace viewers such as FinchTV and open-source tools (Phred/Phrap, Consed) are used for basecalling, trace inspection, and quality trimming. Output is trace files (AB1) plus FASTA/FASTQ. No special alignment or variant-calling considerations apply; Sanger reads are often assembled de novo or aligned with local tools.
Oxford Nanopore: Dorado (a deep-learning basecaller) converts raw FAST5/POD5 ionic current into reads (FASTQ or unaligned BAM). It offers three model tiers (fast, HAC, SUP) plus a duplex basecalling mode. Reads are aligned to a reference with minimap2, which is optimized for long, error-prone reads. Variant calling uses clair3 (trained on nanopore error modes) or medaka (consensus polisher for assembly). Structural variant detection uses sniffles. Modification detection uses Dorado's modification-aware models, which write calls to the standard MM and ML BAM tags.
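The MM and ML tags follow the SAM optional-fields specification: MM lists, for each modification type, how many occurrences of the canonical base to skip between modified calls, and ML holds one 8-bit likelihood per call. A minimal decoder, assuming single-code `+`-strand entries and reads in their original orientation (unmapped or forward-strand); converting each 8-bit value to the midpoint of its probability bin is an illustrative choice. For the full specification, pysam's `AlignedSegment.modified_bases` or ONT's modkit are the practical choice.

```python
from typing import Dict, List, Tuple


def parse_mm_ml(seq: str, mm: str, ml: List[int]) -> Dict[str, List[Tuple[int, float]]]:
    """Decode MM/ML tag values into {mod_code: [(read_index, probability), ...]}.

    Simplifying assumptions: '+' strand entries, one modification code per
    entry, an explicit canonical base (not 'N'), and SEQ in the read's
    original orientation (unmapped or forward-strand records).
    """
    calls: Dict[str, List[Tuple[int, float]]] = {}
    ml_idx = 0
    for entry in filter(None, mm.split(";")):
        fields = entry.split(",")
        header, deltas = fields[0], [int(d) for d in fields[1:]]
        base, strand = header[0], header[1]
        code = header[2:].rstrip("?.")  # drop the optional '?' / '.' mode flag
        if strand != "+" or base == "N" or (len(code) != 1 and not code.isdigit()):
            raise NotImplementedError(f"unsupported MM entry: {header}")
        # Read indices of every occurrence of the canonical base
        base_positions = [i for i, b in enumerate(seq.upper()) if b == base]
        idx = -1
        for skip in deltas:
            idx += skip + 1  # skip `skip` unmodified occurrences, take the next one
            if idx >= len(base_positions):
                raise ValueError("MM deltas run past the end of the sequence")
            if ml_idx >= len(ml):
                raise ValueError("ML has fewer values than MM calls")
            prob = (ml[ml_idx] + 0.5) / 256  # midpoint of the 8-bit likelihood bin
            calls.setdefault(code, []).append((base_positions[idx], prob))
            ml_idx += 1
    if ml_idx != len(ml):
        raise ValueError("ML length does not match the number of MM calls")
    return calls


# Toy example: C occurs at read indices 1, 3, 6, 7
calls = parse_mm_ml("ACGCGACCG", "C+m?,0,1;", [200, 30])
print(calls)  # {'m': [(1, 0.783203125), (6, 0.119140625)]}
```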
PacBio: SMRT Link (proprietary analysis platform) performs subread extraction, CCS (circular consensus sequencing) generation (combining passes into HiFi reads), and alignment via pbmm2 (a minimap2 wrapper for PacBio data). Alignment produces BAM files with per-base accuracy annotations. Variant calling with DeepVariant or GATK (with PacBio-specific parameter tuning). Iso-Seq analysis (full-length isoform detection) via SMRT Link's dedicated pipeline. Methylation detection via kinetic basecaller output (IPD analysis).
Ion Torrent: Torrent Suite (proprietary) or FreIbis (open-source) for flow-space to FASTQ conversion. Alignment with bwa or bowtie2. Variant calling with Ion-optimized GATK parameters or DeepVariant. Homopolymer-sensitive callers are strongly recommended.
Roche SBX: Clinical-specific pipelines not yet publicly detailed. Expect proprietary analysis software bundled with instruments, similar to other clinical sequencing platforms.
Platform-specific variant callers: HaplotypeCaller (GATK; broad applicability), DeepVariant (machine learning; works across all platforms with platform-specific models), clair3 (nanopore-optimized), medaka (assembly/polishing), sniffles (structural variants). DeepVariant is increasingly the default in modern pipelines because it performs consistently well across diverse sequencing chemistries.
Single-Cell Sequencing Compatibility
10x Genomics: Compatible with Illumina (primary ecosystem), Element AVITI (library-compatible; tested and recommended), and MGI DNBSEQ (via adapter-conversion kits; less mature support). PacBio and ONT have single-cell protocols (e.g., PacBio's long-read cDNA for Iso-Seq, ONT's cDNA-PCR for long transcripts) but are niche and not widely adopted. Ion Torrent is not compatible (incompatible adapter format). Sanger is not applicable: a single-cell library is a pool of many distinct molecules, whereas a Sanger reaction reads one template at a time. Roche SBX support is unknown.
Parse Bio (split-pool combinatorial barcoding): Illumina and Element AVITI compatible (libraries use Illumina-format adapters). MGI compatibility unknown. PacBio, ONT, Ion Torrent, Sanger, and Roche not compatible.
sci-seq (sci-DNA-seq, sci-RNA-seq): Open-format indexing scheme compatible with any platform that can read concatenated barcodes. Illumina, Element AVITI, MGI DNBSEQ, and theoretically others can sequence sci-seq libraries (barcode is read as part of standard Read 1). PacBio and ONT can also sequence sci-seq libraries (long read length enables barcode + insert in a single read). Ion Torrent compatibility is platform-version dependent.
Long-read single-cell approaches: PacBio Iso-Seq (full-length cDNA, with concatenation methods such as MAS-Seq raising reads per run) and ONT cDNA sequencing of 10x-barcoded libraries capture the full-length transcript together with its cell barcode in a single read. These are niche compared to 10x short-read approaches but invaluable for isoform detection at single-cell resolution.
Multiplexing in single-cell: Several 10x Genomics libraries can share an Illumina flow cell lane, each carrying its own sample index (i7, or i7 plus i5 with dual-index kits). Reads are assigned to libraries by sample-index demultiplexing and to individual cells by the 10x cell barcode in Read 1. Lane sharing amortizes cost but divides the lane's reads among libraries, so small projects must check that the resulting reads per cell still meet their target depth.
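The depth trade-off of lane sharing is simple arithmetic. A sketch with hypothetical numbers (lane yield, cells per library, and target depth all vary by instrument and assay) and illustrative function names, assuming reads are split evenly across libraries:

```python
def reads_per_cell(lane_reads: float, n_libraries: int, cells_per_library: int) -> float:
    """Average reads per cell when libraries share a lane equally."""
    return lane_reads / n_libraries / cells_per_library


def max_libraries_per_lane(lane_reads: float, cells_per_library: int,
                           target_reads_per_cell: float) -> int:
    """Largest number of libraries that still meets the per-cell depth target."""
    return int(lane_reads // (cells_per_library * target_reads_per_cell))


# Hypothetical: 400M-read lane, 10,000 cells per library, 20,000 reads/cell target
print(reads_per_cell(400e6, 4, 10_000))               # 10000.0
print(max_libraries_per_lane(400e6, 10_000, 20_000))  # 2
```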
Clinical and Regulatory Considerations
Illumina: Largest portfolio of FDA-cleared assays and CE-IVD certified tests. Clinical labs can adopt off-the-shelf assays (exome panels, cancer hotspot panels, inherited disease panels) without developing custom pipelines. High-quality data and mature bioinformatics pipelines (DRAGEN) enable rapid assay implementation. Clinical adoption is near-universal in developed healthcare systems.
Oxford Nanopore: Most products are research-use-only; CE-IVD marking covers a limited set of clinical applications (e.g., pathogen identification in outbreak settings). Direct RNA sequencing and native modification detection are unique clinical assets for pathogen surveillance and certain epigenetic biomarker applications. The regulatory pathway for clinical IVD approval is advancing but more slowly than for Illumina. Limited FDA-cleared assays to date.
PacBio: Emerging clinical IVD approval (as of 2025, limited but growing). Strong for structural variant detection and complex region phasing, offering clinical advantages over short-read platforms for certain applications (e.g., carrier phasing, complex rearrangement characterization). Bioinformatics pipelines and interpretation standards are still maturing compared to Illumina.
Ion Torrent: Strong clinical presence in targeted oncology and inherited disease panels. FDA-cleared assays (Oncomine, Ion AmpliSeq clinical panels). Niche but established regulatory pathway. Not used for large-scale unbiased WGS/WES in clinical settings.
Sanger: Long-standing gold standard for pathogenic variant confirmation. Many clinical NGS pipelines still include Sanger follow-up for reported variants, particularly for coding-region calls, though guidelines are shifting to accept orthogonal NGS confirmation in some contexts. Universally accepted in clinical genetics and genomic medicine worldwide.
MGI DNBSEQ: Regulatory approval is more limited in Western markets. Used in clinical settings in China and Southeast Asia with substantial adoption. Western regulatory pathways (FDA 510(k), CE-IVD) are advancing, but clinical market penetration remains low. Chinese regulators have approved BGI sequencing for newborn screening and prenatal diagnosis.
Roche SBX: Regulatory pathway is in progress (2025). Roche's established clinical diagnostics infrastructure and regulatory relationships may accelerate FDA/CE-IVD approval. However, no clinical assays are yet cleared. Broad/Roche partnership on clinical testing may drive adoption post-approval.
CLIA Compliance: In the United States, clinical use of any platform requires validation within a CLIA (Clinical Laboratory Improvement Amendments)-certified laboratory. Validation includes accuracy assessment, reproducibility testing, and interference studies. Manufacturers often facilitate platform-specific validation (e.g., Illumina's bioinformatics validation templates). Off-the-shelf clinical assays come with pre-generated validation data, accelerating CLIA implementation.
Interpretation and Reporting: Clinical bioinformatics pipelines require accurate variant annotation (SnpEff, VEP), clinical significance assessment (ACMG guidelines, ClinVar), and phenotype-genotype correlation. Most major platforms generate accurate genotypes for SNVs and small indels (1–50 bp), although indel accuracy in homopolymer contexts varies by platform; structural variants (>50 bp) and complex rearrangements are better characterized with long-read platforms (PacBio, ONT) or specialized short-read pipelines. CNV detection requires special consideration depending on the platform and library preparation method.
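The size boundaries used above (SNVs, small indels of 1 to 50 bp, structural variants above 50 bp) can be applied to a simple REF/ALT pair by its net length change. A sketch only, with an illustrative function name: it assumes VCF-style explicit alleles and does not handle symbolic alleles such as `<DEL>` or breakend notation.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Bucket an explicit REF/ALT pair by net length change."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(alt) - len(ref))
    if size == 0:
        return "MNV"  # multi-nucleotide substitution of equal length
    if size <= 50:
        return "small indel"
    return "structural variant"


for ref, alt in [("A", "G"), ("A", "AT"), ("ACGT", "A"), ("A", "A" + "T" * 60)]:
    print(len(ref), len(alt), classify_variant(ref, alt))
```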
Document generated April 2026. Technical details sourced from manufacturer documentation, peer-reviewed publications, and core facility protocols.
References
- Sanger, F., Nicklen, S. & Coulson, A.R. (1977). DNA sequencing with chain-terminating inhibitors. PNAS, 74(12), 5463–5467. doi:10.1073/pnas.74.12.5463
- Bentley, D.R., Balasubramanian, S., Swerdlow, H.P., Smith, G.P., Milton, J., Brown, C.G., et al. (2008). Accurate whole human genome sequencing using reversible terminator chemistry. Nature, 456(7218), 53–59. doi:10.1038/nature07517
- Wetterstrand, K.A. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP). Available at genome.gov/sequencingcostsdata
- Ewing, B. & Green, P. (1998). Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Research, 8(3), 175–185. doi:10.1101/gr.8.3.175
- Illumina. (2021). Index hopping and data quality on patterned flow cells. Technical Note. Available at illumina.com
- Arslan, S., Lubin, D.J., Zhang, K., Bhatt, S., Yeakley, J., Rao, R., et al. (2023). Sequencing by avidity enables high accuracy with low reagent consumption. Nature Biotechnology, 42, 132–138. doi:10.1038/s41587-023-01750-7
- Element Biosciences. AVITI Sequencing System Specifications. Available at elementbiosciences.com
- Almogy, G., Pratt, M., Bhatt, S., Mah, C., Ly, A., Pyle, J., et al. (2022). Cost-efficient whole genome-sequencing using novel mostly natural sequencing-by-synthesis chemistry and open fluidics platform. bioRxiv Preprint. doi:10.1101/2022.05.29.493900
- Ultima Genomics. Technical Documentation: ppmSeq and UG 100 Platform. Available at ultimagenomics.com
- Drmanac, R., Sparks, A.B., Callow, M.J., Halpern, A.L., Burns, N.L., Kermani, B.G., et al. (2010). Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science, 327(5961), 78–81. doi:10.1126/science.1181498
- MGI Tech. DNBSEQ Platform Specifications. Available at en.mgi-tech.com
- Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., et al. (2009). Real-time DNA sequencing from single polymerase molecules. Science, 323(5910), 133–138. doi:10.1126/science.1162986
- Castro-Wallace, S.L., Chiu, C.Y., John, K.K., Stahl, S.E., Rubins, K.H., McIntyre, A.B.R., et al. (2017). Nanopore DNA sequencing and genome assembly on the International Space Station. Scientific Reports, 7, 18022. doi:10.1038/s41598-017-18364-0
- Jain, M., Koren, S., Miga, K.H., Quick, J., Rand, A.C., Sasaki, T.A., et al. (2018). Nanopore sequencing and assembly of a human genome with ultra-long reads. Nature Biotechnology, 36(4), 338–345. doi:10.1038/nbt.4060
- Loman, N.J., Quick, J. & Simpson, J.T. (2015). A complete bacterial genome assembled de novo using only nanopore sequencing data. Nature Methods, 12(8), 733–735. doi:10.1038/nmeth.3444
- Wenger, A.M., Peluso, P., Assefa, D., Cirincione, A., Eldeen, S., Levy, J., et al. (2019). Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature Biotechnology, 37(10), 1155–1162. doi:10.1038/s41587-019-0217-9
- Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A.V., Mikheenko, A., et al. (2022). The complete sequence of a human genome. Science, 376(6588), eabj6987. doi:10.1126/science.abj6987
- PacBio. Revio System and SPRQ Chemistry Specifications. Available at pacb.com
- Rothberg, J.M., Hinz, W., Rearick, T.M., Schultz, J., Mileski, W., Davey, M., et al. (2011). An integrated semiconductor device enabling non-optical genome sequencing. Nature, 475(7356), 348–352. doi:10.1038/nature10242
- Thermo Fisher Scientific. Ion GeneStudio S5 and Ion Torrent Specifications. Available at thermofisher.com
- Kokoris, M., et al. (2025). Sequencing by Expansion (SBX) — a novel, high-throughput single-molecule sequencing technology. bioRxiv Preprint. doi:10.1101/2025.02.19.639056
- Wojcik, M.H., Larkin, K., Cipicchio, M., et al. (2025). Toward same-day genome sequencing in the critical care setting. New England Journal of Medicine. NEJMc2512825