Apart from the novel sequencing technologies that emerge every few years, the ability to multiplex samples is the most critical and revolutionary aspect of next-generation sequencing. Multiplexing allows fine control over throughput, so that each sample receives just enough data and no sequencing capacity is wasted.
To make multiplexing possible, small arbitrary sequences are incorporated into the sequencing adapters attached to all fragments of a particular sample. These sequences, known as barcodes, allow for post-sequencing processing to bin each fragment by its originating sample.
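As a minimal sketch of the binning step described above, the following Python example assigns each read to a sample by exact barcode match. The barcode sequences, sample names, and the `demultiplex` function are hypothetical and chosen only for illustration; real demultiplexing software works on FASTQ files and handles many more cases.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample sheet (illustrative sequences only).
sample_by_barcode = {
    "ACGTAC": "sample_A",
    "ATCAAC": "sample_B",
}

# Each read is modelled as (barcode read, insert read).
reads = [
    ("ACGTAC", "TTGACCGTA"),
    ("ATCAAC", "GGCATTACG"),
    ("ACGTAG", "CCATGGTTA"),  # one error in the barcode read
]

def demultiplex(reads, sample_by_barcode):
    """Bin insert reads by sample using exact barcode matching."""
    bins = defaultdict(list)
    for barcode, insert in reads:
        # Barcodes not in the sheet cannot be assigned to any sample.
        sample = sample_by_barcode.get(barcode, "undetermined")
        bins[sample].append(insert)
    return dict(bins)

bins = demultiplex(reads, sample_by_barcode)
print(bins)
```

With exact matching, the third read is left undetermined even though its barcode differs from a known one at a single position.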
However, even the high-fidelity polymerases used during sequencing inevitably introduce some errors. These errors are especially costly when they fall within the barcode read, because they prevent proper binning and waste the associated sequencing reads. To alleviate this, the principles of bitwise error correction were extended to the base-wise language of sequencing.
The ability to correct barcode read errors stems from how differentiable the barcodes in a set are from one another. Differentiability can be expressed as a distance: the number of single-position changes required to turn one barcode sequence into another. For example, the top sequence in the figure below is one position change away from the middle sequence, and the middle is one position change away from the bottom. Going from the top sequence to the bottom therefore requires two position changes. This concept, known as the Hamming distance, is what powers barcode error correction and casual codebreaking games like Mastermind.
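The distance calculation can be sketched in a few lines of Python. The three 6 nt sequences below are hypothetical stand-ins for the top, middle, and bottom sequences of the example, not barcodes from any actual set.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical sequences mirroring the top / middle / bottom example.
top, middle, bottom = "ACGTAC", "ACGTAG", "ACTTAG"

print(hamming_distance(top, middle))     # 1
print(hamming_distance(middle, bottom))  # 1
print(hamming_distance(top, bottom))     # 2
```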
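The greater the minimum distance across an entire barcode set, the stronger the differentiability. This in turn governs how many errors can be corrected within a barcode set. The maximum number of correctable errors is given by the following formula:

$$t = \left\lfloor \frac{d - 1}{2} \right\rfloor$$

where t is the maximum number of correctable errors and d is the minimum distance across the entire set. For example, a minimum distance of 3 allows one error to be corrected, and a minimum distance of 5 allows two.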
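The sketch below illustrates how minimum distance translates into error correction. It assumes the standard coding-theory bound of floor((d - 1) / 2) correctable errors for a set with minimum distance d; the barcode sequences and function names are hypothetical.

```python
from itertools import combinations

def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(barcodes):
    """Smallest pairwise Hamming distance across the whole set."""
    return min(hamming_distance(a, b) for a, b in combinations(barcodes, 2))

def max_correctable_errors(d):
    """Assumed standard bound: floor((d - 1) / 2)."""
    return (d - 1) // 2

def correct_barcode(read, barcodes):
    """Return the barcode within the correctable radius, or None.

    Because the radius is at most (d - 1) / 2, no more than one
    barcode can lie within it, so the first match is the only match.
    """
    t = max_correctable_errors(min_distance(barcodes))
    for bc in barcodes:
        if hamming_distance(read, bc) <= t:
            return bc
    return None

# Hypothetical set with minimum distance 3 (one correctable error).
barcodes = ["ACGTAC", "ATCAAC", "GCGTTG"]

print(min_distance(barcodes))               # 3
print(correct_barcode("ACGTAG", barcodes))  # ACGTAC
print(correct_barcode("ACCTAG", barcodes))  # None
```

The second read carries one error and is rescued; the third is at least two changes away from every barcode and is left unassigned rather than guessed.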
How does minimum distance affect the generation of barcode sets? Increasing the minimum distance across a set decreases the maximum possible set size. Requirements must therefore be chosen so that enough barcodes remain in the set at the desired level of error correction.
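The trade-off between minimum distance and set size can be explored with a simple greedy construction. This is only an illustrative sketch, not the method used to design any commercial barcode set: it enumerates every 4 nt sequence in alphabetical order and keeps each one that is far enough from all sequences already chosen. The function name and parameters are hypothetical, and practical barcode design would likely apply further filters.

```python
from itertools import product

def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def greedy_barcode_set(length, min_dist, alphabet="ACGT"):
    """Greedily collect sequences that are at least min_dist apart."""
    chosen = []
    for candidate in map("".join, product(alphabet, repeat=length)):
        if all(hamming_distance(candidate, bc) >= min_dist for bc in chosen):
            chosen.append(candidate)
    return chosen

# Set size for each minimum distance, using short 4 nt barcodes
# so that the full enumeration (256 candidates) runs instantly.
sizes = {d: len(greedy_barcode_set(4, d)) for d in range(1, 5)}
for d, size in sizes.items():
    print(f"min distance {d}: {size} barcodes")
```

A greedy pass does not guarantee the largest possible set for a given distance, but it shows the trend: stricter distance requirements leave fewer usable barcodes, which is one reason longer index lengths are needed to support higher levels of error correction.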
We have expanded previously available barcode sequence sets in both set size and index length to accommodate higher levels of error correction. Other factors, such as colorspace on Illumina instruments, have also been considered, so that customers need only minimal effort to select the best subsets for low-diversity sequencing runs.
Our new 12 nt barcode set, available with the NEXTFLEX® 16S V1 – V3 Amplicon-Seq Kit, allows for up to two error corrections and has multiple low-diversity pooling options. We will continue to develop new technologies to remain the leader in quality multiplexing options.