Most next-generation sequencing platforms require nucleic acids to be fragmented before library construction. Available methods for fragmenting high-quality genomic DNA vary in how tightly they focus the sheared nucleic acids around an average fragment size. Probe-based shearing instruments, such as QSonica’s sonicator, produce broadly distributed shearing profiles, whereas ultrasonication instruments, such as Diagenode’s Bioruptor® ultrasonicators or any instrument from the Covaris product line, produce much more controlled, tighter shearing profiles. In recent years, enzymatic fragmentation modules have been coupled with downstream library prep to give labs that need high-throughput solutions a more convenient, automation-friendly option. Highly variable shearing profiles of the starting material, combined with limited sequencing read lengths, leave the researcher with an important question: should NGS libraries be size selected or not?
Read Length Considerations
Read length plays an important role in determining whether size selecting NGS libraries is necessary. If starting with a broad shear profile (100 – 1,500 bp) and performing 2×150 reads, it is advisable to size select 300 – 400 bp or 350 – 500 bp post-ligation. This strategy ensures maximum coverage of most inserts. If size selection is not employed in this scenario, many higher molecular weight molecules will not be sequenced deeply, resulting in non-uniform genome coverage.
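The relationship between read length and insert size can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name fraction_sequenced is hypothetical, and it assumes the sizes are insert lengths with adapters ignored, that the two reads of a pair start at opposite ends of the insert, and that any read-through into adapter sequence does not count.

```python
def fraction_sequenced(insert_bp: int, read_bp: int = 150, paired: bool = True) -> float:
    """Return the fraction of an insert's bases covered by its read(s).

    Simplifying assumptions (not from the original text): adapters are
    ignored, and paired reads start at opposite ends of the insert.
    """
    if insert_bp <= 0 or read_bp <= 0:
        raise ValueError("insert_bp and read_bp must be positive")
    # A paired-end run reads read_bp bases from each end of the insert.
    sequenced_bp = read_bp * (2 if paired else 1)
    # An insert shorter than the total read length is fully covered.
    return min(1.0, sequenced_bp / insert_bp)


# Inserts spanning the broad 100 - 1,500 bp shear profile, read at 2x150.
for insert_bp in (100, 300, 400, 500, 1000, 1500):
    print(f"{insert_bp:>5} bp insert: {fraction_sequenced(insert_bp):.0%} of bases read")
```

Under these assumptions, inserts of up to 300 bp are read end to end, while a 1,500 bp insert has only a fifth of its bases read, which is one way to see why long molecules in an unselected library contribute to uneven coverage.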
Starting Material: Low Quality DNA
Formalin-fixed, paraffin-embedded (FFPE) nucleic acids can be highly degraded and fragmented as a consequence of the preservation process. If starting with sub-nanogram quantities of low-quality DNA, size selection is not advised: the number of amplifiable DNA molecules going into PCR is already limited, and removing more of them could greatly reduce library yield.
Starting Material: Sufficient Quantity of High Quality DNA
Size selecting a specific region of a broad-range shear is advisable if starting with ≥ 10 ng of DNA. If DNA is not a limiting factor and many barcoded samples are being processed in parallel, size selection is highly recommended. If size selection is not performed in this scenario and the barcoded libraries are pooled and loaded onto the flow cell for cluster generation, samples containing lower molecular weight DNA will be preferentially amplified during clustering. Those samples then receive more reads, and the other libraries in the pool receive fewer. The same reasoning applies to removing high molecular weight inserts. Size selection therefore acts as an internal control that helps each library receive a similar number of reads and similar coverage.
Size Selecting Peak of Shear
If starting with a Gaussian shear profile ranging from 150 – 600 bp, size selection from 300 – 400 bp is recommended. Selecting at the peak produces the highest-yielding libraries because this region contains the largest number of viable sequencing molecules, and therefore the most successful ligation events. Conversely, selecting outside the peak of the shear lowers library yield because fewer viable molecules are available.
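The effect of window placement on yield can be estimated with a rough model. The sketch below assumes, purely for illustration, that the 150 – 600 bp shear follows a normal distribution with a mean of 375 bp and a standard deviation of 75 bp (so the stated range spans about three standard deviations on each side), and that fractions are counted by number of molecules rather than by mass. These parameters and the function names are assumptions, not measured values.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def fraction_in_window(low_bp: float, high_bp: float,
                       mean_bp: float = 375.0, sd_bp: float = 75.0) -> float:
    """Fraction of fragments with lengths between low_bp and high_bp.

    mean_bp and sd_bp are illustrative assumptions chosen so that
    150 - 600 bp spans roughly +/- 3 standard deviations.
    """
    if high_bp <= low_bp:
        raise ValueError("high_bp must be greater than low_bp")
    return normal_cdf(high_bp, mean_bp, sd_bp) - normal_cdf(low_bp, mean_bp, sd_bp)


# Compare a 100 bp window at the peak with one of equal width on the tail.
for low_bp, high_bp in ((300, 400), (450, 550)):
    share = fraction_in_window(low_bp, high_bp)
    print(f"{low_bp} - {high_bp} bp window: {share:.1%} of fragments")
```

With these assumed parameters, the window at the peak captures roughly three times as many molecules as an equally wide window on the tail. A real shear profile should be measured, for example from an electropherogram trace, rather than assumed.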