Illumina sequencing libraries

Illumina sequencing by synthesis requires special oligonucleotide adapters to be attached to the purified target DNA before sequencing can begin. These adapters consist of three main components: (1) the P5 and P7 sequences, which allow the library to bind to the flow cell and generate clusters; (2) the i5 and i7 index sequences (barcodes), which uniquely label the molecules from different samples so that multiple samples can be pooled (multiplexed) in a single sequencing run or flow cell lane; and (3) the binding sites for the Read 1 and Read 2 sequencing primers, which initiate the sequencing process itself. A variety of Illumina and third-party adapter designs can be used for Illumina sequencing, with the TruSeq and Nextera adapter systems being the most popular:

TruSeq Dual Index Library:


5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT-insert-AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA-insert-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
          Illumina P5               i5            TruSeq Read 1                          TruSeq Read 2                 i7        Illumina P7

Nextera Dual Index Library:


5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNTCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-insert-CTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNAGCAGCCGTCGCAGTCTACACATATTCTCTGTC-insert-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
           Illumina P5              i5             Nextera Read 1                                Nextera Read 2        i7         Illumina P7

The "insert" is the commonly used term for the target DNA that is to be sequenced. In metabarcoding libraries, the insert also includes the forward and reverse PCR primers used to amplify the target DNA.

The "N"s in the above diagrams indicate the "indexes", or "barcodes", used to discriminate between samples. These are short 8-10 bp sequences (e.g. CTATGTTA) that, alone or in combination, identify each sample. The index on the right-hand side is the "i7 index", or "index 1", and the index on the left-hand side is the "i5 index", or "index 2".
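
As an illustration of this layout, the Python sketch below pulls the i5 and i7 indexes out of a TruSeq dual-index top strand by slicing between the fixed adapter sequences shown above. The helper name `find_indexes`, the fixed 8 bp index length, and the 12 bp example insert are assumptions made for the example and are not part of any Illumina tool.

```python
# Fixed TruSeq components, copied from the top strand of the diagram above.
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
READ1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
P7 = "ATCTCGTATGCCGTCTTCTGCTTG"


def find_indexes(top_strand, index_len=8):
    """Return (i5, i7) as written on the top strand, 5' to 3'.

    Hypothetical helper: assumes a TruSeq dual-index molecule whose
    indexes have a known, fixed length (8 bp by default).
    """
    i5_start = len(P5)
    i7_end = len(top_strand) - len(P7)
    i5 = top_strand[i5_start:i5_start + index_len]
    i7 = top_strand[i7_end - index_len:i7_end]

    # Check that every fixed component sits where the layout says it should.
    left_ok = (top_strand.startswith(P5)
               and top_strand.startswith(READ1, i5_start + index_len))
    right_ok = (top_strand.endswith(P7)
                and top_strand.endswith(READ2, 0, i7_end - index_len))
    if not (left_ok and right_ok):
        raise ValueError("sequence does not match the TruSeq dual-index layout")
    return i5, i7


# Made-up 12 bp insert, flanked by example indexes used in this document.
molecule = P5 + "CTATGTTA" + READ1 + "ACGTACGTACGT" + READ2 + "GATCAACA" + P7
print(find_indexes(molecule))  # ('CTATGTTA', 'GATCAACA')
```

Slicing by position rather than searching for the adapters keeps the sketch short, but it only works when the index length is known in advance.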

Most modern sequencing protocols use dual indexing rather than single indexing. Dual-indexed libraries can be combinatorial, where individual i5 and i7 indexes are shared between samples and only the combination of the two is unique to each sample:


Sample 1 - -AATAACGT...AATCGTTA
Sample 2 - -AATAACGT...GTCTACAT
Sample 3 - -AATAACGT...CGCTGCTC
Sample 4 - -TTCTTGAA...AATCGTTA
Sample 5 - -TTCTTGAA...GTCTACAT
               i5        i7

Or completely unique (unique dual indexing), where both the i5 and the i7 index are unique to that sample:


Sample 1 - -AATAACGT...AATCGTTA
Sample 2 - -TTCTTGAA...GTCTACAT
Sample 3 - -GGCAGATC...CGCTGCTC
Sample 4 - -CTATGTTA...GATCAACA
Sample 5 - -GTTGACGC...CGAAGGAC
               i5        i7

Illumina now encourages customers to use unique dual indexing (UDI) whenever possible to ensure the most accurate demultiplexing and therefore reduce the risk of reads being assigned to the wrong sample, for example through index hopping.
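
The difference between the two designs can be checked programmatically. The sketch below is a minimal Python example, assuming the index pairs are available as a dictionary of sample name to (i5, i7); the function name `classify_indexing` and the three-sample combinatorial layout are hypothetical.

```python
from collections import Counter


def classify_indexing(samples):
    """Classify a set of (i5, i7) pairs (hypothetical helper).

    samples: dict mapping sample name -> (i5, i7).
    Returns "unique dual", "combinatorial", or "invalid" when two
    samples carry exactly the same pair and cannot be demultiplexed.
    """
    pairs = list(samples.values())
    if len(set(pairs)) != len(pairs):
        return "invalid"
    i5_counts = Counter(i5 for i5, _ in pairs)
    i7_counts = Counter(i7 for _, i7 in pairs)
    # Unique dual indexing: no i5 and no i7 is used more than once.
    if all(n == 1 for n in i5_counts.values()) and all(n == 1 for n in i7_counts.values()):
        return "unique dual"
    return "combinatorial"


# Index pairs taken from the unique dual indexing example above.
udi = {
    "Sample 1": ("AATAACGT", "AATCGTTA"),
    "Sample 2": ("TTCTTGAA", "GTCTACAT"),
    "Sample 3": ("GGCAGATC", "CGCTGCTC"),
    "Sample 4": ("CTATGTTA", "GATCAACA"),
    "Sample 5": ("GTTGACGC", "CGAAGGAC"),
}
# Made-up layout in which the same indexes are reused in new combinations.
combinatorial = {
    "Sample 1": ("AATAACGT", "AATCGTTA"),
    "Sample 2": ("AATAACGT", "GTCTACAT"),
    "Sample 3": ("TTCTTGAA", "AATCGTTA"),
}

print(classify_indexing(udi))            # unique dual
print(classify_indexing(combinatorial))  # combinatorial
```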

Library preparation:

In our current metabarcoding protocol, we use the TruSeq adapter system and attach the adapters to the target molecules using two separate PCRs:

First PCR:

The first PCR amplifies the target DNA and adds the Illumina Read 1 primer binding site to the left side of the insert and the Read 2 primer binding site to the right side. To achieve this, we modify our locus-specific primers to include the universal adapter sequences as 5' tails. In the example below we use the fwhF2-fwhR2n primer set, which amplifies a short region of the mitochondrial COI barcode:


Tailed F primer: 5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT-GGDACWGGWTGAACWGTWTAYCCHCC -3'
                                TruSeq Read 1                    Forward primer 
Tailed R primer: 5'- GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-GTRATWGCHCCDGCTARWACWGG -3'
                                TruSeq Read 2                    Reverse primer

Following amplification with these tailed primers, the molecules will look like this:


5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCTGGDACWGGWTGAACWGTWTAYCCHCC-Target-CCWGTWYTAGCHGGDGCWATYACAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -3'
3'- TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGACCHTGWCCWACTTGWCAWATRGGDGG-Target-GGWCAWRATCGDCCHCGWTARTGTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG -5'
            TruSeq Read 1                Forward primer                          Reverse primer               TruSeq Read 2
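
The top strand of this product can be derived from the two tailed primers: it is the tailed forward primer, then the target, then the reverse complement of the tailed reverse primer. The following Python sketch shows this under the assumption that the primers contain only standard IUPAC nucleotide codes; the function names and the 4 bp placeholder target are hypothetical.

```python
# IUPAC complement table, including the ambiguity codes used in the primers.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of an uppercase IUPAC nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


# Tails and locus-specific primers, copied from the tailed primers above.
READ1_TAIL = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2_TAIL = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
FWD_PRIMER = "GGDACWGGWTGAACWGTWTAYCCHCC"
REV_PRIMER = "GTRATWGCHCCDGCTARWACWGG"


def pcr1_top_strand(target):
    """Top strand (5' to 3') of the first-PCR product for a given target."""
    tailed_fwd = READ1_TAIL + FWD_PRIMER
    tailed_rev = READ2_TAIL + REV_PRIMER
    # The reverse primer anneals to the top strand, so it appears on the
    # top strand as its reverse complement.
    return tailed_fwd + target + reverse_complement(tailed_rev)


print(reverse_complement(REV_PRIMER))  # CCWGTWYTAGCHGGDGCWATYAC
print(pcr1_top_strand("ACGT"))         # "ACGT" is a placeholder target
```

The bottom strand is the base-by-base complement of this sequence, written 3' to 5'.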

Note that in some sequencing protocols, such as those used for whole-genome, metagenomic, or metatranscriptomic sequencing, the Read 1 and Read 2 adapters are attached to the molecules using alternatives to PCR such as tagmentation or ligation.

Second PCR:

The second PCR uses the Read 1 and Read 2 sequences added in the first PCR as priming sites to add the P5 and P7 flow cell binding sequences, as well as the i5 and i7 indexes. This second set of primers, commonly referred to as indexing primers, is normally purchased in a kit or designed in-house. Either way, they are generally structured as follows (the second primer is drawn as it appears on the top strand of the finished library, so the oligo itself is its reverse complement):


iTru_R1_5: 5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT -3'
                        Illumina P5               i5            TruSeq Read 1
iTru_R2_5: 5'- AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
                        TruSeq Read 2                 i7        Illumina P7

Following amplification with the second set of primers (indexing primers), the molecules will look like this:


5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCTGGDACWGGWTGAACWGTWTAYCCHCC-Target-CCWGTWYTAGCHGGDGCWATYACAGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGACCHTGWCCWACTTGWCAWATRGGDGG-Target-GGWCAWRATCGDCCHCGWTARTGTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
          Illumina P5               i5            TruSeq Read 1               Forward primer                         Reverse primer             TruSeq Read 2                 i7        Illumina P7

Once the adapters have been added, the libraries are ready to be sequenced.
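
Because the adapter sequences are fixed, the expected size of a finished library molecule follows directly from the amplicon length, which is useful when checking libraries on a gel or fragment analyser. The Python sketch below does this arithmetic for the TruSeq design shown above, assuming 8 bp indexes; the function names and the 200 bp amplicon are made up for the example.

```python
# Fixed TruSeq components, copied from the top strand of the library diagram.
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
READ1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
P7 = "ATCTCGTATGCCGTCTTCTGCTTG"


def adapter_overhead(index_len=8):
    """Total adapter length (bp) added to the insert by both PCRs."""
    left = len(P5) + index_len + len(READ1)
    right = len(READ2) + index_len + len(P7)
    return left + right


def library_length(amplicon_len, index_len=8):
    """Final library length for an amplicon (target plus both locus primers)."""
    return amplicon_len + adapter_overhead(index_len)


print(adapter_overhead())   # 136
print(library_length(200))  # 336, for a hypothetical 200 bp amplicon
```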

Sequencing:

The steps below are performed automatically by the sequencer and its chemistry and do not need to be carried out by the operator. From here on, the target DNA together with the forward and reverse primers will be referred to as the "insert".

In the sequencing reagents provided by Illumina, the sequencing primers are actually a mixture of different primers, including TruSeq, Nextera and even primers from obsolete kits. Different types of libraries can therefore be sequenced together.

(Step 1) Add Read 1 sequencing primer mixture to sequence the first read (bottom strand as template):


TruSeq Dual Index Library:

                                     5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT---->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA-insert-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'


Nextera Dual Index Library:

                                     5'- TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG------>
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNAGCAGCCGTCGCAGTCTACACATATTCTCTGTC-insert-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'

(Step 2) Add Index 1 sequencing primer mixture to sequence the first index (index 1, i7, bottom strand as template):


TruSeq Dual Index Library:

                                                                              5'- AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA-insert-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'


Nextera Dual Index Library:

                                                                              5'- CTGTCTCTTATACACATCTCCGAGCCCACGAGAC------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNAGCAGCCGTCGCAGTCTACACATATTCTCTGTC-insert-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'

(Step 3 of MiSeq, HiSeq 2000/2500 and NovaSeq 6000) The strand folds over and anneals to a P5 oligo on the flow cell, which primes sequencing of the second index (index 2, i5, bottom strand as template):


TruSeq Dual Index Library:

5'- AATGATACGGCGACCACCGAGATCTACAC------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA-insert-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'


Nextera Dual Index Library:

5'- AATGATACGGCGACCACCGAGATCTACAC------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNAGCAGCCGTCGCAGTCTACACATATTCTCTGTC-insert-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'

(Step 3 of iSeq 100, MiniSeq, NextSeq, HiSeq X and HiSeq 3000/4000) Add Index 2 sequencing primer mixture to sequence the second index (index 2, i5, top strand as template):


TruSeq Dual Index Library:

5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT-insert-AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
                                 <-------TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA -5'


Nextera Dual Index Library:

5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNTCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-insert-CTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
                                 <-------AGCAGCCGTCGCAGTCTACACATATTCTCTGTC -5'

(Step 4) Regenerate clusters and add Read 2 sequencing primer mixture to sequence the second read (top strand as template):


TruSeq Dual Index Library:

5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT-insert-AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
                                                                           <------TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG -5'


Nextera Dual Index Library:

5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNTCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-insert-CTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
                                                                           <------GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
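
The four steps above determine the orientation of each read relative to the top strand of the library. The Python sketch below summarises this as a rough model, assuming error-free reads and an insert containing only A, C, G, T and N; the function name `simulate_reads`, its parameters, and the 16 bp example insert are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def simulate_reads(i5, insert, i7, read_len, i5_from_top_strand=False):
    """Expected reads for one molecule (hypothetical helper).

    i5, insert and i7 are given as written on the top strand, 5' to 3'.
    Set i5_from_top_strand=True for instruments that read index 2 with
    the top strand as template (the second Step 3 variant above).
    """
    return {
        # Step 1: bottom strand as template, so the read matches the top strand.
        "read1": insert[:read_len],
        # Step 2: bottom strand as template, read matches the top-strand i7.
        "index1": i7,
        # Step 3: bottom-strand template gives i5 as written; top-strand
        # template gives its reverse complement.
        "index2": reverse_complement(i5) if i5_from_top_strand else i5,
        # Step 4: top strand as template, read starts at the far end of the insert.
        "read2": reverse_complement(insert)[:read_len],
    }


example = simulate_reads("CTATGTTA", "AAACCCGGGTTTACGT", "GATCAACA", read_len=6)
print(example["read1"], example["read2"])  # AAACCC ACGTAA
print(example["index2"])                   # CTATGTTA
print(simulate_reads("CTATGTTA", "AAACCCGGGTTTACGT", "GATCAACA", 6, True)["index2"])  # TAACATAG
```

This is why the i5 sequence expected in the index 2 read differs between the two groups of instruments listed under Step 3, while the i7 sequence is read the same way on all of them.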