Illumina sequencing by synthesis requires special oligonucleotide adapters to be attached to the purified target DNA before sequencing can begin. These adapters consist of three main components: (1) the P5 and P7 sequences, which allow the library to bind to the flow cell and generate clusters; (2) the i5 and i7 index sequences (barcodes), which uniquely label the molecules from different samples so that multiple samples can be pooled (multiplexed) in a single sequencing run or flow cell lane; and (3) the binding sites for the Read 1 and Read 2 sequencing primers, which initiate the sequencing process itself. A variety of Illumina and third-party adapter designs can be used for Illumina sequencing, with the TruSeq and Nextera adapter systems being the most popular:
5'- AATGATACGGCGACCACCGAGATCTACAC NNNNNNNN ACACTCTTTCCCTACACGACGCTCTTCCGATCT -insert- AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC NNNNNNNN ATCTCGTATGCCGTCTTCTGCTTG -3'
3'- TTACTATGCCGCTGGTGGCTCTAGATGTG NNNNNNNN TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA -insert- TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG NNNNNNNN TAGAGCATACGGCAGAAGACGAAC -5'
Regions (left to right): Illumina P5 | i5 | TruSeq Read 1 | insert | TruSeq Read 2 | i7 | Illumina P7
5'- AATGATACGGCGACCACCGAGATCTACAC NNNNNNNN TCGTCGGCAGCGTC AGATGTGTATAAGAGACAG -insert- CTGTCTCTTATACACATCT CCGAGCCCACGAGAC NNNNNNNN ATCTCGTATGCCGTCTTCTGCTTG -3'
3'- TTACTATGCCGCTGGTGGCTCTAGATGTG NNNNNNNN AGCAGCCGTCGCAG TCTACACATATTCTCTGTC -insert- GACAGAGAATATGTGTAGA GGCTCGGGTGCTCTG NNNNNNNN TAGAGCATACGGCAGAAGACGAAC -5'
Regions (left to right): Illumina P5 | i5 | Nextera Read 1 | insert | Nextera Read 2 | i7 | Illumina P7
The "insert" is the commonly used term for the target DNA that is to be sequenced. In metabarcoding libraries, the insert also includes the forward and reverse PCR primers used to amplify the target DNA.
The "N"s in the above diagrams indicate the "indexes", or "barcodes", used to discriminate between samples. These are short 8-10 bp sequences (e.g. CTATGTTA) that are unique to each sample. The index on the right-hand side is the "i7 index", or "index1", and the index on the left-hand side is the "i5 index", or "index2".
Most modern sequencing protocols use dual indexing rather than single indexing. Dual-indexed libraries can be either combinatorial, where only one index differs between samples while the other is shared:
Sample      i5         i7
Sample 1    AATAACGT   AATCGTTA
Sample 2    TTCTTGAA   GTCTACAT
Sample 3    GGCAGATC   CGCTGCTC
Sample 4    CTATGTTA   GATCAACA
Sample 5    GTTGACGC   CGAAGGAC
Or completely unique (unique dual indexing), where both the i5 and i7 indexes are unique to that sample:
Sample      i5         i7
Sample 1    AATAACGT   AATCGTTA
Sample 2    TTCTTGAA   GTCTACAT
Sample 3    GGCAGATC   CGCTGCTC
Sample 4    CTATGTTA   GATCAACA
Sample 5    GTTGACGC   CGAAGGAC
Illumina now encourages customers to use unique dual indexing (UDI) whenever possible to ensure the most accurate demultiplexing, and therefore reduce the risk of sample cross-contamination.
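The difference between the two indexing schemes can be checked mechanically. The following is a minimal Python sketch, not part of any Illumina tool: the function name `classify_indexing` and the shared-index layout in `combinatorial_set` are hypothetical and used only for illustration.

```python
from collections import Counter

def classify_indexing(samples):
    """Classify a dual-indexed sample set.

    samples: dict mapping sample name -> (i5, i7) index pair.
    Returns "unique dual", "combinatorial", or "ambiguous".
    """
    pairs = list(samples.values())
    if not pairs:
        raise ValueError("no samples given")
    # Every (i5, i7) pair must be distinct, or samples cannot be separated.
    if len(set(pairs)) != len(pairs):
        return "ambiguous"
    i5_counts = Counter(i5 for i5, _ in pairs)
    i7_counts = Counter(i7 for _, i7 in pairs)
    # Unique dual indexing: no i5 and no i7 is used by more than one sample.
    if max(i5_counts.values()) == 1 and max(i7_counts.values()) == 1:
        return "unique dual"
    return "combinatorial"

# Index pairs taken from the example table above.
unique_set = {
    "Sample 1": ("AATAACGT", "AATCGTTA"),
    "Sample 2": ("TTCTTGAA", "GTCTACAT"),
    "Sample 3": ("GGCAGATC", "CGCTGCTC"),
}
# Hypothetical layout in which Samples 1 and 2 share the same i5 index.
combinatorial_set = {
    "Sample 1": ("AATAACGT", "AATCGTTA"),
    "Sample 2": ("AATAACGT", "GTCTACAT"),
    "Sample 3": ("GGCAGATC", "CGCTGCTC"),
}

print(classify_indexing(unique_set))
print(classify_indexing(combinatorial_set))
```

A set is reported as "ambiguous" when two samples carry the identical index pair, since their reads could not be told apart at all.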
In our current metabarcoding protocol, we use the TruSeq adapter system and attach the adapters to the molecule in two separate PCRs:
The first PCR amplifies the target DNA and adds the Illumina Read 1 primer binding site on the left side of the insert and the Read 2 primer binding site on the right side. To achieve this, we modify our locus-specific primers to include the Read 1 and Read 2 adapter sequences as 5' tails. In the example below we use the fwhF2-fwhR2n primer set, which amplifies a short region of the mitochondrial COI barcode:
Tailed F primer:
5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT GGDACWGGWTGAACWGTWTAYCCHCC -3'
Regions (left to right): TruSeq Read 1 | Forward primer
Tailed R primer:
5'- GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT GTRATWGCHCCDGCTARWACWGG -3'
Regions (left to right): TruSeq Read 2 | Reverse primer
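The tailed primers can be derived from the adapter and locus-specific sequences with a few lines of Python. This is an illustrative sketch only; the helper `reverse_complement` and the variable names are assumptions, not part of the protocol. Note that the Read 2 tail on the reverse primer is the reverse complement of the Read 2 sequence as it is drawn on the top strand of the adapter diagrams.

```python
# IUPAC complement table, including the degenerate bases used in the primers.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(seq):
    """Return the reverse complement of an IUPAC DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Adapter sequences as written 5'->3' on the top strand of the diagrams.
TRUSEQ_READ1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
TRUSEQ_READ2 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

# Locus-specific primers (fwhF2 / fwhR2n), each written 5'->3'.
FWD_PRIMER = "GGDACWGGWTGAACWGTWTAYCCHCC"
REV_PRIMER = "GTRATWGCHCCDGCTARWACWGG"

# The forward primer is tailed with the Read 1 sequence as-is.
tailed_f = TRUSEQ_READ1 + FWD_PRIMER
# The reverse primer anneals to the opposite strand, so its tail is the
# reverse complement of the top-strand Read 2 sequence.
tailed_r = reverse_complement(TRUSEQ_READ2) + REV_PRIMER

print("Tailed F primer: 5'-" + tailed_f + "-3'")
print("Tailed R primer: 5'-" + tailed_r + "-3'")
```

The same helper also gives the form in which the reverse primer appears at the right-hand end of the top strand after amplification: `reverse_complement(REV_PRIMER)`.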
Following amplification with these tailed primers, the molecules will look like this:
5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT GGDACWGGWTGAACWGTWTAYCCHCC -Target- CCWGTWYTAGCHGGDGCWATYAC AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -3'
3'- TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA CCHTGWCCWACTTGWCAWATRGGDGG -Target- GGWCAWRATCGDCCHCGWTARTG TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG -5'
Regions (left to right): TruSeq Read 1 | Forward primer | Target | Reverse primer | TruSeq Read 2
Note that in some sequencing protocols, such as those used for whole-genome, metagenomic, or metatranscriptomic sequencing, the Read 1 and Read 2 adapters are attached to the molecules using alternatives to PCR such as tagmentation or ligation.
The second PCR uses the Read 1 and Read 2 sequences added in the first PCR as priming sites to add the P5 and P7 flow cell binding sequences, as well as the i5 and i7 indexes. This second set of primers, commonly referred to as indexing primers, is normally purchased in a kit or designed in-house. Either way, the primers are generally structured as follows:
iTru_R1_5:
5'- AATGATACGGCGACCACCGAGATCTACAC NNNNNNNN ACACTCTTTCCCTACACGACGCTCTTCCGATCT -3'
Regions (left to right): Illumina P5 | i5 | TruSeq Read 1
iTru_R2_5:
5'- AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC NNNNNNNN ATCTCGTATGCCGTCTTCTGCTTG -3'
Regions (left to right): TruSeq Read 2 | i7 | Illumina P7
Following amplification with the second set of primers (indexing primers), the molecules will look like this:
5'- AATGATACGGCGACCACCGAGATCTACAC NNNNNNNN ACACTCTTTCCCTACACGACGCTCTTCCGATCT GGDACWGGWTGAACWGTWTAYCCHCC -Target- CCWGTWYTAGCHGGDGCWATYAC AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC NNNNNNNN ATCTCGTATGCCGTCTTCTGCTTG -3'
3'- TTACTATGCCGCTGGTGGCTCTAGATGTG NNNNNNNN TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA CCHTGWCCWACTTGWCAWATRGGDGG -Target- GGWCAWRATCGDCCHCGWTARTG TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG NNNNNNNN TAGAGCATACGGCAGAAGACGAAC -5'
Regions (left to right): Illumina P5 | i5 | TruSeq Read 1 | Forward primer | Target | Reverse primer | TruSeq Read 2 | i7 | Illumina P7
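Because the adapter regions are fixed sequences, the finished library molecule can be taken apart again by locating them. The sketch below is a hypothetical helper (`split_library` is not a function from any existing package) that splits a TruSeq dual-indexed top strand into its i5 index, insert, and i7 index. The 10 bp target used in the example is a made-up placeholder, and the i7 is returned as it is written on the top strand.

```python
# Adapter sequences as written 5'->3' on the top strand of the diagram.
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
READ1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
P7 = "ATCTCGTATGCCGTCTTCTGCTTG"

def split_library(top_strand):
    """Split a TruSeq dual-indexed top strand into i5, insert, and i7."""
    if not (top_strand.startswith(P5) and top_strand.endswith(P7)):
        raise ValueError("molecule does not start with P5 and end with P7")
    # First Read 1 site after P5; last Read 2 site before P7.
    read1_start = top_strand.find(READ1, len(P5))
    read2_start = top_strand.rfind(READ2, 0, len(top_strand) - len(P7))
    if read1_start == -1 or read2_start < read1_start + len(READ1):
        raise ValueError("Read 1 / Read 2 sites not found in the expected order")
    return {
        "i5": top_strand[len(P5):read1_start],
        "insert": top_strand[read1_start + len(READ1):read2_start],
        "i7": top_strand[read2_start + len(READ2):len(top_strand) - len(P7)],
    }

# Insert = forward primer + placeholder target + reverse primer
# (the reverse primer appears as its reverse complement on the top strand).
insert = "GGDACWGGWTGAACWGTWTAYCCHCC" + "ACGTACGTAC" + "CCWGTWYTAGCHGGDGCWATYAC"
molecule = P5 + "AATAACGT" + READ1 + insert + READ2 + "AATCGTTA" + P7

parts = split_library(molecule)
print(parts["i5"], parts["i7"], len(parts["insert"]))
```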
Once the adapters are added, the libraries are ready to be sequenced.
The steps below are performed automatically by the machine and sequencing chemistry, and do not need to be performed by the operator. From here on, the target DNA together with the forward and reverse primers will be referred to as the "insert".
The sequencing primers in the reagents provided by Illumina are a mixture of different primers, including TruSeq, Nextera, and even primers from obsolete kits. Different types of libraries can therefore be sequenced together.
Read 1: the Read 1 sequencing primer anneals to its binding site and extends into the insert.
TruSeq Dual Index Library:
                                       5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT ------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTG NNNNNNNN TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA -insert- TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG NNNNNNNN TAGAGCATACGGCAGAAGACGAAC -5'
Nextera Dual Index Library:
                                       5'- TCGTCGGCAGCGTC AGATGTGTATAAGAGACAG ------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTG NNNNNNNN AGCAGCCGTCGCAG TCTACACATATTCTCTGTC -insert- GACAGAGAATATGTGTAGA GGCTCGGGTGCTCTG NNNNNNNN TAGAGCATACGGCAGAAGACGAAC -5'
Index 1 (i7) read: a primer annealed to the Read 2 site extends away from the insert to read the i7 index.
TruSeq Dual Index Library:
                                                                                  5'- AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC ------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTG NNNNNNNN TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA -insert- TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG NNNNNNNN TAGAGCATACGGCAGAAGACGAAC -5'
Nextera Dual Index Library:
                                                                                   5'- CTGTCTCTTATACACATCT CCGAGCCCACGAGAC ------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTG NNNNNNNN AGCAGCCGTCGCAG TCTACACATATTCTCTGTC -insert- GACAGAGAATATGTGTAGA GGCTCGGGTGCTCTG NNNNNNNN TAGAGCATACGGCAGAAGACGAAC -5'
Index 2 (i5) read, primed from the P5 sequence: the primer extends into the i5 index.
TruSeq Dual Index Library:
5'- AATGATACGGCGACCACCGAGATCTACAC ------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTG NNNNNNNN TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA -insert- TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG NNNNNNNN TAGAGCATACGGCAGAAGACGAAC -5'
Nextera Dual Index Library:
5'- AATGATACGGCGACCACCGAGATCTACAC ------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTG NNNNNNNN AGCAGCCGTCGCAG TCTACACATATTCTCTGTC -insert- GACAGAGAATATGTGTAGA GGCTCGGGTGCTCTG NNNNNNNN TAGAGCATACGGCAGAAGACGAAC -5'
Index 2 (i5) read, primed from the Read 1 site on the opposite strand: on some instruments the i5 index is instead read in this orientation, which yields its reverse complement.
TruSeq Dual Index Library:
5'- AATGATACGGCGACCACCGAGATCTACAC NNNNNNNN ACACTCTTTCCCTACACGACGCTCTTCCGATCT -insert- AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC NNNNNNNN ATCTCGTATGCCGTCTTCTGCTTG -3'
                                  <------- TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA -5'
Nextera Dual Index Library:
5'- AATGATACGGCGACCACCGAGATCTACAC NNNNNNNN TCGTCGGCAGCGTC AGATGTGTATAAGAGACAG -insert- CTGTCTCTTATACACATCT CCGAGCCCACGAGAC NNNNNNNN ATCTCGTATGCCGTCTTCTGCTTG -3'
                                  <------- AGCAGCCGTCGCAG TCTACACATATTCTCTGTC -5'
Read 2: the Read 2 sequencing primer anneals to its binding site and extends into the insert from the opposite end.
TruSeq Dual Index Library:
5'- AATGATACGGCGACCACCGAGATCTACAC NNNNNNNN ACACTCTTTCCCTACACGACGCTCTTCCGATCT -insert- AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC NNNNNNNN ATCTCGTATGCCGTCTTCTGCTTG -3'
                                                                             <------- TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG -5'
Nextera Dual Index Library:
5'- AATGATACGGCGACCACCGAGATCTACAC NNNNNNNN TCGTCGGCAGCGTC AGATGTGTATAAGAGACAG -insert- CTGTCTCTTATACACATCT CCGAGCCCACGAGAC NNNNNNNN ATCTCGTATGCCGTCTTCTGCTTG -3'
                                                                              <------- GACAGAGAATATGTGTAGA GGCTCGGGTGCTCTG -5'
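The diagrams above imply which strand, and therefore which orientation, each read reports. The following Python sketch makes that explicit under simplifying assumptions: `simulate_reads` is a hypothetical helper, error-free reads are assumed, all inputs are given as written 5'->3' on the top (P5 to P7) strand, and the example insert is a made-up sequence.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_reads(i5, insert, i7, read_length=10, i5_workflow="forward"):
    """Return the four reads expected from one library molecule.

    i5_workflow: "forward" when the i5 index is primed from P5,
    "reverse_complement" when it is primed from the Read 1 site
    on the opposite strand.
    """
    if i5_workflow == "forward":
        index2 = i5
    elif i5_workflow == "reverse_complement":
        index2 = reverse_complement(i5)
    else:
        raise ValueError("i5_workflow must be 'forward' or 'reverse_complement'")
    return {
        # Read 1 starts at the left end of the insert, in top-strand sense.
        "read1": insert[:read_length],
        # The i7 read is primed from the Read 2 site, in top-strand sense.
        "index1": i7,
        "index2": index2,
        # Read 2 starts at the right end of the insert, on the other strand.
        "read2": reverse_complement(insert)[:read_length],
    }

example_insert = "ACGTTTGACCAGGATCCATG"  # hypothetical 20 bp insert
reads = simulate_reads("AATAACGT", example_insert, "AATCGTTA", read_length=5)
print(reads)
```

Whether the i5 index appears as written or as its reverse complement matters when filling in a sample sheet, which is why the two orientations are kept as an explicit parameter here.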