PHYLAPHID B@se

DNA barcoding

  • What is DNA barcoding?
  • The usefulness of DNA barcoding
  • Limitation of DNA barcoding

    What is DNA barcoding?

    DNA barcoding is a method for identifying species from a short fragment of their DNA. The method relies on comparing this DNA sequence with those held in a reference database.

    The DNA fragment sequenced must be standardized. The gene region chosen as the standard barcode for most animal groups is a fragment of the mitochondrial cytochrome c oxidase subunit 1 gene (“COI”). This fragment shows a useful level of variability: in general, individuals of the same species differ little genetically, whereas genetic differences between individuals belonging to different species are significantly higher. This property makes it possible to discriminate COI sequences between species, and COI is consequently used as a barcode to recognize species.
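
    The contrast between within-species and between-species differences can be illustrated with a small sketch. It assumes the sequences are already aligned and uses the uncorrected p-distance (the proportion of differing sites) as the measure of genetic difference; the function name and the toy sequences are hypothetical and far shorter than a real barcode.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguous bases so they are not counted as differences.
        if a not in valid or b not in valid:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned fragments (hypothetical data, for illustration only).
same_species_1 = "ATGGCATTAGGAACTTTATA"
same_species_2 = "ATGGCATTAGGAACTTTGTA"
other_species = "ATGGCTTTAGGTACATTGTA"

intra = p_distance(same_species_1, same_species_2)  # within-species pair
inter = p_distance(same_species_1, other_species)   # between-species pair
print(f"intraspecific: {intra:.2%}, interspecific: {inter:.2%}")
```

    In this toy case the within-species pair differs at 1 site out of 20 and the between-species pair at 4 sites out of 20.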

    COI sequences of specimens previously identified by expert taxonomists are stored in the reference database. Sequenced specimens are not destroyed during the extraction process and are subsequently preserved in our collection as vouchers. These vouchers serve as quality control: they are used to confirm the initial identification, to identify potential mistakes occurring during the sequencing process, and to verify sequence mismatches detected between conspecific individuals. In our database, each DNA sequence obtained from a single individual, and the species name attached to it, is clearly linked to the voucher specimen from which the DNA was extracted. This quality control is essential: including a sequence and a species name in a database without a voucher specimen is not considered scientifically sound.

    The database allows users to identify an unknown specimen by comparing its COI sequence to our reference matrix, which contains the sequences of previously identified species. Assigning an unknown sequence to a known one is not without pitfalls. Unless the sequence to be assigned is strictly identical to known sequences within the database, an inference must be made (but see “Limitation of DNA barcoding” below). When the sequence to be assigned is different, it is difficult to discern whether the differences are due to intraspecific variation or reflect interspecific divergence. Furthermore, the genetic distance reflecting interspecific differences at the barcode locus is unlikely to be uniform across taxonomic groups. Nevertheless, species delineation relies mostly on a standard threshold, set to differentiate between intraspecific variation and interspecific divergence. This threshold, the so-called "barcoding gap", was defined as 10 times the mean intraspecific variation for the group under study. While several studies suggest that a wide gap between intra- and interspecific variation makes a threshold approach useful, other studies show that the overlap is greater when a larger proportion of closely related taxa is included, making the method problematic. Despite this, a 3% threshold has frequently been cited as sufficient genetic divergence to characterize different species. This holds in several insect groups, but there are notable exceptions (see below).
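
    The threshold logic described above can be sketched as follows. This is a minimal illustration, not the procedure used by the database: the function names, the species labels and all distance values are hypothetical, and distances are expressed as proportions (0.03 for 3%).

```python
from statistics import mean
from typing import Dict, List, Optional


def ten_times_rule(intraspecific_distances: List[float]) -> float:
    """Threshold set to 10 times the mean intraspecific distance of the group."""
    return 10 * mean(intraspecific_distances)


def assign_species(hits: Dict[str, float], threshold: float = 0.03) -> Optional[str]:
    """Return the species of the closest reference, or None if it lies beyond the threshold."""
    species, distance = min(hits.items(), key=lambda item: item[1])
    return species if distance <= threshold else None


# Hypothetical intraspecific distances for the group under study.
group_threshold = ten_times_rule([0.002, 0.004, 0.003])

# Hypothetical distances from one query sequence to two reference species.
hits = {"species_A": 0.012, "species_B": 0.081}
print(group_threshold, assign_species(hits, threshold=group_threshold))
```

    Returning no name when the closest reference lies beyond the threshold is a deliberate choice in this sketch: as discussed under “Limitation of DNA barcoding”, a distant best match may reflect an incomplete database rather than a reliable identification.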


    The usefulness of DNA barcoding

    Provide expert knowledge to a large number of users. Morphological identification of animal species requires strong skills, long training and considerable experience. Furthermore, the limited number of experienced taxonomists cannot answer the overwhelming number of identification requests within a short time. Barcoding makes the knowledge and experience of taxonomists available to anyone who can carry out sequencing. Consequently, taxonomists can focus on their fundamental tasks: species description, species delimitation and taxonomic revision.

    Open access to universal knowledge thanks to standardization. The universal use of a small number of genetic markers (COI, ITS for animals) and of standardized protocols enables users to identify a species without any prior taxonomic knowledge.

    Reliable identification of all developmental stages. Insects can be sampled as adults, larvae, eggs, fragments within a gut, etc. Most preimaginal stages of arthropods are extremely difficult to identify morphologically. Yet these stages are often sampled in the field or intercepted at borders, and sometimes only parts of arthropods are available after trapping or sampling. COI sequencing is generic and enables the identification of any part or stage of an arthropod when morphological identification is impossible.


    Limitation of DNA barcoding

    The need for database completeness: a high number of species and populations must be sampled.

    The accuracy of a molecular identification relies mostly on the completeness of the reference library used. For insect groups that have been extensively studied (for which most if not all species have been described and barcoded), the barcode database enables a majority of individuals (>95%) to be successfully identified by COI. However, even in such extensively studied taxa, a relatively high percentage of species (up to 10%) will not be distinguishable because of ancestral polymorphism. DNA barcoding can be less effective for identification in poorly known groups (poor taxonomy and an incomplete barcoding database). In such groups, which represent most insect groups, many species will appear to be genetically non-monophyletic and identification may frequently be erroneous.

    Thus, the accuracy of molecular identification is strongly linked to database completeness and, of course, to the quality and density of the data.

    COI variability is not always consistent with infra- and supraspecific differentiation.

    As previously explained, DNA barcoding relies mostly on the assumption that infraspecific genetic differentiation is lower than differentiation between species. However, several studies have shown that this is not always true: different biological and morphological species can share the same COI haplotype or, conversely, the same morphospecies can exhibit strongly divergent haplotypes. In other cases, intraspecific variation overlaps with interspecific divergence and gives rise to genetically polyphyletic or paraphyletic species. When such overlap occurs, the COI marker cannot reliably identify species. Conversely, high mtDNA sequence distances between individuals from allopatric populations are frequently taken as direct indicators of species differences. In a few cases, uncorrected p genetic distances within species reach up to 9% (some Orthoptera). Such large intraspecific distances are often interpreted as evidence of morphologically cryptic species, but they can in fact result from long-term demographic stability of the populations, which accumulated mutations without any demographic variation. In all cases, relying on multiple markers for DNA barcoding improved the performance of DNA-based identification.
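
    One simple way to look for such overlap is to compare, for each species, the largest intraspecific distance with the smallest distance to any other species. The sketch below assumes pairwise distances have already been computed; the function name, species labels and values are hypothetical.

```python
from collections import defaultdict


def barcode_gap_report(pairs):
    """pairs: iterable of (species_a, species_b, distance) pairwise comparisons.

    Returns {species: (max_intraspecific, min_interspecific, has_gap)}.
    A species with a single specimen gets a maximum intraspecific distance of 0.0.
    """
    max_intra = defaultdict(float)
    min_inter = defaultdict(lambda: float("inf"))
    for sp_a, sp_b, dist in pairs:
        if sp_a == sp_b:
            max_intra[sp_a] = max(max_intra[sp_a], dist)
        else:
            for sp in (sp_a, sp_b):
                min_inter[sp] = min(min_inter[sp], dist)
    report = {}
    for sp in set(max_intra) | set(min_inter):
        intra, inter = max_intra[sp], min_inter[sp]
        # A gap exists only if every conspecific pair is closer than the nearest other species.
        report[sp] = (intra, inter, intra < inter)
    return report


# Hypothetical distances: sp2 contains a deeply divergent pair (9%).
pairs = [
    ("sp1", "sp1", 0.01),
    ("sp1", "sp2", 0.08),
    ("sp2", "sp2", 0.09),
    ("sp2", "sp3", 0.04),
    ("sp1", "sp3", 0.07),
]
report = barcode_gap_report(pairs)
for species in sorted(report):
    print(species, report[species])
```

    Here sp2 has no gap, because its maximum intraspecific distance exceeds its distance to sp3. A check of this kind only flags candidates for closer study; it does not by itself distinguish cryptic species from deep intraspecific divergence.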

    Identification using COI barcodes can be hampered by pseudogenes

    Nuclear mitochondrial pseudogenes (numts) are nonfunctional copies of mtDNA integrated in the nucleus. MtDNA integrations into the nuclear genome can occur several times independently. Numts have been found in numerous clades of eukaryotes and are frequent in some insect groups (e.g. Coleoptera Cerambycidae, Orthoptera Acrididae). Numts are lineage-specific and their presence cannot be predicted. These copies can be amplified simultaneously with orthologous mtDNA when conserved universal primers are used. Although numts are frequently considered a minor limitation to barcoding, they can hamper DNA identification in some taxa. Coamplified numts can be highly divergent (>3%) and can lead to an artificial overestimate of the inferred number of unique species.
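
    Because numts are nonfunctional, one common first screen is to look for stop codons inside the amplified fragment. The sketch below assumes the reading frame of the fragment is known and uses the invertebrate mitochondrial genetic code (NCBI translation table 5), in which TAA and TAG are stops while TGA codes for tryptophan; the function name and toy sequences are hypothetical. A sequence that passes this screen is not thereby proven to be the orthologous mitochondrial copy.

```python
# Stop codons under the invertebrate mitochondrial genetic code
# (NCBI translation table 5); TGA is not a stop in this code.
INVERTEBRATE_MT_STOPS = {"TAA", "TAG"}


def internal_stop_positions(seq: str, frame: int = 0) -> list:
    """Return 0-based nucleotide positions of stop codons in the given reading frame."""
    seq = seq.upper()
    positions = []
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in INVERTEBRATE_MT_STOPS:
            positions.append(i)
    return positions


# Toy fragments (hypothetical data): the second carries an in-frame TAA.
clean = "ATTGGAACATGA"    # ATT GGA ACA TGA -> no stop under table 5
suspect = "ATTGGATAAACA"  # ATT GGA TAA ACA -> stop at position 6

print(internal_stop_positions(clean), internal_stop_positions(suspect))
```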

    Heteroplasmy can be troublesome

    Heteroplasmy is the existence of multiple mtDNA haplotypes in a single individual. It can result from somatic mutations, doubly uniparental inheritance, paternal leakage and hybridization. Many cases of heteroplasmy have been reported in insects (Hymenoptera, Orthoptera, Phthiraptera), and more probably remain to be discovered. Heteroplasmic mitochondria can exhibit relatively high genetic differences (up to 5%), and the dominant haplotype can differ significantly between tissues (leg or thoracic muscles, abdomen). Heteroplasmy is poorly studied and more difficult to detect than numts, because the multiple haplotypes can remain functional and lack any stop codons or frameshift mutations. So far, however, heteroplasmy does not appear to seriously affect the accuracy of DNA identification.

    Mitochondria exchanges between species

    The accuracy of mtDNA identification is also compromised by mtDNA introgression (the mitochondrial genome of one species is replaced by that of another species) resulting from interspecific hybridization. MtDNA introgression is frequent in numerous insect groups (Diptera Drosophilidae and Culicidae; Coleoptera Carabidae, etc.). In many cases the introgressed genome is fixed in some parts of the distribution range of the recipient species; in other cases two morphologically and genetically divergent species share a single mtDNA type. This phenomenon can be amplified by the presence of Wolbachia, another maternally inherited symbiont. Several studies have reported a dynamic coupling of Wolbachia and mtDNA that causes a selective sweep of mtDNA, reducing sequence diversity.

    Several other mechanisms lead to mismatches between mtDNA and named species, resulting in species-level polyphyly and in barcode identification failure. Therefore, be careful when interpreting the results of a BLAST search or a tree inference analysis, and remember to consult the literature on the group you are working on.