DNA barcoding is a method to identify species using a short fragment of their DNA. The method relies on comparing this DNA sequence with those included in a reference database.
The DNA fragment sequenced must be standardized. The gene region chosen as the standard barcode for most animal groups is a fragment of the mitochondrial cytochrome c oxidase subunit 1 gene (“COI”). This fragment exhibits a useful level of variability: in general, individuals of the same species show a low level of genetic difference, whereas genetic differences between individuals belonging to different species are significantly higher. This property is used to discriminate COI sequences between species, and COI is consequently used as a barcode to recognize species.
COI sequences of specimens previously identified by expert taxonomists are stored in the reference database. Sequenced specimens are not destroyed during the extraction process and are subsequently preserved in our collection as vouchers. These vouchers serve as quality control: they are used to confirm the first identification, to identify potential mistakes occurring throughout the sequencing process, and to verify sequence mismatches detected between conspecific individuals. In our database, each DNA sequence obtained from a single individual, together with its species name, is clearly linked to the voucher specimen from which the DNA was extracted. This quality control is essential: including a sequence and a species name in a database without a voucher specimen is not considered a scientifically sound policy.
The database allows users to identify an unknown specimen by comparing its COI sequence with our reference matrix, which contains the sequences of previously identified species. Assigning an unknown sequence to a known one is not without pitfalls. Unless the sequence to be assigned is strictly identical to known sequences within the database, an inference must be made (but see “Limitation of DNA barcoding” below). When the sequence to be assigned is different, it is difficult to discern whether the differences are due to intraspecific variation or reflect interspecific differences. Furthermore, it is unlikely that the genetic distance reflecting interspecific differences will be uniform across different taxonomic groups at the barcode locus. Nevertheless, species delineation relies mostly on the use of a standard threshold, set to differentiate between intraspecific variation and interspecific divergence. This threshold, the so-called "barcoding gap", was defined as 10 times the mean intraspecific variation for the group under study. While several studies suggest that a wide gap between intra- and interspecific variation makes a threshold approach useful, other studies show that the overlap is greater when a larger proportion of closely related taxa is included, making the method problematic. Despite this, a 3% threshold has frequently been cited as a sufficient genetic disparity to characterize different species. This is true in several insect groups, but there are notable exceptions (see below).
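The threshold logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by the database: the distance is taken to be the uncorrected p-distance on pre-aligned sequences, and the function names (`p_distance`, `tenfold_threshold`, `assign`), the species labels and the toy sequences are all hypothetical.

```python
def p_distance(a, b):
    """Uncorrected p-distance: share of differing sites among aligned A/C/G/T positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":  # skip gaps and ambiguity codes
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def tenfold_threshold(intraspecific_distances):
    """Threshold defined as 10 times the mean intraspecific distance of a group."""
    return 10 * sum(intraspecific_distances) / len(intraspecific_distances)


def assign(query, reference, threshold=0.03):
    """Return (closest species, distance, status) for an aligned query sequence."""
    best_species, best_d = None, None
    for species, seq in reference:
        d = p_distance(query, seq)
        if best_d is None or d < best_d:
            best_species, best_d = species, d
    if best_d == 0:
        status = "identical"
    elif best_d <= threshold:
        status = "within threshold"
    else:
        status = "beyond threshold"
    return best_species, best_d, status


# Hypothetical 100 bp toy alignment (real COI barcodes are longer).
REFERENCE = [
    ("species_A", "ACGT" * 25),
    ("species_B", "ACGT" * 20 + "TTTT" * 5),
]
QUERY = "ACGT" * 24 + "ACGA"  # one substitution relative to species_A

print(assign(QUERY, REFERENCE))  # ('species_A', 0.01, 'within threshold')
```

A "within threshold" result is still an inference, for the reasons given above: the 3% default and the tenfold rule are conventions, and the appropriate value differs between taxonomic groups.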
Provide expert knowledge to a large number of users. Morphological identification of animal species requires strong skills, long training and extensive experience. Furthermore, the limited number of experienced taxonomists cannot answer the overwhelming number of identification requests within a short time. Barcoding makes taxonomists' knowledge and experience available to anyone who masters sequencing. Consequently, taxonomists can focus on their fundamental tasks: species description, species delimitation and taxonomic revision.
Open access to universal knowledge thanks to standardization. The universal use of a small number of genetic markers (COI, ITS for animals) and of standardized protocols enables users to identify a species without any prior taxonomic knowledge.
Reliable identification of all developmental stages. Insects can be sampled as adults, larvae, eggs, fragments within a gut, etc. Most preimaginal stages of arthropods are extremely difficult to identify morphologically. However, these stages are often sampled in the field or intercepted at borders, and sometimes only parts of arthropods are available after trapping or sampling. Sequencing COI is generic and enables the identification of any part or stage of an arthropod when morphological identification is impossible.
The need for database completeness: a high number of species and populations must be sampled.
The accuracy of a molecular identification relies mostly on the completeness of the reference library used. For insect groups that have been extensively studied (for which most if not all species have been described and barcoded), the barcode database enables a majority of individuals (>95%) to be successfully identified by COI. However, even in such extensively studied taxa, a relatively high percentage of species (up to 10%) will not be discernable because of ancestral polymorphism. DNA barcoding could be less effective for identification in poorly known groups (poor taxonomy and poor completeness of the barcoding database). In such groups, which represent most insect groups, many species will appear to be genetically non-monophyletic and identification may frequently be erroneous.
Thus, the accuracy of molecular identification is strongly linked to database completeness and, of course, to the quality and density of the data.
COI variability is not always consistent with infra- and supraspecific differentiation.
As previously explained, DNA barcoding relies mostly on the assumption that infraspecific genetic differentiation is lower than differentiation between species. However, several studies have shown that this is not always true: different biological and morphological species can share the same COI haplotype or, conversely, the same morphospecies can exhibit strongly divergent haplotypes. In other cases, intraspecific variation overlaps with interspecific divergence and gives rise to genetically polyphyletic or paraphyletic species. When such overlap occurs, the COI marker cannot reliably identify species. Conversely, high mtDNA sequence distances between individuals from allopatric populations are frequently taken as direct indicators of species differences. In a few cases, uncorrected p genetic distances within species reach up to 9% (some Orthoptera). Such strong intraspecific distances are often interpreted as evidence of morphologically cryptic species, but can in fact be due to the long-term demographic stability of populations, which accumulated mutations without any demographic variation. In all cases, relying on multiple markers for DNA barcoding improved the performance of DNA-based identification.
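One way to see whether intraspecific variation overlaps with interspecific divergence in a given data set is to compare, for each species, the largest distance between its own sequences with the smallest distance to any other species. The sketch below assumes pre-aligned sequences and the uncorrected p-distance; `gap_report`, the species labels and the toy sequences are hypothetical and only illustrate the comparison.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance over aligned A/C/G/T positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def gap_report(records):
    """Map each species to (max intraspecific, min interspecific, has_gap).

    records is a list of (species, aligned_sequence) tuples. has_gap is True
    only when the nearest other species is farther than the most divergent
    pair of conspecific sequences.
    """
    max_intra, min_inter = {}, {}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra.get(sp1, 0.0), d)
        else:
            for sp in (sp1, sp2):
                min_inter[sp] = min(min_inter.get(sp, 1.0), d)
    report = {}
    for sp in sorted({sp for sp, _ in records}):
        intra = max_intra.get(sp, 0.0)  # 0.0 when only one sequence is available
        inter = min_inter.get(sp)       # None when no other species is present
        report[sp] = (intra, inter, inter is not None and inter > intra)
    return report


# Hypothetical toy data: species X has two deeply divergent haplotypes,
# and species Y sits close to one of them.
RECORDS = [
    ("X", "AAAAAAAAAA"),
    ("X", "AAAAACCCCC"),
    ("Y", "AAAAACCCCG"),
]

for species, row in gap_report(RECORDS).items():
    print(species, row)
# X (0.5, 0.1, False)
# Y (0.0, 0.1, True)
```

In this toy case species X shows no gap: a query close to its second haplotype could not be separated from species Y on distance alone, which is the situation in which a second marker becomes useful.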
Identification using COI barcodes can be hampered by pseudogenes
Nuclear mitochondrial pseudogenes (numts) are nonfunctional copies of mtDNA integrated into the nucleus. MtDNA integrations into the nuclear genome can occur several times independently. Numts have been found in numerous clades of eukaryotes and are frequent in some insect groups (e.g. Coleoptera Cerambycidae, Orthoptera Acrididae, etc.). Numts are lineage specific and their presence cannot be predicted. These copies can be amplified simultaneously with orthologous mtDNA when conserved universal primers are used. Although numts are frequently considered a minor limitation to barcoding, they can hamper DNA identification in some taxa. Coamplified numts can be highly divergent (>3%) and can lead to an artificial overestimate of the inferred number of unique species.
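Because numts are nonfunctional, some of them can be flagged by looking for in-frame stop codons or frameshifting indels in a sequence that should code for COI. The following Python sketch illustrates such a screen under stated assumptions: it uses the stop codons of the invertebrate mitochondrial genetic code (TAA and TAG), it does not know the reading frame in advance and therefore tests all three, and the function names and toy sequences are hypothetical. A sequence that passes is not proven to be mitochondrial, since a numt may carry neither feature.

```python
import re

# Stop codons of the invertebrate mitochondrial genetic code
# (TGA codes for tryptophan in this code, so it is not a stop here).
INVERTEBRATE_MITO_STOPS = {"TAA", "TAG"}


def stop_free_frames(seq):
    """Return the forward reading frames (0, 1, 2) without any stop codon."""
    seq = seq.upper().replace("-", "")  # ignore alignment gaps
    frames = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in INVERTEBRATE_MITO_STOPS for codon in codons):
            frames.append(frame)
    return frames


def frameshift_gaps(aligned_seq):
    """Lengths of internal gap runs that are not a multiple of three.

    Gaps at either end are treated as missing data and ignored.
    """
    core = aligned_seq.strip("-")
    return [len(m.group()) for m in re.finditer(r"-+", core)
            if len(m.group()) % 3]


def numt_suspect(aligned_seq):
    """True when no frame is stop-free or a frameshifting gap is present."""
    return not stop_free_frames(aligned_seq) or bool(frameshift_gaps(aligned_seq))


print(stop_free_frames("ATGGCATTAGGA"))    # [0, 2]
print(stop_free_frames("TAACTAGCTAAC"))    # []
print(frameshift_gaps("ATG--GCA---TTA"))   # [2]
print(numt_suspect("TAACTAGCTAAC"))        # True
```

The gap check only detects deletions in the query relative to the alignment; an insertion in the query would instead appear as gaps in the reference sequences and would need to be checked there.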
Heteroplasmy can be troublesome
Heteroplasmy is the existence of multiple mtDNA haplotypes in a single individual. Heteroplasmy can result from somatic mutations, doubly uniparental inheritance, paternal leakage and hybridization. Many cases of heteroplasmy have been reported in insects (Hymenoptera, Orthoptera, Phthiraptera) and more probably remain to be discovered. Heteroplasmic mitochondria can exhibit relatively high genetic differences (up to 5%), and the dominant haplotype can differ significantly between tissues (leg or thoracic muscles, abdomen). Heteroplasmy is poorly studied and more difficult to detect than numts, because the multiple haplotypes can remain functional and lack any stop codons or frameshift mutations. So far, however, heteroplasmy does not appear to seriously affect the accuracy of DNA identification.
Mitochondria exchanges between species
The accuracy of mtDNA identification is also compromised by mtDNA introgression (the mitochondrial genome of one species is replaced by that of another species) resulting from interspecific hybridization. MtDNA introgression is frequent in numerous insect groups (Diptera Drosophilidae and Culicidae; Coleoptera Carabidae, etc.). In many cases the introgressed genomes become fixed in some parts of the distribution range of the recipient species; in other cases two morphologically and genetically divergent species share a single mtDNA type. This phenomenon can be amplified by the presence of Wolbachia, a symbiont that, like mitochondria, is maternally inherited. Several studies have reported a dynamic coupling of Wolbachia and mtDNA that causes a selective sweep of mtDNA, reducing sequence diversity.
There are several other mechanisms that lead to mismatches between mtDNA and named species, resulting in species-level polyphyly and barcode identification failure. Therefore, be careful when interpreting the results of a BLAST search or a tree inference analysis, and remember to consult the literature on the group you are focusing on.
The French Ministry of Agriculture, Nature and Food Quality has carefully compiled the information on this website. However, the Ministry is not responsible for the correctness and completeness of the information supplied. The Ministry cannot be held liable for any damage that occurs as a result of the use of this information. In addition, you cannot claim any rights in relation to this information. The Ministry is not responsible for the content of externally linked web pages.
FEEDBACK: Send us an e-mail if you come across an error (misidentification of specimens, mislabelling of photos, etc.) to help us improve the database.
COPYRIGHT: To use the database in a publication, please cite Coeur d'acier et al. (2014). DNA barcoding and the Associated PhylAphidB@se Website for the identification of European Aphids (Insecta: Hemiptera: Aphididae). PLoS ONE, 9(6). DOI:10.1371/journal.pone.0097620