While the identities of ingredients in their whole forms, such as an apple (Malus domestica, Rosaceae) or a bulb of garlic (Allium sativum, Amaryllidaceae), often can be determined by physical appearance (macroscopic characteristics) alone, it is more difficult to discern the composition of processed foods, such as apple juice or garlic powder. The same is true for dietary supplement ingredients.
With the dizzying array of product choices, widespread reports of fraud, and the general difficulty of conveying high-quality ingredients on a product label, making informed purchasing decisions is no simple matter for consumers. When choosing dietary supplements for individual health needs, consumers often rely on products that are authentically produced and properly labeled. Regulations to ensure dietary supplement product integrity within the United States are generally appropriate, but concerns about ingredient safety and purity persist. DNA-based authentication tests, which continue to become less expensive, faster, and more accurate, are powerful tools to ensure food and natural product safety and consumer confidence.
The use of DNA data for the identification of natural products has been available for some time, with the earliest known report of DNA-based botanical authentication published by Shaw and But in 1995.1 However, the prospect of its widespread application in commerce for quality control and as part of regulatory programs is new. To that end, understanding how DNA testing methods work and their relative strengths and weaknesses can help industry make informed decisions about the application and suitability of using DNA for product screening.
This article seeks to introduce the basic principles of DNA testing, provide a broad outline of different types of DNA tests, and suggest how DNA testing may work with other existing types of authentication processes to ensure the safety and authenticity of herbal dietary supplements. The purpose of the article is not to describe and compare all types of DNA tests but to help stimulate thinking about the use of these methodologies, highlight some of their strengths and limitations, and clarify some misunderstandings. Although this article focuses on herbal dietary supplements, the principles of DNA testing are generalizable to any naturally derived product for which there is sufficient DNA.
Why Use DNA?
Before the move to embrace DNA testing, many other methods were available to evaluate natural products. Modern pharmacopeias are brimming with different tests (e.g., macroscopic, microscopic, and chemical analyses) that have been evaluated and applied to a wide range of natural products, especially botanical ingredients. DNA methods are different from those tests because DNA is the part of an organism that defines its heredity. While many different characteristics can be used to identify and distinguish species — chiefly morphology (the structure of an organism) but also chemistry, geography, and behavior (i.e., characteristics related to growth and reproduction) — those characteristics must be heritable, and hence determined by their DNA, for the species to be evolutionarily distinct. One common definition of a species is a group of individuals that shares a common ancestor and is linked by a shared heredity (Figure 1). The chain of heredity is organized into taxonomic levels (e.g., species, genera, and families), but at each level the shared heredity of the members of those groups defines them. DNA is the measurable part of organisms that most directly defines what is shared among those taxonomic groups. This has always been true, but DNA testing is being used more widely now for its increased ease of use, lower cost, and rapid application.
It is important to note that while DNA is most directly associated with discerning the evolutionary identities of species, DNA does not answer all questions related to the authenticity or safety of herbal dietary supplements. Dietary supplements are valuable because of the biologically active substances they contain, and DNA testing cannot measure these substances. A dietary supplement that contains the DNA of a medicinal plant but none of the medicinal compounds would have no value to consumers. DNA also cannot be used to screen for the presence of dangerous chemical additives (e.g., the pigment lead chromate that is sometimes used to color turmeric [Curcuma longa, Zingiberaceae]). As such, DNA testing alone will not ensure that products are effective or safe to consume. Another issue is that DNA cannot distinguish among different plant parts, since the exact same DNA is in every cell of an organism. While genes are expressed differently in different tissues, the ability to quantify those differences in processed materials does not appear to be practicable at this time. Given these limitations, DNA testing should be part of an overall toolkit that also includes chemical analysis and morphology to ensure the authenticity, purity, and safety of dietary supplements.
Putting It into Practice
With the streamlined processes for DNA sequencing and the proliferation of commercial kits that enable DNA recovery from products, it is not difficult to recover DNA sequence data. Instead, the challenge lies in the correct interpretation of that data. Good examples in the published literature include methods that can be readily reproduced in other laboratories, help define boundaries in which DNA tests may be less effective, and employ DNA and chemistry jointly.2,3
The now-infamous New York attorney general investigation,4 which asserted widespread fraud in the dietary supplement industry based on DNA analyses of various products but failed to report how the results were obtained, is an example of what not to do with DNA-based testing. Similarly, some reports at conferences have suggested widespread substitution of authentic materials with toxic ingredients (e.g., bindweed [Convolvulus arvensis, Convolvulaceae]),5 where the results were more consistent with accidental field or laboratory contamination than with actual commercial malfeasance. These erroneous reports threaten to undermine trust in the use of DNA as a part of quality assurance programs.
Given the central role that genetics plays in determining species, DNA testing is appropriate when employed correctly, but it must be conducted in a standardized fashion using transparent, repeatable methods. Only then will it be valuable to both producers and consumers.
Classes of DNA Tests
A wide diversity of DNA methods can infer or directly read DNA sequence data from material, and this diversity can make it difficult to choose the most suitable method. DNA tests typically belong to one of two broad categories: one will generate DNA amplification products (copies of a region of DNA) of a known diagnostic size, while the other will directly read the DNA sequence data. Some of these differences are summarized in Tables 1 and 2.
Methods that infer DNA sequence data include quantitative polymerase chain reaction (qPCR), barcode DNA-high resolution melting (Bar-HRM), species-specific PCR, random amplification of polymorphic DNA (RAPD), and amplified fragment length polymorphism (AFLP). These methods generally deliver faster results and incur lower costs than those that directly capture and read DNA from a sample. However, they are based on the assumptions that the test works every time the species is present in the sample (i.e., no false negatives) and that the test always distinguishes among closely related species correctly. Because these methods never capture any direct sequence data, the threshold for validation needs to be high to ensure that they provide correct results when testing highly processed foods.
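The validation burden described above can be made concrete with a small sketch. It assumes a hypothetical validation panel in which each sample has a known status (target species present or absent) and a recorded test outcome, and it tallies the false-negative rate and the specificity. The panel values and the function name are illustrative assumptions, not data from any real assay.

```python
from typing import Iterable, Optional, Tuple, Dict


def validation_rates(panel: Iterable[Tuple[bool, bool]]) -> Dict[str, Optional[float]]:
    """Summarize a validation panel.

    Each panel entry is (target_truly_present, test_positive).
    Returns the false-negative rate and the specificity, or None
    where the panel contains no samples of the relevant kind.
    """
    tp = fn = tn = fp = 0
    for present, positive in panel:
        if present and positive:
            tp += 1          # species present, test says yes
        elif present and not positive:
            fn += 1          # species present, test says no (false negative)
        elif not present and positive:
            fp += 1          # species absent, test says yes (false positive)
        else:
            tn += 1          # species absent, test says no
    return {
        "false_negative_rate": fn / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical panel: 20 samples containing the target, 10 close relatives without it.
panel = (
    [(True, True)] * 18
    + [(True, False)] * 2
    + [(False, False)] * 9
    + [(False, True)] * 1
)
rates = validation_rates(panel)
print(rates)
```

In practice, a panel of this kind would need to include closely related species and material processed in the same ways as the commercial product, since those are the conditions under which the two assumptions are most likely to fail.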
Methods that directly capture sequence data (e.g., DNA barcoding, metabarcoding, shotgun or amplicon metagenomics) make fewer assumptions about the identities of the products. If sequence data are captured that match a combination of A, T, G, and C unique to a species, then that material can be correctly identified. Still, one must conduct due diligence to ensure that the DNA data can in fact distinguish the target species. At a bare minimum, if the DNA data perfectly match one or several reference sequences of the same species, one can at least conclude that the data are fully consistent with that species being present. The other critical advantage of reading DNA directly is that when combined with high-throughput sequencing (HTS),* which is now the standard, one can estimate the relative purity of samples and, by extension, the relative proportions of the species they contain, something that cannot be done using methods based on inference. While a number of factors may affect the estimate of relative proportions of species in a mixture, these factors may be explored and quantified so that it is known how much DNA a given input of a product, processed in a given way, yields in the finished product. The inference of the relative proportion of ingredients ultimately remains a work in progress, and substantial efforts at validation are needed.
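As a rough illustration of how directly read sequences could support both identification and an estimate of relative proportions, the sketch below assigns each read to a species only when it exactly matches a reference sequence, then reports the share of assigned reads per species. The reference sequences, species names, and reads are invented for illustration, and exact matching is a deliberate simplification; real pipelines use alignment or k-mer methods and tolerate sequencing error.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical reference barcodes (made-up sequences, not real data).
REFERENCES = {
    "Species A": "ATGGCTTACCGA",
    "Species B": "ATGGCATACCGA",
}


def classify_reads(reads: Iterable[str], references: Dict[str, str]) -> Counter:
    """Count reads per species using exact matches to reference sequences."""
    seq_to_species = {seq: name for name, seq in references.items()}
    return Counter(seq_to_species.get(read, "unassigned") for read in reads)


def relative_proportions(counts: Counter) -> Dict[str, float]:
    """Share of assigned reads attributed to each species."""
    assigned = {name: n for name, n in counts.items() if name != "unassigned"}
    total = sum(assigned.values())
    return {name: n / total for name, n in assigned.items()} if total else {}


# Eight reads of Species A, two of Species B, one read matching neither reference.
reads = ["ATGGCTTACCGA"] * 8 + ["ATGGCATACCGA"] * 2 + ["ATGGCTTTCCGA"]
counts = classify_reads(reads, REFERENCES)
props = relative_proportions(counts)
print(counts, props)
```

The proportions here are proportions of reads, not of ingredients. As noted above, translating one into the other depends on how much DNA each ingredient yields after processing, which would have to be calibrated and validated separately.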
Managing the Trade-offs
Several critical factors are weighed when choosing an appropriate DNA test, including cost, speed, accuracy, power, and scope. The first question that must be answered when choosing a DNA test method relates to the scope of the analysis. DNA testing can be used to either detect the presence of DNA that is unique to one species only (a binary approach) or recover and classify DNA data from all species that may be present in a sample (a metagenomic approach). With a metagenomic approach, DNA recovered from a product is compared to a reference database containing DNA signatures from many species, allowing one to jointly determine identity and purity. These two approaches have different strengths and weaknesses and are based on different assumptions. Some of these issues are highlighted in Table 1, which outlines a few of the most common techniques, where they fit in the binary/metagenomic scheme, and their relative speed, cost, and scope. The discussion below is not exhaustive of all DNA-based approaches and instead seeks to lay out two alternative visions with contrasting strengths and assumptions.
In general, binary tests use PCR to capture a very small region of the genome that is deemed unique and diagnostic to the species under consideration. That means due diligence has been conducted to determine what the most common botanical substitutions might be, how much genetic variation exists within the species, and how robust the exact test is under a range of processing conditions. Methods such as species-specific PCR, or its close relative qPCR, are predicated on a few changes in the A, T, G, and C sequence of the commercial species such that the PCR only works on that species and never works in its absence. No actual DNA sequence information is ever acquired; rather, the presence of a unique and diagnostic sequence is inferred through the success of the PCR, typically measured via fluorescence, but in some cases as variation in the number and size of DNA fragments, much like in thin-layer chromatography (TLC).
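The logic of a species-specific PCR can be sketched in silico. The example below treats a PCR as successful only if both primers match the template exactly, and it reports the expected amplicon length. The template sequences and primers are invented for illustration, and exact matching is an assumption of the sketch; real primers can tolerate some mismatches, which is part of why laboratory validation is needed.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd_primer: str, rev_primer: str) -> Optional[int]:
    """Return the amplicon length if both primers match exactly, else None.

    The forward primer must occur in the template, and the reverse
    complement of the reverse primer must occur downstream of it.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return end + len(rev_site) - start


# Hypothetical sequences: the substitute differs from the target by a single
# base inside the reverse primer binding site.
target = "TTATGCAAGTCCCTACGGATTCAGAA"
substitute = "TTATGCAAGTCCCTACGGATACAGAA"
fwd, rev = "ATGCAAGT", "CTGAATCC"

print(in_silico_pcr(target, fwd, rev))      # amplicon length for the target
print(in_silico_pcr(substitute, fwd, rev))  # None: no amplification expected
```

Note that the function never reports the sequence between the primers. As in the laboratory assay, identity is inferred only from whether amplification succeeds and from the size of the product.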
Such approaches may have difficulty when a declared ingredient is diluted within a sample, since they can deliver a binary “yes” even when the intended ingredient is very rare. These methods may also produce false negatives if the DNA is degraded or if the DNA solution contains enzymatic inhibitors, because a failed PCR is interpreted as meaning that the DNA of the plant is missing. Some of these problems can be avoided with appropriate controls. Assays that target ever-smaller pieces of DNA (e.g., “mini-barcoding”) may be applicable to increasingly processed products, but the trade-off is that each assay interrogates less DNA and, as such, may be less specific to the target species. Also, binary tests provide no information as to the purity of a product. A separate assay needs to be developed for each species, and proving that the method works only when the target species is present is not a trivial endeavor. On the other hand, such PCR-based assays can be very sensitive, with very low detection thresholds, and, as such, are more useful for processed products.
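One way controls can guard against the false negatives described above is to run an amplification control alongside the target assay, so that a failed reaction is not mistaken for an absent species. The sketch below encodes that decision rule. The use of a single internal control and the wording of the outcomes are assumptions made for illustration, not a description of any particular laboratory's protocol.

```python
def interpret_binary_test(target_amplified: bool, control_amplified: bool) -> str:
    """Interpret a species-specific PCR result alongside an amplification control.

    The control is assumed to amplify whenever the reaction chemistry works,
    regardless of whether the target species is present.
    """
    if target_amplified:
        return "target detected"
    if control_amplified:
        # The reaction worked, so the negative result is meaningful.
        return "target not detected"
    # Neither amplified: the reaction itself failed, so absence cannot be inferred.
    return "inconclusive: PCR failed (possible inhibitors or degraded DNA)"


for target, control in [(True, True), (False, True), (False, False)]:
    print(target, control, "->", interpret_binary_test(target, control))
```

The important case is the last one: without the control, a failed reaction and a genuinely absent ingredient would be reported identically.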
Binary tests based on qPCR are currently used most often in food safety and medicine. With sufficient due diligence to determine if the assay is legitimately unique to the species to be identified, binary tests are a rapid and low-cost approach to confirm the presence of a botanical ingredient.
Metagenomic-based tests are a newer development, and there is a tremendous recent push to develop software that interprets DNA data. The principle of metagenomic approaches is founded on the idea that DNA sequencing is affordable and will continue to get less expensive. Metagenomic approaches generally take the path of recovering data from one or many parts of the genome that are isolated from an ingredient, and then classifying and summarizing results.
The advantages of this approach are that DNA sequence data are acquired and the actual sequence of A, T, G, and C is read for one or many parts of the genome. In this sense, the identity of the ingredient is not inferred and is instead determined directly. Increasingly, the reference databases used to distinguish among species draw on multiple genomic regions or entire genomes (e.g., whole chloroplast genomes, or, for probiotics, the entire bacterial genome), vastly improving the accuracy of the identifications and ameliorating concerns about the small number of sequence characters used to underpin binary tests. Another advantage is that the same laboratory method can be applied for nearly all products, which helps simplify workflows, allows for increased automation, and eliminates concerns as to whether the correct test was used. Lastly, these approaches provide a strong framework for statistical analyses of the results, such that probabilities (P values and confidence intervals) can be assigned to the results, providing a direct statement as to the quality of the identification.
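As one hedged example of the kind of statistical statement this makes possible, the sketch below computes a Wilson score confidence interval for the proportion of reads assigned to the declared species. The read counts are invented, and treating reads as independent draws is a simplifying assumption; the choice of the Wilson interval is ours for illustration and is not drawn from any specific published workflow.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    successes: reads assigned to the declared species
    n:         total classified reads
    z:         normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical result: 950 of 1,000 classified reads match the declared species.
low, high = wilson_interval(950, 1000)
print(f"Estimated purity: 95.0% (interval {low:.3f} to {high:.3f})")
```

An interval of this kind describes sampling uncertainty in the reads only. It does not account for biases introduced during DNA extraction, amplification, or processing of the ingredient.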
Metagenomic approaches are limited by cost and time, both of which are greater than for binary tests. Another important issue is that for most implementations, more DNA must be available to be isolated from the product, which may limit the use of metagenomic approaches for highly processed products. This means that metagenomic approaches will likely work well only at early stages of ingredient processing, in which they can provide confirmation of identity and purity, while binary PCR-based species-specific assays will work better on more highly processed materials.
To Test the Tests
The question is not whether DNA can contribute to quality testing programs — it certainly can. Instead, the current challenge is that there are so many different types of DNA-based tests with different strengths and weaknesses, and, in contrast to animal DNA testing, it often is unclear what type of test fits best for plants. Even after deciding between a species-specific or metagenomic approach, there remain numerous options within each of those two general test methods.
In a sense, genetics is a victim of its own success. The proliferation of different tests reflects the success of DNA-based assays in answering a broad array of questions. Given this, what is needed are inter-laboratory trials that employ an agreed-upon set of methods to determine their precision, reproducibility, and cost. A recent program sponsored by the US National Institute of Standards and Technology (NIST) demonstrated how badly standardization is needed, with results from different organizations leading to a wide range of conclusions.12 Analyses of the same test samples produced results that rival those of the New York attorney general investigation,4 in which many rare contaminants were incorrectly identified as an important part of the sample. Validation trials, such as the AOAC Performance Tested Methods validation, are useful, but these do not directly compare the performance of different approaches. As such, even if an approach is successful in a validation trial, that does not mean the approach is actually better than others, and the trial may well establish only a very narrow range of conditions within which validation is achieved. A merging of the NIST and AOAC approaches, in which an agreed-upon set of DNA methods is jointly employed using the same test materials, would be a step toward standardization. Such an approach would allow the analytical community to begin to form a more objective conclusion about the accuracy, repeatability, and cost of different methods.
For any of these diverse DNA methods to work, a set of accurate reference materials is needed that represents the products to be identified. For foods and botanical products, the sequences included in reference databases (e.g., GenBank) frequently are the best source for reference materials. If the database is poor (low species diversity, erroneous entries, absence of important information, etc.), the best technology in the world will not help. The most common type of reference sequence is the DNA barcode, the minimum data content that provides a relatively high level of diagnostic power to identify species (see Figure 2). While the officially recognized DNA barcodes13,14 can provide useful information on the identity of a plant, these barcodes do not work as well across a wide range of plants as they do for animals or fungi. Officially recognized barcodes correctly identify plant species about 75% of the time, and that is likely an overestimate.15
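The notion of diagnostic power can be illustrated with a simple calculation: given a reference library with one barcode per species, what fraction of species has a barcode that no other species shares? The sketch below computes that rate for an invented four-species library. The species names and sequences are hypothetical, and real assessments must also account for variation within species, which this toy example ignores.

```python
from collections import Counter
from typing import Dict


def discrimination_rate(barcodes: Dict[str, str]) -> float:
    """Fraction of species whose barcode is not shared with any other species."""
    if not barcodes:
        raise ValueError("reference library is empty")
    seq_counts = Counter(barcodes.values())
    unique = sum(1 for seq in barcodes.values() if seq_counts[seq] == 1)
    return unique / len(barcodes)


# Hypothetical library: Species A and B share a barcode and cannot be told apart.
library = {
    "Species A": "ACGTAC",
    "Species B": "ACGTAC",
    "Species C": "ACGTTC",
    "Species D": "TCGTAC",
}
rate = discrimination_rate(library)
print(f"{rate:.0%} of species have a unique barcode")
```

A rate computed this way depends entirely on which species are in the library. Adding close relatives that were previously unsampled can only keep the rate the same or lower it, which is one reason a reported identification rate may be an overestimate.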
As a result, many authors have begun sequencing entire chloroplast genomes for plants and making them publicly available. The chloroplast genome is the remnant genome of a free-living cyanobacterium that was captured and converted into a symbiotic organelle within plant cells. The chloroplast genome is approximately 100 times larger than the combined two-genomic region plant DNA barcode, and accordingly delivers much higher rates of diagnostic resolution (approaching 95% for most groups of plants). However, it provides genetic information from the maternal parent only, which means that additional information from nuclear or mitochondrial regions may be warranted when investigating recent hybrid species. Even with that limitation, the chloroplast genome is large enough that diagnostic mutations can be found to distinguish nearly all species from one another.
The process of capturing these data is well established, and publication of these data in GenBank or other public data repositories represents the commitment to transparency that is needed to ensure a safe and authentic supply of food and botanical dietary ingredients. While nearly any DNA data can be published in GenBank, data that are maintained and curated within discrete BioProjects (e.g., GenBank BioProject PRJNA515225) and which are linked to vouchers are much more reliable, as are records maintained in the RefSeq public database of nucleotide and protein sequences. It is worth noting that the publication of these data will assist in the development of any type of DNA-based assay (species-specific or metagenomic), and most professional herbaria are willing to share access to vouchered plant material if the recovered data are made freely available.
The use of DNA testing in quality control programs can help ensure the authenticity, purity, and safety of food and herbal products. As with all new technologies, DNA analytical methods need to be evaluated and ultimately validated. Concerns that DNA methods are too complex and redundant with existing chemical methods echo concerns of the past, as when high-performance liquid chromatography (HPLC) was first introduced. DNA data are uniquely suited to confirming species identity, since species are defined by their genetic lineage. The ability to infer authenticity and purity simultaneously is unique to DNA-based tests and allows these methods to complement chemical methods that identify bioactive constituents. The decision to implement a narrower species-specific assay or a broad-based screen in a validation program will need to be considered carefully. Ultimately, DNA data will be employed in conjunction with chemical and morphological assays to demonstrate to consumers that producers have made a commitment to quality.
David L. Erickson, PhD, is an associate research scientist at the University of Maryland, working in the Joint Institute for Food Safety and Applied Nutrition. In addition to his academic post, Erickson co-founded a biotechnology company, DNA4 Technologies LLC, which specializes in software development and laboratory protocols applying DNA data to the validation of natural product identity and composition. Erickson received his PhD in botany from the University of Georgia, after which he worked at the Smithsonian Institution’s National Museum of Natural History, notably developing their international DNA barcoding program. He has published widely on topics combining genetics, ecology, and evolution, and is currently focused on developing tools that use genomics to ensure a safe, authentic, and publicly transparent food supply.
* Although “next-generation sequencing” (NGS) is still widely used as an overarching term for post-Sanger DNA sequencing methods, many prefer the term “high-throughput sequencing” (HTS). NGS is no longer the newest type of sequencing. For example, long-read sequencing or third-generation sequencing is now routinely used for DNA barcoding initiatives.
1. Shaw PC, But PP. Authentication of Panax species and their adulterants by random-primed polymerase chain reaction. Planta Med. 1995;61(5):466-469. doi: 10.1055/s-2006-958138.
2. Pawar RS, Handy SM, Cheng R, Shyong N, Grundel E. Assessment of the authenticity of herbal dietary supplements: Comparison of chemical and DNA barcoding methods. Planta Med. 2017;83:921-936. doi: 10.1055/s-0043-107881.
3. Ivanova NV, Kuzmina ML, Braukmann TWA, Borisenko AV, Zakharov EV. Authentication of herbal supplements using next-generation sequencing. PLoS One. 2016;11(12):e0168628. https://doi.org/10.1371/journal.pone.0156426.
4. A.G. Schneiderman asks major retailers to halt sales of certain herbal supplements as DNA tests fail to detect plant materials listed on majority of products tested [press release]. Albany, NY: New York State Attorney General’s Office; February 3, 2015. Available at: http://www.ag.ny.gov/press-release/ag-schneiderman-asks-major-retailers-halt-sales-certain-herbal-supplements-dna-tests. Accessed April 1, 2019.
5. Daniells S. Adulteration with bindweed: A big concern that nobody is talking about? NutraIngredients-USA. September 25, 2016. Available at: http://www.nutraingredients-usa.com/Article/2016/09/26/Adulteration-with-bindweed-A-big-concern-that-nobody-is-talking-about. Accessed April 1, 2019.
6. Wallinger C, Juen A, Staudacher K, et al. Rapid plant identification using species- and group-specific primers targeting chloroplast DNA. PLoS One. 2012. https://doi.org/10.1371/journal.pone.0029473.
7. Die JV, Roman B, Flores F, Rowland LJ. Design and sampling plan optimization for RT-qPCR experiments in plants: A case study in blueberry. Front Plant Sci. 2016. https://doi.org/10.3389/fpls.2016.00271.
8. Smith NR, Trigiano RN, Windham MT, et al. AFLP markers identify Cornus florida cultivars and lines. J Am Soc Hortic Sci. 2007;132(1):90-96. https://doi.org/10.21283/JASHS.132.1.90.
9. De las Rivas B, Marcobal A, Muñoz R. Development of a multilocus sequence typing method for analysis of Lactobacillus plantarum strains. Microbiol. 2006;152(1):85-93. doi: 10.1099/mic.0.28482-0.
10. Prado M, Ortea I, Vial S, Rivas J, Calo-Mata P, Barros-Velázquez J. Advanced DNA- and protein-based methods for the detection and investigation of food allergens. Crit Rev Food Sci Nutr. 2016;56(14):2511-2542. https://doi.org/10.1080/10408398.2013.873767.
11. Raime K, Krjutškov K, Remm M. Method for the identification of plant DNA in food using alignment-free analysis of sequencing reads: A case study on lupin. Front Plant Sci. 2020. https://doi.org/10.3389/fpls.2020.00646.
12. Barber CA, Phillips MM, Rimmer CA, et al. Dietary Supplement Laboratory Quality Assurance Program: Exercise O Final Report. 2019. https://doi.org/10.6028/NIST.IR.8266.
13. CBOL Plant Working Group. A DNA barcode for land plants. PNAS. 2009;106(31):12894-12897.
14. Schoch C, Seifert K, Huhndorf S, et al. Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for fungi. PNAS. 2012;109:6241-6246. https://doi.org/10.1073/pnas.1117018109.
15. Hollingsworth PM, Graham SW, Little DP. Choosing and using a plant DNA barcode. PLoS One. 2011;6(5):e19254.