Relic DNA contributes minimally to estimates of microbial diversity

Extracellular or "relic" DNA is one of the largest pools of nucleic acids in the biosphere. Relic DNA can influence a number of important ecological and evolutionary processes, but it may also bias estimates of microbial abundance and diversity, which has implications for understanding environmental, engineered, and host-associated ecosystems. We developed theoretical predictions from a simulation model based on sampling a species abundance distribution to assess how and when relic DNA biases estimates of microbial diversity. We then tested our predictions by quantifying relic DNA and assessing its contribution to bacterial diversity using 16S rRNA gene sequences collected from soil, sediment, water, and mammalian gut environments. On average, relic DNA made up 33 % of the total bacterial DNA pool, but exceeded 80 % in some samples. Despite its abundance, relic DNA had a minimal effect on estimates of taxonomic and phylogenetic diversity, which is consistent with model simulations where the processes regulating the compositional turnover of relic and intact DNA are at steady state. When such conditions are met, relic DNA should not bias cultivation-independent estimates of microbial diversity.


48
processes are reflected by the size distribution of relic DNA, which tends to be skewed towards small fragments ranging between 100 and 500 base pairs 10,11 . In addition, the residence time of 50 relic DNA is extremely short in some ecosystems. In freshwater and marine habitats, extracellular DNA turns over in less than one day 12,13 while plasmid DNA begins to degrade in 52 just minutes 14 . In other environments, the degradation of relic DNA can be much slower. For example, DNA can bind with inorganic and organic substances in soils and sediments depending 54 on cation availability and pH 9 . The resulting adsorption can increase the persistence of relic DNA by protecting it from hydrolytic enzymes 15,16 . In addition, relic DNA is contained in 56 biofilms 8 , and appears to be enriched in the gut mucosa of hosts with high-fat diets 17 . As a result, even in systems with high DNase activity, large pools of relic DNA can accumulate 18 . 58 To date, the documented effects of relic DNA on estimates of diversity are idiosyncratic.
Even in samples with relatively large relic DNA pools, bias can be non-existent or substantial, 60 and lead to the overestimation or underestimation of diversity 19,20 . To delve further into this apparent unpredictability, we developed a theoretical framework to explore how relic DNA 62 affects estimates of microbial diversity. While some variability in the effect of relic DNA may reflect taxonomic-or ecosystems-specific features, we hypothesized that the magnitude and 64 direction of bias can be explained by the richness and evenness of sequences contained in the relic DNA pool. We used a simulation model to identify conditions where sampling effects lead 66 to biased estimates of microbial diversity based on the size and species abundance distribution (SAD) of the relic DNA pool. We then tested our model predictions by quantifying the 68 contribution of relic DNA on the abundance and diversity of bacterial communities in four distinct ecosystem types, including soils, sediments, surface waters, and mammalian guts. 70

RESULTS 72
We developed a model that represents the dynamics of relic DNA. The size of the relic DNA pool is affected by inputs associated with the death rates of microorganisms and by losses 74 associated with the degradation of relic DNA. When death rates exceed degradation rates, the size of the relic DNA pool increases (Fig. 1a). However, a large relic DNA pool is not sufficient 76 to create bias in diversity estimation. Rather, the direction and magnitude of bias reflects the simultaneous sampling of taxa from the species abundance distributions (SAD) associated with 78 the relic and intact pools. Using sampling-based simulations (see Methods), we found that there was no bias in estimating the number of microbial taxa (richness) when the SAD of the intact 80 community was similar to the SAD of the relic pool ( Fig. 1b-c). Such conditions are satisfied when the death rates and degradation rates are proportional among taxa. Owing to sampling 82 effects, our simulations reveal that bias can arise when the SADs of the relic and intact pools are different. For example, our simulations predict that richness is overestimated when diversity 84 (richness or evenness) is elevated in the relic pool, which reflects the increased probability of sampling many unique taxa (Fig. 1b-c). Alternatively, richness can be underestimated when the 86 diversity (richness or evenness) of the relic DNA pool is lower than the diversity of sequences in the intact pool ( Fig. 1b-c), which reflects an increased probability of sampling a few 88 disproportionately abundant (i.e., dominant) taxa. Such conditions should arise when processes (death and degradation) influencing the turnover of relic DNA are decoupled for different taxa in 90 the intact and relic pools (Fig. 1a). The conditions leading to the over-or under-estimation of richness also translate into biased measurements of beta-diversity (between-samples) due to 92 compositional changes in the relative abundances of sequences (Fig. S1). Thus, we predict that properties of the species abundance distribution -in particular, richness and evenness -94 determine whether or not relic DNA biases estimates of microbial diversity.

96
Variation in relic DNA pool size -We evaluated our model predictions by collecting samples from different ecosystem types (soil, sediment, water, and mammalian guts) that are 98 hypothesized to vary in relic DNA turnover rates 10,16 . Using quantitative PCR with 16S rRNA primers, we quantified intact DNA after treating an aliquot of sample with DNase to remove relic 100 DNA. In addition, we quantified total DNA (intact + relic) from a control aliquot of sample that was not treated with DNase (see Methods and Fig. S2). Relic DNA accounted for a substantial 102 but variable portion of the total bacterial DNA. The proportion of relic DNA was normally distributed (mean ± sd = 0.33 ± 0.218) and ranged from 0 to 0.83 across 34 samples obtained 104 from different ecosystem types (Fig. 2). These results are in line with the amount of relic DNA that has been reported in other studies 19,20,24 . Even though host and environmental features 106 associated with these habitats are thought to influence relic DNA dynamics, there was only a marginally significant effect of ecosystem type on the proportion of relic DNA (one-way 108 ANOVA, F 3, 30 = 2.43, P = 0.08) 2,19 . Nevertheless, the large average pool size and range in relic DNA provided us with the opportunity to test our model predictions regarding the magnitude and 110 direction of bias that relic DNA should have on estimates of microbial diversity.

112
Magnitude and direction of bias on microbial diversity -We sequenced 16S rRNA genes from the intact and total DNA pools to determine whether and to what degree relic DNA biased 114 the taxonomic and phylogenetic alpha-diversity of bacterial communities. Despite accounting for a substantial portion of the total DNA (Fig. 2), relic DNA had no effect on estimates of richness, 116 evenness, or phylogenetic diversity (PD) ( Table S1). We expressed each of these alpha-diversity metrics as a ratio (total / intact), where values > 1 represent overestimation bias and values < 1 118 represent underestimation bias. Relic DNA had no effect on the diversity ratios based on the observation that the 95 % confidence intervals overlapped with 1.0 (Fig. 3). Furthermore, the 95 120 % confidence intervals of the diversity ratios overlapped across ecosystem types, indicating that the contribution of relic DNA to all measures of diversity was low irrespective of the microbial 122 habitat sampled. Finally, using simple linear regression, we tested our model predictions that the magnitude of bias has the potential to increase with increasing relic DNA pool size (see Fig. 1b-124 c). For all diversity ratios (richness, evenness, and PD), the slopes were not different from zero DNA on the compositional diversity for the intact and total DNA pools. Principal Coordinates Analysis (PCoA) revealed that bacterial composition was strongly affected by ecosystem type 132 ( Fig. S4), which was supported by permutational multivariate analysis of variance (PERMANOVA) using both taxonomic (R 2 = 0.35, F 3,50 = 9.33, P = 0.001) and phylogenetic (R 2 134 = 0.65, F 3,50 = 31.8, P = 0.001) distance matrices. However, multiple lines of evidence suggest that relic DNA had no appreciable effect on the observed bacterial composition. First, the intact 136 and total DNA pools were significantly and highly correlated with one another when we performed a Mantel test using Bray-Curtis distances, which reflect taxonomic dissimilarity (P = 138 0.001, r = 0.959), and UniFrac distances, which capture phylogenetic dissimilarity (P = 0.001, r = 0.996). Second, we tested for the effect of relic DNA on beta-diversity by calculating centroid 140 distance ratios (see Methods). If the centroid distance ratio was > 1, we concluded that relic DNA inflated beta-diversity; if the distance ratio was < 1, we concluded that relic DNA 142 homogenized beta-diversity. Relic DNA had no effect on the centroid distance ratios based on the observation that the 95 % confidence intervals overlapped with 1.0 (Fig. 4). Furthermore, the 144 95 % confidence intervals for the centroid distance ratios overlapped across ecosystem types, indicating that the contribution of relic DNA to beta-diversity was unaffected by different taxa or 146 environmental conditions in the distinct microbial habitats. Last, we used simple linear regression to test our model predictions that bias in estimates of community composition would 148 increase with increasing relic DNA pool size (see Fig. S1). The proportion of relic DNA in a sample had no effect on the slopes for centroid distance ratios regardless of whether they were 150 calculated using taxonomic (P = 0.48) or phylogenetic (P = 0.75) distance matrices. Similarly, the intercepts for these regression relationships were not significantly different from 1 (P > 0.3) 152 (Fig. S5). These empirical findings are consistent with simulations predicting that in order for bias to arise, there must be differences in the diversity between the intact and relic DNA pools 154 ( Fig. S1).

DISCUSSION
Rarely, if ever, are biological communities completely censused. As a result, estimates of 158 diversity are often based on incomplete sampling, which introduces uncertainty and potential bias. For microbiologists using cultivation-independent approaches, diversity estimation is 160 further complicated by the accumulation and persistence of relic DNA, which may have an abundance distribution that is distinct from that of the sequences contained in viable cells. In 162 such cases, our simulation models predict that microbial diversity can be overestimated or underestimated depending on the size, richness, and evenness of the relic DNA pool (Fig. 1b-

c). 164
We tested our model predictions by analyzing the intact and total DNA in a range of ecosystems with relic DNA pool sizes that could bias estimates of bacterial diversity. We found that relic 166 DNA had minimal to no effect on commonly estimated measures of taxonomic and phylogenetic diversity, suggesting that the compositional turnover of relic and intact DNA may approach 168 steady state conditions in a range of ecosystems despite different environmental conditions and divergent microbial assemblages (Fig. S4). In the following sections, we discuss scenarios where 170 relic DNA should and should not lead to bias in the estimation of microbial diversity. Our understanding of the microbial biosphere has been transformed by the development and application of molecular-based cultivation-independent techniques. The ability to rapidly 210 obtain millions of gene sequences from a range of environments has yielded valuable insight into the processes that regulate community assembly and function 25,26 , and has also paved the way 212 for the discovery of new metabolisms 27 , tests for unifying patterns of biodiversity 28 , and an updated tree of life 29 . There are limitations, however, associated with culture-independent 214 techniques, which include inefficient nucleic-acid extraction methods 29 and "universal" primers that over-represent some taxonomic groups while overlooking others 30 . Sequencing of relic 216 DNA is another important concern, which can potentially lead to the overestimation or underestimation of microbial diversity. However, this bias requires the decoupling of processes 218 that regulate the compositional turnover of the relic and intact DNA pools. While some recent evidence suggests this can arise 20 , our results from diverse ecosystems suggest that relic DNA 220 contributes minimally to the characterization of microbial community structure.

METHODS 224
Relic DNA Model -We used sampling-based simulations to model the mixing of intact and relic DNA pools. For each simulation, we sampled a regional species pool consisting of 10,000 226 taxa with a lognormal abundance distribution 21 . The intact community consisted of 1,000,000 individuals sampled from this regional pool. We then combined this intact community with a 228 relic community at proportions from 0.01 to 0.96. In the first set of simulations, we altered the evenness of the regional pool from which the relic community was sampled. We altered the 230 evenness by changing the scale parameter of the lognormal distribution. To decrease the evenness of the relic DNA pool, we increased the scale parameter from 0.98 to 2. To increase the 232 evenness of the relic DNA pool, we decreased the parameter from 0.98 to 0.1. In the second set of simulations, we altered the richness of the regional pool from which the relic community was 234 sampled. We altered richness by increasing the magnitude of the regional pool while maintaining the order of species and the scale parameter of the lognormal distribution. To increase the 236 richness of the relic DNA pool, we increased the size of the regional pool from 10,000 to 50,000 taxa. To decrease the richness of the relic DNA pool, we decreased the size of the regional pool 238 from 10,000 to 5,000 taxa. After mixing the intact and relic communities for both simulations, we rarified the total community to 10,000 observations and calculated richness and Bray-Curtis 240 distances to estimate compositional differences between the intact and total DNA pools. To host-associated ecosystems. First, we sampled sediments and surface water from lakes near the Michigan State University, W.K. Kellogg Biological Station (KBS) in Hickory Corners, 248 Michigan, USA. Soils were sampled from the main sites and surrounding areas at the KBS Long-Term Ecological Research site 33 . We also collected fresh feces as representative gut microbiome 250 samples from cows, dogs, horses, rabbits, and humans. In each of these ecosystem types, we obtained samples from 6 -8 independent sites. After transporting samples to the laboratory, we 252 added DNase to an aliquot of a sample to remove relic DNA. Our method was based on procedures that have previously been used to quantify relic DNA in marine sediments 24 , host 254 tissue 19 , and drinking-water biofilms 23 . The DNase-based method has proven to be effective not only at removing extracellular DNA, but also DNA that is contained inside of dead cells 23 . 256 Furthermore, it has been shown that DNase does not affect the integrity of living cells 22,23 . Thus, we assumed that the remaining "intact DNA" from the enzymatically treated aliquot was 258 contained in potentially viable cells. We used another aliquot without DNase from the same sample as a control, which we refer to as "total DNA". For aqueous samples, we filtered 250 mL 260 samples onto 47 mm 0.2 µm Supor filter (Pall Corporation) at 10 mm Hg of vacuum pressure, cut the filter in half and randomly assigned one half to a DNase treatment and used the other half 262 as the control. For non-aquatic samples (gut, sediment, and soils), 0.25 g aliquots were transferred into two 2 mL microcentrifuge tubes. We then added 440 mL of a DNase buffer 264 Based on melting curve analyses, we found no evidence for primer dimers or non-specific amplification. 300

Contribution of relic DNA to bacterial diversity 302
Community sequencing -We estimated the contribution of relic DNA to bacterial diversity using high-throughput sequencing of the 16S rRNA gene. Specifically, we amplified the V4 304 region of the 16S rRNA gene from the intact and total DNA pools of each sample using barcoded primers (515F and 806R) designed to work with the Illumina MiSeq platform 37 . We 306 cleaned the sequence libraries using the AMPure XP purification kit, quantified the resulting products using the QuantIt PicoGreen kit (Invitrogen), and pooled libraries at equal molar ratios 308 (final concentration: 20 ng per library). We then sequenced the pooled libraries with the Illumina MiSeq platform using paired end reads (Illumina Reagent Kit v2, 500 reaction kit) at the Indiana 310 University Center for Genomics and Bioinformatics Sequencing Facility. Paired-end raw 16S rRNA sequence reads were assembled into contigs using the Needleman algorithm 38 . We 312 obtained a total of 12,916,632 16S rRNA sequences from 42 samples representing 21 sites. After quality trimming with a moving average quality score (window 50 bp, minimum quality score 314 35), we aligned the sequences to the Silva Database (version 123) using the Needleman algorithm. Chimeric sequences were removed using the UCHIME algorithm 39 . After this 316 filtering, there was an average (± SEM) of 222,701 ± 9,560 sequences per site. We created operational taxonomic units (OTUs) by first splitting the sequences based on taxonomic class 318 (using the RDP taxonomy) and then binning sequences into OTUs based on 97% sequence similarity. Our depth of sequencing led to a high degree of coverage across samples (minimum 320 Good's Coverage = 0.98). For phylogenetic analysis, we picked representative sequences for each OTU by using the most abundant unique sequence. We used FastTree 40 to generate a 322 phylogenetic tree from the representative sequences using the generalized time-reversible model of nucleotide evolution. We calculated phylogenetic distances using weighted UniFrac distances 324 41 . All initial sequence processing was completed using the software package mothur (version 1.38.1) 42 . 326 Alpha Diversity -We estimated the effects of relic DNA on richness, evenness, and 328 phylogenetic diversity for the intact and total DNA pools with a sample. To estimate the number of OTUs (i.e., richness), we used a resampling approach that subsampled each sample to an 330 equal number of sequences per sample and summed the number of OTUs that were represented 43 . Briefly, we subsampled to 30,000 observations, resampled 999 additional times, and then 332 calculated the average richness estimates (± SEM) for each sample. To estimate the equitability in abundance among taxa in a sample (evenness), we used the same resampling approach and 334 calculated average evenness estimates (± SEM) using Simpson's Evenness index 44 . To test whether relic DNA affected the phylogenetic diversity within a sample, we subsampled 336 communities to 30,000 observations and then calculated Faith's D statistic, which sums the branch lengths for each species found in a sample from the root to the tip of the phylogenetic tree 338 45 . All estimations were performed in the R statistic environment (v 3.3.2) 46 using the vegan, ape, ade4, picante, and plyr packages, along with custom functions. 340 Beta Diversity -We estimated the effects of relic DNA on between-sample (i.e., beta) diversity 342 by comparing the taxonomic and phylogenetic diversity of bacterial communities in the intact and total DNA pools. First, we conducted a Principal Coordinates Analysis (PCoA) on log 10 -344 transformed relative abundance data to visualize the effects of relic DNA removal (via DNase treatment) on bacterial community composition within and among ecosystem types. The PCoA 346 was performed with Bray-Curtis and UniFrac distances to assess taxonomic and phylogenetic effects, respectively. In addition, we used PERMANOVA to test for differences in taxonomic 348 and phylogenetic composition based on ecosystem type for the total DNA pool. Second, we conducted a Mantel test to assess the correlation between the community resemblance matrices 350 (either Bray-Curtis or UniFrac) represented by the intact and total DNA pools. Last, we tested whether relic DNA altered beta-diversity within an ecosystem type by comparing centroid 352 distances. To calculate this metric of sample dispersion, we determined the centroid from a PCoA with either Bray-Curtis or UniFrac distances for the total DNA pool for all sites within a 354 given ecosystem type. We then measured the Euclidean distances between the centroid and all samples (total and intact) to determine the centroid distances. 356

ACKNOWLEDGMENTS 358
We acknowledge constructive feedback from KD Webster, RZ Moger-Reischer, EK Hall, NI Wisnoski, and WR Shoemaker. We thank S. Kuenzel and B.K. Lehmkuhl for technical support.  sampling-based simulation model that created communities populated with individuals with intact DNA sampled from a lognormal species abundance distribution. We then mixed the intact 488 community with relic individuals so that the relic contribution to total community ranged from 0.01 to 0.96. The relic community was also populated with individuals sampled from a 490 lognormal species distribution, but which varied in (b) evenness (E) or (c) richness (S). To quantify bias in diversity, we calculated "richness ratios" which reflect the number of species in 492 the total DNA pool (intact + relic) divided by the number of species in the intact DNA pool.
When values richness ratios = 1, relic DNA has no effect on diversity; when richness ratios > 1, 494 relic DNA leads to overestimates of diversity; richness ratios <1, relic DNA lead to underestimation of diversity. 496 concentration was quantified without DNase treatment. Relic DNA constituted an appreciable portion of the total DNA pool, but was not affected by the ecosystem type from which the 502 sample was collected (gut, soil, sediment, and water). Data are represented as means ± 95% confidence intervals. 504  distance metrics (Bray-Curtis and UniFrac, respectively). The centroid distance ratio was calculated on each sample within an ecosystem type and reflects the composition of the total DNA pool (intact + relic) relative to the intact DNA pool. Relic DNA had no effect on betadiversity for any of the ecosystem types sampled. Data are represented as means ± 95% confidence intervals. The amount of relic DNA in a microbial environment is determined by inputs associated with the mortality of viable individuals with intact DNA and by losses associated with the degradation of relic DNA. If the diversity of sequences contained in the relic DNA pool is sufficiently different from that in the intact DNA pool, then relic DNA may bias estimates of microbial biodiversity (as indicated by different colored boxes) when sampling from the total (intact + relic) DNA pool. We developed a sampling-based simulation model that created communities populated with individuals with intact DNA sampled from a lognormal species abundance distribution. We then mixed the intact community with relic individuals so that the relic contribution to total community ranged from 0.01 to 0.96. The relic community was also populated with individuals sampled from a lognormal species distribution, but which varied in (b) evenness (E) or (c) richness (S). To quantify bias in diversity, we calculated "richness ratios" which reflect the number of species in the total DNA pool (intact + relic) divided by the number of species in the intact DNA pool. When values richness ratios = 1, relic DNA has no effect on diversity; when richness ratios > 1, relic DNA leads to overestimates of diversity; richness ratios <1, relic DNA lead to underestimation of diversity.

Fig. 2. 518
Figure 2 | Proportion of bacterial relic DNA in different ecosystem types. We quantified the amount of intact DNA in a sample after removing relic DNA with a DNase treatment. We then estimated the proportion of relic DNA as 1 -(intact DNA / total DNA), where the total DNA concentration was quantified without DNase treatment. Relic DNA constituted an appreciable portion of the total DNA pool, but was not affected by the ecosystem type from which the sample was collected (gut, soil, sediment, and water). Data are represented as means ± 95% confidence intervals. We tested for the effects of bias caused by relic DNA by calculating diversity ratios for (a) richness, (b) evenness, and (c) phylogenetic diversity. The ratios reflect the diversity of the total DNA pool (intact + relic) divided by the diversity of the intact DNA pool. Relic DNA did not bias any measures of diversity in any of the ecosystem types. Data are represented as means ± 95% confidence intervals. Richness was calculated as the number of operational taxonomic units (97% sequence similarity of the 16S rRNA gene), evenness was calculated using Simpson's evenness index, and phylogenetic diversity was calculated using Faith's D index. We tested for the effects of bias caused by relic DNA by calculating a beta-diversity ratio based on centroid distances. Centroid distances were estimated after performing Principle Coordinates Analyses (PCoA) using taxonomic (a) and phylogenetic (b) distance metrics (Bray-Curtis and UniFrac, respectively). The centroid distance ratio was calculated on each sample within an ecosystem type and reflects the composition of the total DNA pool (intact + relic) relative to the intact DNA pool. Relic DNA had no effect on beta-diversity for any of the ecosystem types sampled. Data are represented as means ± 95% confidence intervals.