Associations among Wine Grape Microbiome, Metabolome, and Fermentation Behavior Suggest Microbial Contribution to Regional Wine Characteristics

ABSTRACT Regionally distinct wine characteristics (terroir) are an important aspect of wine production and consumer appreciation. Microbial activity is an integral part of wine production, and grape and wine microbiota present regionally defined patterns associated with vineyard and climatic conditions, but the degree to which these microbial patterns associate with the chemical composition of wine is unclear. Through a longitudinal survey of over 200 commercial wine fermentations, we demonstrate that both grape microbiota and wine metabolite profiles distinguish viticultural area designations and individual vineyards within Napa and Sonoma Counties, California. Associations among wine microbiota and fermentation characteristics suggest new links between microbiota, fermentation performance, and wine properties. The bacterial and fungal consortia of wine fermentations, composed from vineyard and winery sources, correlate with the chemical composition of the finished wines and predict metabolite abundances in finished wines using machine learning models. The use of postharvest microbiota as an early predictor of wine chemical composition is unprecedented and potentially poses a new paradigm for quality control of agricultural products. These findings add further evidence that microbial activity is associated with wine terroir.

biodiversity patterns on grapes elsewhere globally (11-13). Regional strains of Saccharomyces cerevisiae, the principal yeast species involved in wine fermentations, produce distinct wine chemical compositions, demonstrating one prominent route by which regional microbes influence terroir (14). Beyond Saccharomyces yeasts, wine fermentation is a complex, multispecies process, and the synergistic effects of these consortia on wine chemistry are as yet unclear. An overwhelming body of evidence has defined the influences of numerous bacteria and fungi on the chemical and sensory properties of wines in pure culture (reviewed in reference 15), and nonfermentative grape-associated microbiota produce many sensory-active compounds associated with wine aroma, highlighting their potential in early flavor formation (16). However, the relationship between regional microbial patterns and wine metabolite profiles is unknown. Evidence of their interaction would implicate microbial activity in shaping the regional wine qualities that are important for defining product identity.
Furthermore, high-throughput sequencing techniques have expanded our knowledge of microbial diversity on grapes and in wine fermentations, but the possible roles and dynamics of these microbes during wine fermentation are understudied (10,13,17-19). In addition to directly influencing wine chemical composition, understudied microbes could indirectly alter wine quality, e.g., by inhibiting fermentation progress or malolactic fermentations.
To address these issues, we conducted an exploratory study to assess (i) whether the grape microbiota and wine metabolomes exhibit distinct patterns of distribution at small geographic scales (e.g., neighboring vineyards), (ii) whether regional wine microbiomes and metabolomes are correlated, and (iii) associations between the microbiome, fermentation performance, and prefermentation grape must/juice characteristics. We employed high-throughput marker gene sequencing to longitudinally profile the bacterial and fungal consortia of over 200 commercial fermentations and musts (crushed grapes) of grapes grown throughout Napa and Sonoma Counties, CA (Fig. 1; see Table S1 in the supplemental material). We used ultra-high-pressure liquid chromatography (UHPLC)/quadrupole time of flight mass spectrometry (QTOF MS) for nontargeted metabolite profiling of a subset of these must and wine samples, identifying marker metabolites that differentiate AVAs. We demonstrate that the grape/wine microbiota and metabolites are regionally distinct, the must and wine microbiota correlate with the wine metabolome and fermentation performance, and grape must microbial composition predicts the metabolite composition of the finished wine, suggesting that microbial dispersion patterns may contribute to regional wine characteristics.

RESULTS AND DISCUSSION
All samples were collected from Far Niente and Nickel & Nickel wineries, located approximately 2 km apart in Oakville, CA (Napa County). Cabernet Sauvignon (dry red wine) and Chardonnay (dry white wine) grape musts and fermentations were longitudinally sampled across fermentation and aging (Table 1). Red and white wine fermentations were sampled at different time points, as they are processed differently: white grapes are crushed and pressed immediately, and the clarified juices are fermented, whereas red grapes are crushed and fermented as must, which is only pressed after fermentation is complete (Table 1). Additionally, only the red wines underwent malolactic fermentation (MLF), a secondary bacterial fermentation during which Oenococcus oeni and other lactic acid bacteria deacidify wine by conversion of malic to lactic acid, accompanied by various sensory changes.

Microbial biodiversity distinguishes vineyards and viticultural areas (AVAs).
We have previously demonstrated that different grape-growing regions of California possess distinct, identifiable microbial patterns across large distances, correlated with local weather conditions (10). Thus, we first sought to test whether microbial patterns can be distinguished between contiguous AVAs and individual vineyards within a single growing region, Napa County, CA, and nearby sites in Sonoma County at different stages of fermentation (Fig. 1; see Table S1 in the supplemental material).
Individual AVAs and vineyards were distinguished based on the microbial consortia present in the grape must/juice (Fig. 2; see Table S2 in the supplemental material). Permutational multivariate analysis of variance (MANOVA) tests (see Table S2) confirmed that microbial composition is significantly different between at least two AVAs (Chardonnay bacteria, P < 0.001, R² = 0.262, and fungi, P < 0.001, R² = 0.233; Cabernet bacteria, P < 0.001, R² = 0.154, and fungi, P = 0.002, R² = 0.105) and vineyards (Chardonnay bacteria, P < 0.001, R² = 0.599, and fungi, P < 0.001, R² = 0.408; Cabernet bacteria, P < 0.001, R² = 0.353, and fungi, P < 0.001, R² = 0.320). Random forest machine learning models confirm that all vineyards are distinguishable at classification accuracies between 79% (Chardonnay juice) and 82% (Chardonnay wine), 3.7- to 4.4-fold more accurate than random error rates (see Table S3 in the supplemental material). This separation was also dependent upon the grape variety: Chardonnay demonstrated stronger AVA differentiation for both bacterial and fungal profiles than Cabernet Sauvignon (Fig. 2; see Table S2). Thus, local conditions appear to modulate microbial communities in addition to regional effects. Numerous microclimatic, viticultural, and geophysical factors could explain variation among vineyard sites beyond the scope of our measurements and are important questions for future studies. Intravineyard monitoring could elucidate which of these factors hold the greatest influence over localized microbial patterns, potentially yielding insight into manipulable elements for controlling local microbial communities, e.g., to reduce disease pressure or increase plant-beneficial populations.
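The "fold more accurate than random" figure can be read as a ratio of error rates. The text does not define the ratio, so the following is a minimal sketch under one common convention, assumed here: the baseline error is that of always guessing the most common class, and the ratio is baseline error divided by model error. The function names and the toy design (five vineyards, ten samples each) are hypothetical.

```python
def baseline_error(class_counts):
    """Error rate of always guessing the most common class."""
    total = sum(class_counts.values())
    return 1.0 - max(class_counts.values()) / total


def error_ratio(accuracy, class_counts):
    """Baseline error divided by model error (assumed definition).

    Undefined for a perfect classifier (accuracy == 1.0).
    """
    if accuracy >= 1.0:
        raise ValueError("error ratio is undefined when the model error is zero")
    return baseline_error(class_counts) / (1.0 - accuracy)


# Hypothetical design: five vineyards with ten samples each.
toy_counts = {"vineyard_%d" % i: 10 for i in range(1, 6)}
ratio = error_ratio(0.80, toy_counts)  # baseline error 0.8, model error 0.2
```

Under this convention a balanced five-class problem classified at 80% accuracy gives a ratio of about 4, the same order as the values reported above.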
Both AVA- and vineyard-specific microbial signatures diminished during fermentation (Fig. 2) as growth of fermentative organisms reshaped the community structure, richness, and diversity of the wines (Fig. 3 and Fig. 4; see Fig. S1 in the supplemental material). This effect was largely dependent on grape variety and winery: Chardonnay vineyards and AVAs retained significantly different bacterial profiles at the end of fermentation (P < 0.001) (Fig. 2; see Table S2 and Fig. S1 in the supplemental material) and Cabernet fungi differentiated vineyard origin of at least one vineyard (P < 0.001) (Fig. 2 and 4; see Table S2), but Cabernet bacterial profiles became less distinct due to growth of Leuconostocaceae (O. oeni) during MLF conducted in these wines but not in the Chardonnays. Nevertheless, random forest classification models could still distinguish vineyards at accuracies of 81% (Cabernet) and 82% (Chardonnay) based on microbial profiles in the finished wine, indicating that vineyard-specific signatures are still retained through fermentation (see Table S3 in the supplemental material).
Wine metabolite profiles segregate growing regions. We next sought to test whether AVAs and vineyards produced differentiable wine metabolite patterns, and whether regional microbial patterns could translate to metabolomic differences in wines. Using ultra-high-pressure liquid chromatography (UHPLC)/quadrupole time of flight mass spectrometry (QTOF MS), we analyzed the metabolite profiles of 13 Chardonnay and 27 Cabernet Sauvignon wines in triplicate, representing distinct AVAs and vineyards tested with biological replication (minimum duplicate). These were finished and barreled but unblended fermentations (MLF stage for Cabernets, and "end" stage for Chardonnays [Table 1]), enabling metabolite profiles to be compared directly to the microbial communities inhabiting the musts from which these wines were made. All vineyards and AVAs were represented by biological replicates: i.e., at least two separate vineyard blocks were analyzed per vineyard, and at least two separate vineyards were analyzed per AVA whenever possible.
[Figure caption fragment: ... weighted UniFrac distance (left two columns) and fungal Bray-Curtis dissimilarity (right two columns) in musts and wines (see column labels), categorized by vineyard (color) and AVA source (shape). Each point represents an individual sample, and sample proximity on the plot is a function of similarity in bacterial and fungal community composition.]
Raw QTOF profiles revealed 1,585 mass features in Cabernet Sauvignon wines and 1,054 in Chardonnay wines. Profiles were filtered to remove putative metabolites that were not observed consistently across technical replicates or detected in low abundance. Only low-molecular-mass putative metabolites (<300 m/z) were analyzed to focus on compounds that are most likely aroma-active volatile compounds. (Larger compounds were primarily identified as grape-derived phenolic compounds and ignored in this study.) Of the remaining putative metabolites, we retained only those observed at significantly different abundances between regions (one-way analysis of variance [ANOVA] false discovery rate [FDR]-corrected P value of <0.05). In all, Cabernet Sauvignon wines contained 16 regionally differential low-mass features (see Table S4 in the supplemental material). Chardonnay wines contained 27 (see Table S5 in the supplemental material). In several cases, exact identities could be confidently determined by mass and tandem MS (MS/MS) spectrum matches to the metabolite databases or accurate run time and mass matches to authentic standards. In most cases, only approximate identities or no identity could be obtained. This is a common issue, as the QTOF analysis as used here is a nontargeted method and the reference databases are not tailored to wine metabolites. Many of these compounds represent acids, esters, and aldehydes, some of which are likely microbial. Others, such as tartaric acid, are strictly grape derived. Many of the grape-derived compounds, such as the phenolic compounds coumaric acid, gallic acid, catechin, epicatechin, and caffeic acid, are modified by microbial metabolism during wine fermentation (20-22).
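The FDR filtering step described above can be sketched as follows. The text states only that ANOVA P values were FDR corrected; the Benjamini-Hochberg procedure used here is an assumption, as are the feature names and P values, which are hypothetical and taken as already computed by a one-way ANOVA.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-feature ANOVA P values (not data from this study).
raw_p = {"feature_A": 0.001, "feature_B": 0.02, "feature_C": 0.03, "feature_D": 0.6}
adj = benjamini_hochberg(list(raw_p.values()))

# Keep only features whose corrected P value is below 0.05.
retained = [name for name, q in zip(raw_p, adj) if q < 0.05]
```

In this toy input the first three features survive the correction and the fourth is discarded.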
Within grape varieties, wine metabolite profiles clearly splay out with principal-component analysis (PCA), associated with numerous significantly discriminant metabolites (FDR-corrected P value of <0.05) (see Fig. S2 in the supplemental material). Chardonnay wines demonstrated greater discrimination between both growing regions and vineyards (see Fig. S2A), whereas separation was weaker among Cabernet Sauvignon wines, for which many vineyards were indistinguishable from each other (see Fig. S2B). The reasons why Chardonnay microbiota better differentiate region are unclear, but the use of MLF in all Cabernet wines studied (but not the Chardonnays) is one possibility. The Chardonnay vineyards sampled in this study also came from more distant and diverse regions across Napa and Sonoma Counties (e.g., Carneros, Coombsville, Oakville, Russian River), compared to Cabernet (St. Helena, Oakville, Rutherford, and Yountville are contiguous regions on the valley floor), and thus differences in climate, topography, and regional distance are all possible causes that cannot be unraveled in the present study.
Microbial patterns correlate to regional metabolite profiles. To further dissect the relationship between regionally differential must microbiota and wine metabolites, multifactorial analysis (MFA) (23) was used to investigate underlying relationships between putative metabolite profiles, grape microbiota, and region of origin (Fig. 5). MFA is a generalization of principal-component analysis (PCA), in which sample similarity is decided by multiple different sets of observations (in this case, both metabolites and taxonomic features) to determine a consensus ordination. This analysis calculates sample similarity (as ordination plots akin to PCA), the degree of similarity between each set of observations, and correlations between individual observations. MFA could confidently separate wines by growing region and vineyard based on putative metabolite profiles and grape microbiota (Fig. 5A and E). Highly similar ordination patterns of regional and site-specific segregation were observed based on metabolite, bacterial, and fungal profiles (Fig. 5B and F), and numerous correlations were detected between variables in all three groups (Fig. 5C and G), demonstrating close correspondence between microbial and metabolic profiles. In Chardonnay wines, fungal profiles were more closely associated with putative metabolite profiles than with region alone (Fig. 5C), and several interesting correlations emerged between the microbiome and metabolome (Fig. 5D): notably, between Leuconostocaceae (O. oeni is the top BLAST hit for all Leuconostocaceae sequences detected) and entity 136.0498@0.9307 (i.e., accurate mass of 136.0498 at LC run time of 0.9307 min) (possible hits, methylbenzoate, phenyl acetate, or p-anisaldehyde); Hanseniaspora uvarum and entity 120.0577@3.3848 (possible hits, acetophenone, phenylacetaldehyde, or 3-methyl benzaldehyde); and Pichia guilliermondii and entities 144.1169@7.3606 (octanoic acid) and 114.0702@2.2747 (C6H10O2 acid, ester, or lactone) (Fig. 5D). These microbes are all known fermentative organisms (some with poorly characterized phenotypes); these putative metabolites are all important sensory-active wine components or potentially sensory-active metabolites, and all are correlated by MFA with vineyards in Carneros, one of the most renowned, cold-weather AVAs for Chardonnay production within Napa County.
[Figure caption fragment: ... Table S5 in the supplemental material. To improve readability of the plots, only the top correlations in each dimension are shown.]
Cabernet Sauvignon bacterial profiles were more closely associated with putative metabolite profiles than with region alone (Fig. 5G), and there was a weaker correlation between fungal and metabolic profiles (Fig. 5F). The stronger bacterium-metabolite correlation may reflect that the Cabernet Sauvignons underwent MLF (a bacterial fermentation) and longer maturation, muting fungal contributions. Hence, wine production methods and wine style may ultimately determine the degree to which different microbial activities contribute to wine chemical composition. The close correspondence between must microbial composition and putative wine metabolite profiles may indicate that the grape microbiota influences the chemical properties of the finished wine and/or that both are strongly shaped by the same regional factors.
Microbial profiles predict abundance of wine metabolites. Wine is a complex chemical and biological matrix, and many of the sensory-active constituents are produced, consumed, or modified by multiple microbial species (15). Relationships between microbial composition and wine metabolite profiles are unlikely to be one dimensional and linear. Thus, models incorporating multiple microbial predictors can be expected to more accurately predict such relationships. We employed random forest (24) classification models to predict wine metabolite composition as a function of grape must microbiome composition. Within the metabolome, a few select putative wine metabolites were predicted relatively well by these models (pseudo-R² = 0.50 to 0.98) (Fig. 6; see Tables S6 and S7 in the supplemental material). Metabolite entity 130.0617@3.1301 (pseudo-R² = 0.61), a C6 ketoacid, is best predicted by the presence of the grape-associated filamentous fungus Cladosporium and Bacillaceae; fermentative yeasts S. cerevisiae and Wickerhamomyces anomalus were also top features in the optimized predictive model (Fig. 6A). Pichia guilliermondii, closely correlated by MFA with entity 114.0702@2.2747 (C6H10O2 acid, ester, or lactone), is the principal feature for predicting abundance of this metabolite in Chardonnays (pseudo-R² = 0.80) (Fig. 6B). Several other regionally discriminant metabolites were closely linked to microbiota composition (see Tables S6 and S7). The important sensory-active medium-chain fatty acid octanoic acid was predicted moderately well in Chardonnay (entity 144.1169@7.3606; pseudo-R² = 0.53).
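The pseudo-R² values reported above are not defined in the text. A minimal sketch of one widely used definition for random forest regression, assumed here, is 1 minus the ratio of the mean squared prediction error to the variance of the observed values. The abundance values below are hypothetical.

```python
def pseudo_r2(observed, predicted):
    """Pseudo-R2 as 1 - MSE / Var(observed) (assumed definition)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    variance = sum((o - mean_obs) ** 2 for o in observed) / n
    return 1.0 - mse / variance


# Hypothetical metabolite abundances and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
score = pseudo_r2(observed, predicted)

# A model that always predicts the mean explains none of the variance.
null_score = pseudo_r2(observed, [2.5, 2.5, 2.5, 2.5])
```

Under this definition a value near 1 means the predictions track the observed abundances closely, a value of 0 means the model does no better than predicting the mean, and negative values are possible for models that do worse.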
Many of these microbiota-metabolite associations corroborate the well-defined metabolic characteristics of these organisms in pure culture fermentations (15). For the many organisms with unknown roles in wine fermentations, these results raise suggestive associations between grape microbiota and wine chemistry. The grape epiphytes in particular are not well studied for their potential contributions to wine, although they are numerically dominant on grapes and in early fermentations. We recently cultured two of these bacteria, Sphingomonas and Methylobacterium, from finished wines (17), and Enterobacteriaceae, Pseudomonas, Sphingomonas, and Methylobacterium appear to increase in relative abundance during fermentation (not necessarily an indication of growth) in this study and the work of others (13,18,19,25), increasing the probability that these bacteria contribute to wine characteristics directly or indirectly. Plant-associated, nonfermentative, and numerically minor populations could still exert a substantial effect on metabolite profiles directly (e.g., through prefermentation activity or release of metabolites with low sensory thresholds) and indirectly through metabolism or release of small molecules and enzymes following cell death and lysis. Interactions among regional microbiota may also influence the metabolome and deserve further study. For example, release of inhibitory molecules by less fermentative organisms present in grape musts could alter Saccharomyces metabolism, altering wine profiles. However, the DNA profiling techniques as employed here do not differentiate active populations within the microbiota, yielding little insight on such relationships.
Metatranscriptomics [...] While these findings cannot claim causation, they demonstrate that the microbial composition of grapes accurately predicts the chemical composition of wines made from these grapes; grape microbiota are therefore biomarkers for predicting wine metabolite composition (a quantifiable feature of terroir). An alternative explanation for the correlation between the prefermentation microbiome and wine metabolome is that both are influenced by the same regional factors (e.g., climate or soil chemistry) and by the metabolite profiles of the grapes. More work must be done to establish the possible roles of these organisms in wine fermentation and flavor production, and sensory studies are necessary to determine whether microbial associations extend to human-perceptible differences in wine traits. We provide a rich exploratory data set that will support mechanistic studies focusing on the roles of these understudied organisms in wine fermentations.
Microbial profiles correlate with juice chemistry and fermentation behavior. Next, we sought to detect correlations between microbiota, must/juice chemistry, and fermentation characteristics, with an emphasis on detecting microbiota that could influence fermentation behavior. Correlations between the microbiota, fermentation rate, and MLF length are particularly important for discovering factors that may contribute to sluggish or stuck fermentations (26).
Numerous significant correlations (FDR-corrected Spearman P value of <0.05) were detected between initial must/juice composition and microbial abundance in musts and at the end of fermentation (Fig. 7). Notably, fermentation rate was negatively correlated with several taxa, as well as bacterial richness in Chardonnay musts and wines, suggesting that high bacterial diversity inhibits alcoholic fermentation, most likely because greater richness increases the chances of antagonistic species being present. Among the negatively correlated taxa, H. uvarum, Gluconobacter, and Lactobacillus spp. are already known to inhibit fermentation rate through competition for nutrients with S. cerevisiae (26,27), so recovery of these correlations is reassuring. The repeated negative correlation between Enterobacteriaceae and fermentation rate in musts and wines is particularly suggestive of another potential interaction, although the species involved will need to be clarified. (The group detected here represents operational taxonomic units [OTU] identifiable only to the family level.) Erwinia and other Enterobacteriaceae have been observed abundantly in botrytized (17,25) and table wine fermentations (13,18,19), but the nature and role of this group in wines have been unclear. Various Enterobacteriaceae are typically present in and contribute to the flavor of some spontaneous beer fermentations (28) but hinder the fermentation rate in beer (29) and could be similarly problematic in some wine fermentations. Several other organisms were positively correlated with fermentation rate, most notably Pseudomonas, which was significantly correlated in both Cabernet and Chardonnay, both in musts and at the end of fermentation (Fig. 7). The possibility that this bacterium may enhance fermentation rate is an interesting finding worth further investigation.
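The rank correlations reported above can be illustrated with a short sketch of Spearman's coefficient, computed as the Pearson correlation of average ranks so that ties are handled. The richness and fermentation rate values are hypothetical, and the significance test and FDR correction are omitted.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical bacterial richness and fermentation rate for five lots.
richness = [10, 20, 30, 40, 50]
fermentation_rate = [5.0, 4.0, 4.5, 2.0, 1.0]
rho = spearman_rho(richness, fermentation_rate)
```

In this toy input the coefficient is strongly negative, the same direction as the richness and fermentation rate relationship described in the text.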

Conclusions.
Microbial terroir likely involves multiple interacting aspects of microbial distribution, strain diversity, and plant-microbial interactions. The present study explores issues of regional distribution of microbial populations in grapes and wines, building on previous evidence that these patterns exist over larger regions, correlating with climate conditions (10). Not all regions and vineyards are microbiologically unique, and the patterns that distinguish them are not random. Instead, climate and distance between regions are associated with regional microbial patterns (10), and many other factors are likely involved, including processes that are selective (e.g., soil type, topography, human-driven agricultural practices) and neutral (e.g., species dispersal limitation across large distances) (30). Vineyard soil microbiota demonstrate similar regional distribution patterns, associated instead with edaphic factors (31,32), and plant-microbial interactions above and below ground may contribute to plant growth and development, leading to changes in fruit quality (32). Geographic distribution of microbial strains displaying diverse phenotypes appears to be another factor (14). Knight and coworkers (14) found that S. cerevisiae genotypes and phenotypes were correlated with geographic dispersion in New Zealand, and regional strains produced distinct metabolite profiles in experimental wine fermentations. Regional strain diversity may also explain dispersion of wine spoilage traits, such as geographic patterns of histamine decarboxylase genes in lactic acid bacteria in wineries across Bordeaux, France (33). Regional strain diversity in the many other bacterial and fungal species involved in wine production may similarly contribute to microbial terroir and deserves further investigation.
The intricacies of wine flavor are not determined by microbial composition alone. We conjecture that microbial activity contributes to the mixture of abiotic and biotic factors that underlie wine terroir, with the scale of this contribution depending upon the winemaking techniques and style of wine produced. We have demonstrated that the microbial constituents of grape musts are biomarkers for predicting features of wine metabolite composition before fermentation has even commenced. These markers could provide actionable information to winemakers to improve wine characteristics or mitigate problem fermentations, and are unprecedented as early predictors of the wine metabolome. Such information could be practical for predicting the suitability of potential vineyard sites or acquisitions or for preventing microbiological issues in abnormal vintages. We doubt that microbial biomarkers could be used to artificially replicate all aspects of wine terroir, as many other interacting, nonmanipulable factors also contribute (e.g., climate). Terroir is the regional fingerprint of a wine, not solely an engineered feature of winemaking.
Wine is a useful model for testing the theory of microbial terroir, as regional wine qualities are already a well-recognized and celebrated part of wine identity. However, the connection between microbial regionality and food properties is not unique to wine, and these results suggest that similar phenomena likely occur in other food products. Thus, these findings argue for exploring and characterizing the connection between environmental conditions, microbial patterns, and chemico-sensory characteristics in other agricultural products (both food and nonfood) impacted by microbial activities. Microbial terroir may provide further incentive for preserving regional biodiversity through sustainable agricultural practices, in recognition of the economic values provided by regional product identity (14).
Together, these results illustrate a complex relationship between microbial communities in grape musts and wine fermentations with the chemical compositions of the resulting wines. Microbial communities can be distinguished on regional, AVA, and vineyard-specific scales, correlating with multiple environmental parameters (10). The microbial consortium of wine fermentation, influenced by vineyard and winery sources (34), is associated with the chemical composition of the finished wine, suggesting that, if indeed the microbial connection is causative and these changes result in sensory-active effects, microbial biogeography is a quantitative, definable feature of wine terroir. We identify numerous associations between the wine microbiome and metabolome, and future studies are necessary to establish causative links between the microbial consortia, wine metabolites, and sensory characteristics.
FIG 7 Microbiota correlations with must/juice and fermentation characteristics. Shown is the Spearman correlation between must/juice chemistry, fermentation characteristics, and microbiota in musts (left columns) and end of fermentations (EOF [right columns]). Only significant correlations (FDR-corrected P value of <0.05) are shown. As chemical composition was only measured in musts and juices, no data appear for must/juice correlations at end of fermentation (gray boxes at top right corner). NH3, ammonia concentration; NOPA, total nitrogen by o-phthaldialdehyde assay; YAN, yeast assimilable nitrogen.

MATERIALS AND METHODS
Sampling and DNA extraction. Samples were collected from Far Niente Winery and Nickel & Nickel Winery, both located in Oakville, Napa County, CA. All samples were collected from the 2011 vintage. These wineries use grapes harvested from throughout Napa County, representing several major viticultural areas ( Fig. 1; see Table S1 in the supplemental material). The primary wine and grape varieties collected were Chardonnay (a dry white wine) and Cabernet Sauvignon (a dry red wine).
Samples consisted of longitudinal wine fermentation samples (n = 777), each corresponding to individual vineyard lots. Samples were collected at five predetermined time points in duplicate. As red and white grapes are processed differently, these time points depended on grape type (Table 1). Red grape fermentations were collected as grape must (destemmed, crushed grapes prior to fermentation), at mid-fermentation, at the end of fermentation following pressing but prior to barreling, at the end of malolactic fermentation (in barrels), and after several months of barrel aging. White grape fermentations were collected as juice, following racking (clarification) prior to inoculation, early fermentation prior to barreling, near the end of fermentation (in barrels), at the end of fermentation (in barrels), and after several months of maturation in barrels.
Samples were frozen immediately, shipped on ice, and stored at -80°C until processing. Sample processing was performed as described previously (17). Briefly, must samples were thawed and centrifuged at 4,000 × g for 15 min, washed 3 times in ice-cold phosphate-buffered saline (PBS), suspended in 200 µl DNeasy lysis buffer (20 mM Tris-Cl [pH 8.0], 2 mM sodium EDTA, 1.2% Triton X-100) supplemented with 40 mg/ml lysozyme, and incubated at 37°C for 30 min. From this point, the extraction proceeded following the protocol of the Qiagen fecal DNA extraction kit (Qiagen, Valencia, CA), with the addition of a bead beater cell lysis step of 2 min at maximum speed using a FastPrep-24 bead beater (MP Bio, Solon, OH). DNA extracts were stored at -20°C until further analysis.
Sequencing library construction. Amplification and sequencing were performed as described previously for analysis of bacterial (10) and fungal (35) communities. Briefly, the V4 domain of bacterial 16S rRNA genes was amplified using primers F515 (5'-NNNNNNNNGTGTGCCAGCMGCCGCGGTAA-3') and R806 (5'-GGACTACHVGGGTWTCTAAT-3') (36), with the forward primer modified to contain a unique 8-nucleotide (nt) bar code (italicized poly-N section of primer above) and 2-nt linker sequence (boldface portion) at the 5' terminus. PCR mixtures contained 5 to 100 ng DNA template, 1× GoTaq Green master mix (Promega), 1 mM MgCl2, and 2 pmol of each primer. Reaction conditions consisted of an initial 94°C for 3 min, followed by 35 cycles of 94°C for 45 s, 50°C for 60 s, and 72°C for 90 s, and a final extension of 72°C for 10 min. Fungal internal transcribed spacer 1 (ITS1) loci were amplified with primers BITS (5'-NNNNNNNNCTACCTGCGGARGGATCA-3') and B58S3 (5'-GAGATCCRTTGYTRAAAGTT-3') (35), with a unique 8-nt bar code and linker sequence incorporated in each forward primer. PCR mixtures contained 5 to 100 ng DNA template, 1× GoTaq Green master mix (Promega, Madison, WI), 1 mM MgCl2, and 2 pmol of each primer. Reaction conditions consisted of an initial 95°C for 2 min, followed by 40 cycles of 95°C for 30 s, 55°C for 30 s, and 72°C for 60 s, and a final extension of 72°C for 5 min. Amplicons were combined into two separate pooled samples (keeping bacterial and fungal amplicons separate) at roughly equal amplification intensity ratios, purified using the Qiaquick spin kit (Qiagen), and submitted to the UC Davis Genome Center DNA Technologies Core for Illumina paired-end library preparation, cluster generation, and 250-bp paired-end sequencing on an Illumina MiSeq instrument in two separate runs.
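The bar-coded primer layout described above lends itself to a short illustration of how a read could be assigned to a sample. This is a minimal sketch, not the demultiplexing code used in the study: it assumes that the forward primer reads as an 8-nt bar code, then the 2-nt linker GT, then the degenerate primer core GTGCCAGCMGCCGCGGTAA (the split between linker and core is inferred from the primer sequence, because the original typographic emphasis is lost). The bar codes, sample names, and reads are hypothetical.

```python
# IUPAC degenerate nucleotide codes and the bases each one permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

BARCODE_LEN = 8
LINKER = "GT"                       # assumed 2-nt linker
F515_CORE = "GTGCCAGCMGCCGCGGTAA"   # assumed primer core after the linker


def primer_matches(primer, seq):
    """True if seq begins with a sequence compatible with the degenerate primer."""
    if len(seq) < len(primer):
        return False
    return all(base in IUPAC[code] for code, base in zip(primer, seq))


def assign_sample(read, barcode_to_sample):
    """Return the sample name for a read, or None if it cannot be assigned."""
    barcode, rest = read[:BARCODE_LEN], read[BARCODE_LEN:]
    sample = barcode_to_sample.get(barcode)
    if sample is None or not rest.startswith(LINKER):
        return None
    if not primer_matches(F515_CORE, rest[len(LINKER):]):
        return None
    return sample


# Hypothetical bar code map and reads.
barcodes = {"ACGTACGT": "must_lot_1", "TTGGCCAA": "must_lot_2"}
good_read = "ACGTACGT" + "GT" + "GTGCCAGCAGCCGCGGTAA" + "TACGGAGGGT"
bad_primer = "ACGTACGT" + "GT" + "GTGCCAGCGGCCGCGGTAA" + "TACGGAGGGT"
unknown_barcode = "GGGGGGGG" + "GT" + "GTGCCAGCCGCCGCGGTAA" + "TACGGAGGGT"

assigned = [assign_sample(r, barcodes) for r in (good_read, bad_primer, unknown_barcode)]
```

Exact matching is used for the bar code and linker to keep the sketch short; production pipelines typically tolerate a small number of bar code errors.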
Data analysis. Raw Illumina fastq files were demultiplexed, quality filtered, and analyzed using QIIME v1.7.0 (37). Reads were truncated at any site containing >3 consecutive bases receiving a quality score of <1e−5, and any read containing one or more ambiguous base calls was discarded, as were truncated reads of <190 nt. Operational taxonomic units (OTU) were assigned using QIIME's uclust-based (38) open-reference OTU-picking workflow, with a threshold of 97% pairwise identity. Sequence prefiltering (discarding sequences with <60% pairwise identity to any reference sequence) and reference-based OTU picking were performed using a representative subset of the Greengenes bacterial 16S rRNA database (13_5 release) (39) or the UNITE fungal internal transcribed spacer (ITS) database (9_12 release) (40), filtered to remove incomplete and unannotated taxonomies (35). OTU were classified taxonomically using the RDP classifier (41). Bacterial 16S rRNA gene sequences were aligned using PyNAST (42) against a template alignment of the Greengenes core set filtered at 97% similarity. From this alignment, chimeric sequences were identified and removed using ChimeraSlayer (43), and a phylogenetic tree was generated from the filtered alignment using FastTree (44). Sequences failing alignment or identified as chimeric were removed prior to downstream analysis. Any OTU representing less than 0.001% of the total filtered sequences was removed to avoid inclusion of erroneous reads, which lead to inflated estimates of diversity (45), as were samples represented by fewer than 500 (bacterial) or 100 (fungal) sequences following all quality-filtering steps.
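The abundance and depth filters described above can be sketched in a few lines of Python. This is an illustrative sketch, not the QIIME commands used in the study: the nested-dictionary table layout and the function name are assumptions, and the sketch applies the OTU filter before the sample-depth filter, an ordering the text does not specify.

```python
def filter_otu_table(table, min_fraction=1e-5, min_sample_reads=500):
    """Drop rare OTUs, then drop shallow samples.

    table: {sample_id: {otu_id: read_count}} (hypothetical layout).
    min_fraction: 0.001% of all filtered reads, expressed as a fraction.
    min_sample_reads: 500 for bacterial data, 100 for fungal data.
    """
    # Total reads per OTU across all samples.
    otu_totals = {}
    for counts in table.values():
        for otu, n in counts.items():
            otu_totals[otu] = otu_totals.get(otu, 0) + n
    grand_total = sum(otu_totals.values())

    # Keep OTUs that reach the minimum share of all reads.
    keep = {otu for otu, n in otu_totals.items() if n >= min_fraction * grand_total}

    # Keep samples that still have enough reads after OTU filtering.
    filtered = {}
    for sample, counts in table.items():
        kept = {otu: n for otu, n in counts.items() if otu in keep}
        if sum(kept.values()) >= min_sample_reads:
            filtered[sample] = kept
    return filtered
```

For fungal data the same function would be called with `min_sample_reads=100`.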
Beta-diversity (similarity between samples) was calculated within QIIME using the weighted UniFrac (46) distance between samples (evenly sampled at 1,000 reads per sample) to assess similarity among bacterial communities and Bray-Curtis dissimilarity for fungal communities. Principal coordinates were computed from the resulting distance matrices to compress dimensionality into three-dimensional principal-coordinate analysis (PCoA) plots, enabling visualization of sample relationships. In order to determine whether sample classifications (AVA, variety, and vineyard) contained differences in phylogenetic diversity, permutational MANOVA (47) with 999 permutations was used to test for significant differences between sample groups based on weighted UniFrac (bacterial) or Bray-Curtis (fungal) distance matrices. For all categorical classifications (AVA, variety, and vineyard) rejecting this null hypothesis, Kruskal-Wallis tests were used to determine which taxa differed between sample groups.
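For readers unfamiliar with the Bray-Curtis measure applied to the fungal communities, the sketch below computes it for a pair of samples as the sum of absolute count differences divided by the sum of all counts, and builds a pairwise matrix. It is a plain-Python illustration rather than the QIIME implementation; the dictionary-based sample layout and function names are assumptions, and the counts are taken to be already evenly sampled.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two {taxon: count} samples.

    0 means identical composition; 1 means no shared taxa.
    """
    taxa = set(x) | set(y)
    numerator = sum(abs(x.get(t, 0) - y.get(t, 0)) for t in taxa)
    denominator = sum(x.get(t, 0) + y.get(t, 0) for t in taxa)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


def distance_matrix(samples):
    """Return (sorted sample IDs, square matrix of pairwise dissimilarities)."""
    ids = sorted(samples)
    matrix = [[bray_curtis(samples[a], samples[b]) for b in ids] for a in ids]
    return ids, matrix
```

The resulting matrix is the kind of input from which principal coordinates are computed.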
All other statistical tests were performed in R software (v 2.15.0). Principal-component analysis (PCA) and multifactorial analysis (MFA) (23) were performed in R with the FactoMineR package (48) to assess regional variations between wine metabolites, must and wine microbiota, and regional affiliations. Only metabolites and taxa demonstrating significant regional differences (ANOVA and Kruskal-Wallis FDR-corrected P value of <0.05, respectively) were used in MFA and random forest analyses.
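The FDR-based feature selection can be illustrated with a short Python sketch. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up adjustment is assumed here; the function names and the example inputs are likewise hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values, in the input order.

    Assumption: the study's "FDR-corrected" values follow this procedure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values never increase as the raw P values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_features(names, pvalues, alpha=0.05):
    """Keep the features whose adjusted P value is below alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [name for name, q in zip(names, adjusted) if q < alpha]
```

Only features passing such a filter would be carried forward into the MFA and random forest analyses.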
Random forest (24) supervised learning models were employed to predict wine metabolite composition as a function of must microbial composition (regression model), to predict region of origin as a function of microbial composition (classification model), and to predict region of origin as a function of metabolite composition (classification model). Model predictions were made using out-of-bag error cross-validation, whereby random samples are removed from the model one by one (with replacement) and used to cross-validate the prediction accuracy of the restrained model. Models were optimized using 100-fold cross-validation to select the minimal number of features (taxa or metabolites) necessary to minimize prediction error, and secondary models were trained using only these features. The resulting feature importance for each feature (metabolite or taxon) describes the relative importance of that feature to that model, quantified as the increase in mean square error when that feature is removed from the prediction model.
UHPLC/QTOF MS. A set of 13 Chardonnay and 27 Cabernet Sauvignon wines was selected for metabolite profiling, representing the main growing AVAs and vineyards analyzed in this study in biological replicates. Samples were diluted 1:4 in molecular-grade deionized water and analyzed in triplicate in random order. A control was analyzed in quadruplicate at the start of the sample set, in triplicate at the end of the sample set, and singly after every tenth sample to monitor for variation in mass accuracy and retention time. Each sample was analyzed in an untargeted data-dependent MS/MS mode after the individual sample's triplicate single MS analyses were done, as described below.
Chromatography was performed using an Agilent 1290 ultra-high-pressure liquid chromatograph (UHPLC) coupled to an Agilent 6530 quadrupole time of flight mass spectrometer (QTOF MS) (Agilent Technologies, Santa Clara, CA). Triplicate 5-µl injections were tested for each sample. Chromatographic separation was accomplished using a Zorbax Eclipse Plus C18 column (5-cm by 2.1-mm inside diameter [i.d.], 1.8-µm particle size; Agilent Technologies, Santa Clara, CA). The column heater was set to 60°C throughout the analysis. A reversed-phase gradient was used, with 0.1% acetic acid in water as mobile phase A and 20% mobile phase A-80% HPLC-grade methanol as mobile phase B. From an initial condition of 97% A-3% B, the concentrations changed in a linear gradient from 97% A at 1.00 min to 100% B at 9.00 min, followed by a second linear gradient change from 100% B at 10.00 min to 97% A at 11.00 min, with a total time of 12.00 min for each analysis. The mobile phase flow rate was held at 0.6 ml/min throughout the analysis. An Agilent Jet Stream dual-spray electrospray ionization (ESI) source was used in negative mode to focus the liquid chromatography (LC) effluent and separately introduce reference compounds to generate ions for analysis. The system was calibrated using the manufacturer's recommended procedure prior to each analysis run. Mass spectral data were acquired in profile and centroid mode over an m/z range from 100 to 1,700; the transient accumulation rate was 1.41 spectra/s. The ESI source parameters were as follows: drying gas temperature, 350°C; drying gas flow rate, 10 liters/min; nebulizer pressure, 45 lb/in²; sheath gas temperature, 400°C; sheath gas flow rate, 11 liters/min; capillary voltage, 3,000 V; and nozzle voltage, 1,000 V. The QTOF MS settings were fragmentor voltage, 175 V, skimmer voltage, 65 V, and octopole 1 radio frequency (RF) voltage, 750 V.
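The reversed-phase gradient program described above can be expressed as a piecewise-linear function of time. The Python sketch below is illustrative only: it assumes that the composition is held constant over the intervals the text does not describe explicitly (0 to 1.00 min, 9.00 to 10.00 min, and 11.00 to 12.00 min), and the function names are hypothetical. Because mobile phase B is itself 80% methanol, the methanol content delivered to the column is 0.8 times the percentage of B.

```python
# (time in min, % mobile phase B) breakpoints taken from the text;
# the constant-composition holds between them are an assumption.
GRADIENT = [
    (0.0, 3.0),
    (1.0, 3.0),
    (9.0, 100.0),
    (10.0, 100.0),
    (11.0, 3.0),
    (12.0, 3.0),
]


def percent_b(t):
    """Percentage of mobile phase B at time t (min), by linear interpolation."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError("time outside the 12-min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


def percent_methanol(t):
    """Methanol content of the mixed eluent, given that B is 80% methanol."""
    return 0.8 * percent_b(t)
```

Under these assumptions, the eluent at 5.00 min is 51.5% B, and the methanol content peaks at 80% between 9.00 and 10.00 min.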
A reference solution containing purine and hexakis(1H,1H,3H-tetrafluoropropoxy)phosphazine (HP-921) was continuously introduced just prior to the ESI source using an isocratic pump into a separate sprayer of the dual Jet Stream source, producing signals of m/z 119.0362 (proton-abstracted purine) and 980.0164 (acetate adduct of HP-921) in negative mode; these signals were used for continuous internal mass calibration throughout the analysis, in order to ensure high mass accuracy for ions detected during the analysis.
MS/MS profiling was conducted in the automated data-dependent MS/MS mode, using a collision energy of 20 eV for all compounds. The Jet Stream ESI source settings were the same as for the QTOF MS experiments. The MS data were collected with an acquisition rate of 3 spectra/s over a mass range of 75 to 1,500 m/z. A maximum of two precursor ions were allowed for each MS scan. The minimum precursor threshold of 200 counts was used, and precursors were selected based on their abundance. Active exclusion was enabled to exclude precursor ions after 2 spectra were collected, so that spectra of other suitable precursor ions could also be collected. Excluded precursor ions were released after 0.1 min, so that spectra of isomers with differing retention times could be collected if present. The MS/MS data were collected with an acquisition rate of 3.0 spectra/s over an MS/MS mass range of 50 to 1,450 m/z, with a medium isolation width of ~4 Da: the isolation width was 0.3 Da to the left of the precursor ion and 3.7 Da to the right, which allowed for the inclusion of the isotopes of the precursor ion in the collision cell.
Raw LC-MS data were processed using the Agilent MassHunter Qualitative Analysis software, version 6.00 (Agilent Technologies, Inc., Santa Clara, CA) to mine the data for the presence of nonredundant mass features using isotope peaks, for the presence of adduct ions, to eliminate noise, and to filter those entities found with a minimum abundance (peaks with counts of ≥1 × 10⁶). Raw MS profiles were processed using the Agilent Mass Profiler Professional (MPP) software, version 12.6 (Agilent Technologies, Inc.), to align mass and retention time data across the samples within the set and to define profiling parameters. Only low-mass entities (≤300 m/z) were retained for analysis, in order to focus on mass features that are most likely microbial and are most likely aroma active. Mass features displaying inconsistent presence/absence of detection across replicates for any sample were removed, and all remaining mass features were tested using one-way ANOVA with FDR correction, retaining only mass features that differed significantly between AVAs (FDR-corrected P value of <0.05). PCA and MFA of mass feature data were performed in R, as described above. Putative compound identities were obtained based on matches to the METLIN metabolite database (https://metlin.scripps.edu). A 30-ppm mass window was used for identification, and putative identifications were based on matching the accurate mass, the retention time, and MS/MS spectra of a given entity to those available in databases. When possible, identifications were confirmed with authentic standards.
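To make the 30-ppm mass window concrete, the following Python sketch computes the mass error in parts per million and tests whether an observed m/z falls within tolerance of a database value. It is not the MassHunter or METLIN matching routine: the function names are hypothetical, and the window is interpreted here as ±30 ppm around the reference mass, which the text does not state explicitly.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error of an observed m/z relative to a reference, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def within_window(observed_mz, reference_mz, tolerance_ppm=30.0):
    """True if the observed m/z lies within +/- tolerance_ppm of the reference."""
    return abs(ppm_error(observed_mz, reference_mz)) <= tolerance_ppm


def candidate_matches(observed_mz, database, tolerance_ppm=30.0):
    """Names of entries in a {name: m/z} mapping that fall inside the window."""
    return sorted(
        name
        for name, mz in database.items()
        if within_window(observed_mz, mz, tolerance_ppm)
    )
```

At m/z 119.0362 (the proton-abstracted purine reference ion), a tolerance of 30 ppm corresponds to roughly 0.0036 Da, so an ion observed at m/z 119.0390 would fall inside the window and one at m/z 119.0400 would not. An accurate-mass match alone gives only a candidate; the retention time and MS/MS spectra are still needed for a putative identification.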