Metabolic Roles of Uncultivated Bacterioplankton Lineages in the Northern Gulf of Mexico “Dead Zone”

ABSTRACT Marine regions that have seasonal to long-term low dissolved oxygen (DO) concentrations, sometimes called “dead zones,” are increasing in number and severity around the globe with deleterious effects on ecology and economics. One of the largest of these coastal dead zones occurs on the continental shelf of the northern Gulf of Mexico (nGOM), which results from eutrophication-enhanced bacterioplankton respiration and strong seasonal stratification. Previous research in this dead zone revealed the presence of multiple cosmopolitan bacterioplankton lineages that have eluded cultivation, and thus their metabolic roles in this ecosystem remain unknown. We used a coupled shotgun metagenomic and metatranscriptomic approach to determine the metabolic potential of Marine Group II Euryarchaeota, SAR406, and SAR202. We recovered multiple high-quality, nearly complete genomes from all three groups as well as candidate phyla usually associated with anoxic environments—Parcubacteria (OD1) and Peregrinibacteria. Two additional groups with putative assignments to ACD39 and PAUC34f supplement the metabolic contributions by uncultivated taxa. Our results indicate active metabolism in all groups, including prevalent aerobic respiration, with concurrent expression of genes for nitrate reduction in SAR406 and SAR202, and dissimilatory nitrite reduction to ammonia and sulfur reduction by SAR406. We also report a variety of active heterotrophic carbon processing mechanisms, including degradation of complex carbohydrate compounds by SAR406, SAR202, ACD39, and PAUC34f. Together, these data help constrain the metabolic contributions from uncultivated groups in the nGOM during periods of low DO and suggest roles for these organisms in the breakdown of complex organic matter.

metabolic construction from multiple bins also indicated a complete tricarboxylic acid (TCA) cycle, glycolysis via the pentose phosphate pathway, and gluconeogenesis ( Fig. 1). Carbohydrate active enzyme (CAZy) genes can provide critical information on the relationships between microbes and possible carbon sources (36). We found few CAZy genes, and these were largely restricted to glycosyltransferases (GT) in families 2 and 4, with activities related to cellular synthesis. In general, CAZy expression occurred for at least one gene in every genome, and we detected expression of GT cellular synthesis genes in the E2A sample (Fig. 4), likely indicating actively growing cells.
SAR406 represented more than 5% of the population in some locations during hypoxia in 2013, and one abundant OTU was negatively correlated with DO (21). Metagenomic read recruitment to the SAR406 bins confirmed this trend, with greater recruitment in the suboxic samples relative to dysoxic or oxic samples (Fig. S7). Total RNA recruitment was strongest to Bins 45 and 51-1, though most bins showed an RNA-to-DNA recruitment ratio of Ͼ1 in at least one sample, indicating that these taxa were likely active (Fig. 2). Despite their affinity for low-oxygen environments, the SAR406 genomes encoded a predicted capacity for aerobic respiration (Fig. 1), and we found expression of cytochrome c oxidases in even the lowest oxygen samples (Fig. 3). The group B genomes encoded both high-and low-affinity cytochrome c oxidases (37), whereas the high-affinity (cbb 3 -type) oxidases were not recovered in the group A genomes (see Table S1 at http://thethrashlab.com/publications), which may indicate sublineage-specific optimization for different oxygen regimes.
Sublineage variation also appeared in genes for the nitrogen and sulfur cycles. Group B genomes all contained predicted nitrous oxide reductases (nosZ) and nrfAH genes for dissimilatory nitrite reduction to ammonium (defined here as DNRA, although this acronym frequently refers to nitrate, even though that is a misnomer [38]). The nrfA genes formed a monophyletic group with Anaeromyxobacter dehalogenans 2CP-1, an organism with demonstrated DNRA activity (38) (Fig. S8A). The genes also contained conserved motifs diagnostic of the nrfA gene (38) (Fig. S8B and C). We observed expression of nrfAH and nosZ at the sites with the lowest DO concentrations (D2, E2A, and E4), and expression appeared to have a negative relationship with DO concentration (Fig. 3). The Bin 51-1 group A genome contained predicted narHI genes for dissimilatory nitrate reduction, which we did not find in the group B genomes. We observed expression of SAR406 narHI only in the lowest DO sample from station E2A (Fig. 3). Two group B SAR406 genomes had predicted phsA genes for thiosulfate reduction to sulfide (and/or polysulfide reduction [39]), as previously described from fosmid sequences (27). We detected transcripts for these genes only in samples E2A and E4, the two lowest DO samples (Fig. 3). Many of the anaerobic respiratory genes were coexpressed with cytochrome c oxidases, indicating a potential for either coreduction of these alternative terminal electron acceptors or poising of these organisms for rapid switching between aerobic and anaerobic metabolism (40). All SAR406 genomes had numerous genes for heterotrophy. We found CAZy genes in all major categories except polysaccharide lyases, and expression for most of these genes in both group A and group B genomes in one or more samples (Fig. 4). Notable carbohydrate compounds for which degradation capacity was predicted include cellulose (glycoside hydrolase [GH] families GH3 and GH5; carbohydrate binding module [CBM] family CBM6), starch (GH13), agar and other sulfated galactans (GH2 and GH16), chitin (GH18), xylan (GH30 and CBM9), and peptidoglycan (GH23, GH103, and CBM50). The genomes contained putative transporters for a variety of dissolved organic matter (DOM) components including nucleosides, amino and fatty acids, and oligopeptides (see Table S1 at http://thethrashlab.com/publications). We also found numerous outer membrane transporters (including cation symporters), outer membrane receptors (OMR) (TonB dependent) (which play important roles in transport of metals, vitamins, colicins, and other compounds), and outer membrane factors (OMF). Most genomes also had large numbers of duplicated genes (24 in Bin 45-2), identified via hidden Markov model searches against the Sifted Families (SFam) database (41), annotated as "Por secretion system C-terminal sorting domain-containing protein," some of which were associated with GH16. These genes likely play a role in sorting C-terminal tags of proteins targeted for secretion via the Por system, which is essential for gliding motility and chitinase secretion in some Bacteroidetes (42). The extensive gene duplication may indicate expanded and/or specialized sorting functionality and suggests an emphasis on protein secretion in this group. Expression of a membrane-bound lytic murein transglycosylase D (GH23) involved in membrane remodeling also supports the idea of active and growing cells from group A in all samples (see Table S1 at http://thethrashlab.com/publications).
We detected Chloroflexi 16S rRNA gene sequences during 2013 hypoxia at up to 5% of the community (21) and recovered three mostly complete SAR202 Chloroflexi genomes in this work. Although present at a lower abundance than MGII and SAR406 FIG 2 Relative DNA to RNA recruitment rank for each genome, by sample. Colors indicate the relative difference in the ratio of rank based on total RNA and DNA mapping. Red indicates a higher RNA recruitment rank compared to DNA recruitment rank, and vice versa for blue. The plus and minus symbols indicate bins where the rank residual from the identity in RNA versus DNA read mapping was more or less than 1 standard deviation beyond 0, respectively. Dendrograms were calculated using an unweighted-pair group method with arithmetic mean (UPGMA) from Euclidean distances of rank residuals across all samples and bins. (Continued on next page) Thrash et al.
( Fig. S7), these genomes showed relatively high activity in some samples (Fig. 2). Like subclade III and V, subclade I organisms likely respire oxygen. However, we also found napAB and nosZ genes for nitrate and nitrous oxide reduction, respectively (Fig. 1). As in SAR406, we detected concurrent expression of these genes with cytochrome c oxidases in the lowest DO samples (Fig. 3) (see Table S1 at http://thethrashlab.com/ publications). The SAR202 genomes have numerous transporters, many with predicted roles in organic matter transport, which supports previous observations of DOM uptake (43). In particular, SAR202 genomes had considerably more major facilitator superfamily (MFS) transporters than the other genomes in this study (see Table S1 at http://thethrashlab .com/publications) and those of the subclade III genomes (25), and SFam searches revealed that the majority of these shared annotation as a "Predicted arabinose efflux permease" (SFam 346742). MFS genes transport numerous diverse substrates, such as sugars and amino acids, through coupling with an ion gradient, and can be associated with either uptake or export of compounds (44). SAR202 genomes also had between 53 and 66 predicted ABC transporters.
The SAR202 genomes carried a number of duplicated genes in specific gene families. The largest gene family expansion that we observed was associated with SFam 6706, with between 46 and 48 genes in this family in each genome. Most of these genes (121/142) were annotated as either a "galactonate dehydratase" or a "L-alanine-DLglutamate epimerase." Galactonate dehydratase catalyzes the first step of the pathway to utilize D-galactonate in central carbon metabolism via the pentose phosphate pathway. The large number of genes in these categories likely indicates some divergence for alternative roles, as this group belongs broadly to the COG4948 "L-alanine-DL-glutamate epimerase or related enzyme of enolase superfamily." All genomes also had numerous dehydrogenase genes as reported for the subclade III genomes (25). Specifically, SFams 346640 and 1639 were the third and fourth most abundant, with 16 to 18 and 13 to 15 genes in each family, respectively, in the three genomes. Genes in these families were annotated as "short-chain alcohol dehydrogenase family," "3-alpha (or 20-beta)-hydroxysteroid dehydrogenase," "meso-butanediol dehydrogenase," and others. These match the annotations of the subclade III genomes and suggest a similar role in conversion of alcohols to ketones (25). The SAR202 genomes have comparatively few CAZy genes relative to the other genomes. GH15 and GH63 suggest starch degradation and GH105 pectin degradation, and we detected expression of multiple genes in these categories across samples ( Fig. 4; see Table S1 at http://thethrashlab .com/publications).
Other candidate phylum organisms in nGOM hypoxia. In contrast to the abundant and cosmopolitan MGII, SAR406, and SAR202 clades, we also recovered genomes from several groups that were either previously undetected in the nGOM or very rare. Although these taxa likely do not contribute the biomass of more populous clades, their genomes provide important insight into their functional potential during hypoxia. The Bin 13 genome (possibly ACD39) also had the highest relative activity compared to all the other genomes in our study (Fig. 2), underlining the point that low abundance does not automatically equate to low metabolic impact. Bin 13 had predicted aerobic respiration with both high-and low-affinity cytochrome c oxidases (Fig. 5). The lowaffinity oxidases contributed more reads in the samples where we could detect expression (see Table S1 at http://thethrashlab.com/publications). The genome contained numerous predicted CAZy genes in the glycosyltransferase and glycoside hydrolase categories, spread across multiple families in each (see Table S1 at http:// thethrashlab.com/publications). Notable degradation capacity included starch (GH13) and peptidoglycan (GH23, GH103, and GH104).
Bin 13 had ϳ80 ABC transporter genes, and similar to the SAR406 genomes, numerous outer membrane transporters, including the OMR and OMF families. We predict complete glycolysis/gluconeogenesis pathways and a TCA cycle. We recovered paralogous pilus subunit genes, chemotaxis genes, and a partial flagellar assembly. Furthermore, we detected relatively high expression of the flgLN flagellin genes in samples D2, D3, E2A, and E4 (see Table S1 at http://thethrashlab.com/publications), suggesting active motility in these environments. Several other Bin 13 genes were among the most highly expressed in all samples but could be classified only as hypothetical (see Table S1 at http://thethrashlab.com/publications). Similarly, the three most populous SFams in Bin 13, according to the numbers of genes (n ϭ 16, 15, and 13) also linked to genes annotated as hypothetical proteins with either tetratricopeptide, HEAT, TPR, or Sel1 repeats. Although currently obscure, these and the highly expressed hypothetical genes represent important targets for future research into the function of this group.
Bins 50 and 48 were lower in abundance than SAR202 genomes (Fig. S7, PAUC34f), with no observable trend associated with oxygen levels (Fig. S7). These genomes encoded flagellar motility, aerobic respiration, glycolysis via the pentose-phosphate pathway, gluconeogenesis, assimilatory sulfate reduction, and DNRA (Fig. 5). The nrfA subunit from both genomes grouped in the same monophyletic clade as those from SAR406 ( Fig. S8A) and had similar conserved motifs ( Fig. S8B and C). However, we note . Colors indicate pathway elements based on the number of genomes in which they were recovered, according to the key. Black outlines and/or arrows indicate genes that were not observed. Boldface black numbers correspond to annotations supplied in Table S1 at http://thethrashlab.com/publications. that the nrfAH gene sets for Bins 50 and 48 occurred on relatively short contigs (5,650 and 5,890 bp, respectively), so the metabolic assignment cannot be corroborated as definitively as that for SAR406. The Bin 50 genome was among the more active in our analysis (Fig. 2), and we detected the highest expression of cytochrome c oxidase components in samples E2A and E4 (Fig. 3). DNRA gene expression was low but observable in the same samples. We also recovered a partial gene for the ribulosebisphosphate carboxylase (RuBisCO) large subunit, but this fragment was on a very short contig (3,954 bp), and we did not detect expression in any of our samples, so we cannot rule out that this gene occurred on a contaminating contig.
The Bin 50 and 48 genomes had abundant CAZy genes in all categories, suggesting a highly flexible metabolic repertoire for carbon acquisition. They contain possible capacity for breakdown of starch (GH13 and CBM48), peptidoglycan (GH23 and CBM50), fructose-based oligosaccharides (GH32), and hemicellulose (GH2, GH3, and GH43). Notably, these genomes were the only ones with predicted polysaccharide lyases (PL) among those compared (with the exception of a single predicted PL gene in SAR406 [see Table S1 at http://thethrashlab.com/publications]). PL genes cleave uronic-acid containing polysaccharides (45). These organisms seem particularly adapted for pectin (PL1, PL2, PL9, PL10, PL11, PL22, and GH78) and alginate (PL15 and PL17) degradation-both compounds are common cell wall components of green and brown algae, respectively.
In line with the algal cell wall degradation ability, we detected a large expansion (102 genes in Bin 50) of sulfatase genes in SFam 1534, annotated predominantly as either "arylsulfatase A" or "choline-sulfatase." Arylsulfatases cleave sulfate esters, usually to supply microbes with a source of sulfur, and can be located intracellularly or in membranes (46). Choline sulfatases cleave choline sulfate to choline and sulfate, with downstream use for the former as a carbon source or osmoprotectant and the latter as a sulfur source (47). Given the predicted assimilatory sulfate reduction pathway in Bins 50 and 48, this is a logical means to obtain sulfur for the group. We observed large expansions in galactonate and other dehydratases (as in SAR202, above; SFam 6706, 42 genes in Bin 50), as well as numerous ABC transporter permeases (SFam 4442), which match the transporter predictions via IMG (Integrated Microbial Genomes): 117 predicted genes for ABC transporters in all. These genomes also had numerous OMF and OMR transporter genes (see Table S1 at http://thethrashlab.com/publications). The large number of transporters and protein family expansions correspond to the relatively large expected genome sizes (between 5 and 6 Mbp).
We also recovered genomes associated with CPR taxa usually associated with anoxic environments: two Peregrinibacteria and one from the Uhrbacteria subclade of the Parcubacteria (formerly OD1). All three genomes could be assigned taxonomically with high confidence based on their positions in the ribosomal protein tree (Fig. S1) and via gene annotations (Fig. S3). We note that although the Peregrinibacteria bins (16 and 39) had very low predicted contamination, the Parcubacteria Bin 40 had 15% predicted contamination (75% of which we attributed to strain heterogeneity in the bin) ( Table 1). Recovery of Parcubacteria from a coastal marine system is unusual, but not unprecedented. Parcubacteria single-cell genomes have been identified in marine and brackish sources (23), and we previously identified 26 rare OTUs assigned to the phylum in nGOM hypoxia (21). That number of OTUs may explain why we observed 20 single-copy marker genes present in two copies in Bin 40 (see Table S1 at http://thethrashlab.com/ publications).
In contrast to Parcubacteria, Peregrinibacteria have thus far been found only in terrestrial subsurface aquifers (31,33,48,49) and remained undetected in our amplicon survey (21). Both groups occurred in low relative abundance compared to the other taxa in this study (Fig. S7) and showed the lowest activity (Fig. 2). Consistent with previous reports of obligate fermentative metabolism by Parcubacteria and Peregrinibacteria (23,30,31,48), we identified no respiratory pathways for these taxa (Fig. 5), and they trended toward greater abundances in the lowest DO samples (Fig. S7). In spite of relatively high predicted genome completion, we found very few CAZy genes, and those genes were mostly restricted to glycosyltransferases (see Table S1 at http:// thethrashlab.com/publications) probably involved in capsular polysaccharide synthesis. While these organisms had low relative abundance to the other groups (Fig. S7), we did observe activity in some samples (Fig. 2, samples E2, E2A, and D1).

DISCUSSION
This work provides the first reconstruction of multiple nearly complete genomes from uncultivated bacterioplankton during nGOM hypoxia. Although we define roles for MGII, SAR406, SAR202, Bin 13, and Bins 50/48 as aerobic heterotrophs, we also observed concurrent expression of genes associated with anaerobic metabolism in SAR406 (nitrate reduction, DNRA, nitrous oxide reduction, and sulfur reduction), SAR202 (nitrate and nitrous oxide reduction), MGII (nitrite reduction), and Bins 50/48 (DNRA) in suboxic samples with the lowest measured DO concentrations. Simultaneous utilization of multiple electron acceptors with different redox potentials likely indicates an abundant supply of electron donors (50), may denote niche partitioning within group sublineages at a finer level of taxonomic resolution than we observed, or indicates poising of taxa for rapidly changing chemical gradients (40). An organism's set of CAZy genes often gives insights into its biology, in particular into nutrient sensing and acquisition. All taxa examined in this study had predicted chemoorganoheterotrophic metabolism, and the CAZy genes found in these genomes suggest that SAR406, SAR202, Bin 13, and Bins 50/48 participate in the degradation of complex organic matter resulting from the detritus of larger organisms. This matches the general model of hypoxic zone oxygen consumption resulting from sinking organic matter provided by algal blooms in surface waters (1). The observed activity of obligate fermentative groups Parcubacteria and Peregrinibacteria also suggests that anoxic pockets occur in the water column where these organisms can thrive.
Marine Group II (MGII) is a broadly distributed archaeal clade, with members found in different marine (51,52) and sedimentary (53) environments. Previous work during 2012 and 2013 hypoxia indicated a proliferation of archaeal taxa in both the Thaumarchaea and MGII phyla (21,22). The prevalence of MGII among lower oxygen samples in the hypoxic zone is somewhat surprising, considering that they are commonly associated with aerobic environments (52). However, oxygen was still present in even the lowest DO samples (Fig. 2), and MGII success likely had more to do with the carbon content than oxygen levels. These nGOM MGII appear to be metabolically similar to those described in previous work: MGII have been shown to be dominant in water column environments associated with blooms in productivity, for example at deep-sea hydrothermal plumes (51). Thus, the increased availability of organic matter (proteins and carbohydrates), thought to be preferred substrates for MGII (54,55), probably explains their abundance.
Another cosmopolitan group found in our samples was SAR406 or Marine Group A. These organisms were discovered more than 20 years ago (28,56), and the clade has recently been proposed as the phylum "Marinimicrobia" (23). SAR406 occur in numerous marine (5,23,26,28,57), sedimentary (23), and even oil reservoir (58) environments. They are prevalent in deeper ocean waters (28,57,59) and prefer lower oxygen concentrations in OMZs (5,26,60). Our genomes had larger estimated genome sizes-2.6 to 2.7 Mbp (group A) and 2.8 to 3.5 Mbp (group B)-compared to 1.1 to 2.4 Mbp from single-cell genomes (23). Overall GC content, however, was in the range of the 30 to 48% reported for fosmids (27) and single-cell genomes (23). The lower-GC group A genomes specifically had GC contents similar to that of the Arctic96B-7 fosmids, matching their predicted phylogenetic affiliation (see below) (27).
Our data now also define roles for SAR406 in the eutrophication-driven hypoxia of the nGOM. Previous metabolic reconstructions of SAR406 predicted aerobic metabolism (23) and sulfur reduction (27), which our data confirm, although the sulfur reduction genes were found only in group B organisms (see Table S1 at http:// thethrashlab.com/publications). Our genomes also suggest multiple nitrogen cycling roles that appear to be organized by sublineages within the phylum, and sublineage-specific presence of both high-and low-affinity cytochrome c oxidases. The group B organisms group with the early diverging SBH1141 clade (27) for which no previous genome data exist. Group B organisms contained both types of cytochrome c oxidase genes, nosZ and nrfAH genes, whereas group A organisms, sister to the Arctic96B-7 clade, contained only the low-affinity cytochrome c oxidases, and additionally narHI genes not found in group B. The unique roles predicted for these taxa are not surprising given the diversity of the SAR406 clade and the genetic distances between group A and B (see Fig. S4 in the supplemental material). The fosmids associated with the Arctic96B-7 clade contained genes for oxidative stress and sulfur reduction (27), although we only found sulfur reduction genes in the distantly related group B genomes. The Arctic96B-7 clade may be diverse enough to encompass differing metabolic strategies, but the variable presence of phsA genes in this group may simply be due to incomplete genomic data. In addition to sublineage-specific respiratory characteristics, our results also generate specific hypotheses about organic matter metabolism in SAR406: likely degradation capacity for cellulose, starch, agar, xylan, and peptidoglycan; transport of nucleosides, amino and fatty acids, and oligopeptides; and substantial gene duplication associated with protein secretion for possible extracellular metabolism. Together, these data suggest that during nGOM hypoxia, SAR406 members degrade complex carbohydrates fueled by aerobic respiration and supplemented with facultative anaerobic respiration of nitrate, nitrite, or sulfur compounds.
Members of the SAR202 clade of Chloroflexi also inhabit a wide variety of marine environments (24), frequently in deeper waters (24,43,57,59,61) and remain functionally understudied because genome data for SAR202 have been lacking. Landry and colleagues recently described the properties for several single-cell genomes representing SAR202 subclades III and V recovered from the mesopelagic zone (25). Our genomes have generally higher GC content and much lower expected genome sizes than those predicted by Landry et al. (25), although these calculations are likely complicated by the relative incompleteness of their genomes (8 to 56%). The Landry et al. genomes indicated a role for SAR202 in the oxidation of recalcitrant dissolved organic matter, and specifically cyclic alkanes, via flavin mononucleotide monooxygenases (FMNOs) and different dehydrogenases that occurred in paralogous groups (25). We observed many of the same gene expansions, namely that of MFS transporters and short-chain dehydrogenases (and related genes), but we did not recover any FMNOs of SFam 4832 or 4965, suggesting subclade-and/or niche-specific adaptations. Furthermore, we observed napAB and nosZ genes for nitrate and nitrous oxide reduction (and expression of these genes), which were not reported for subclade III or V. Our nGOM hypoxia SAR202 genomes had CAZy genes implicating them in degradation of complex compounds such as chitin and pectin. The emerging picture of these taxa from both shallow hypoxic waters and the mesopelagic zone is one of recalcitrant carbon degraders, with overlapping suites of paralogous genes, but that may be specialized for specific compounds more commonly available in their respective habitats.
This study has also developed roles for CP taxa in a shallow marine water column during hypoxia. The most active organism in our survey based on the ratio of RNA to DNA reads recruited, Bin 13, putatively belongs to a group with little genomic data-ACD39. The original ACD39 genome was reconstructed from an aquifer community (33). Although this was only a partial genome, it shared some features with our putative ACD39 member, namely pilin and chemotaxis genes, those containing TPR and tetratricopeptide repeats, and CAZy genes for degradation of complex compounds such as starch (33). Our study provides evidence that these taxa have relatively large genomes (ϳ4.8 Mbp), are active aerobes in nGOM hypoxia, and have chemotaxis and motility genes that could facilitate scavenging and surface attachment. However, most of the highly expressed genes in this organism were annotated as hypothetical proteins, so much of the function of these organisms remains to be uncovered.
Bins 50 and 48 provide novel genome data for bacterioplankton in nGOM hypoxia, although the exact taxonomic position of these bins remains in conflict. The ribosomal protein tree provides evidence that these taxa belong to the Latescibacteria (WS3) (Fig. S1), but 16S rRNA genes (Fig. S6) and our amplicon data point toward membership in the more poorly understood PAUC34f clade. Since no previous genome data exist for PAUC34f, we cannot rule out erroneous assignment in the ribosomal protein tree due to insufficient taxon selection. Bins 50 and 48 represented the largest genomes of the study, with estimated complete sizes of ϳ5 to 6 Mbp, and numerous genes suggesting degradation of a wider suite of complex organic matter than any of the other genomes examined. For example, they were the only genomes with numerous polysaccharide lyase genes, and these likely facilitate breakdown of algal cell wall components like pectin and alginate. The Bin 50 genome was among the most active across all samples (Fig. 2), and we detected expression of cytochrome c oxidase genes, and those for DNRA, in both the Bin 50 and 48 genomes. Thus, we expect these organisms to have an aerobic, potentially facultatively anaerobic, multifaceted chemoorganoheterotrophic metabolism with roles in complex carbon compound degradation (like that of algal cell walls) and the nitrogen cycle.
If these bins belong to PAUC34f, they represent the first genomic data for the group. Although originally discovered, and commonly found, in marine sponges (32,(62)(63)(64), this putative bacterial phylum (via GreenGenes/SILVA) has been detected as a rare group in other marine invertebrates (65) and stream sediment (66), and we identified 18 distinct but rare PAUC34f OTUs in nGOM hypoxia compared to just 3 from WS3 (21). Although the majority of studies suggest an endosymbiotic lifestyle for PAUC34f, our representative genome data point toward a free-living existence with multiple terminal electron-accepting processes, motility genes for seeking more favorable conditions, and a large metabolic repertoire for degradation of complex compounds. On the other hand, if these genomes represent WS3, the sister clade to PAUC34f (Fig. S6), they have many similarities to the lifestyles inferred from recent metagenomic investigations (67,68). Specifically, while this group was previously considered anaerobic (67), new data have supported an aerobic lifestyle for some members (48) and revealed complete electron transport chains and both high-and low-affinity cytochrome c oxidases (68). The Bin 50 and 48 genomes predict aerobic metabolism as well, although only with low-affinity cytochrome c oxidases. Farag et al. also found little evidence of these taxa in host-associated environments, contrary to PAUC34f sequence data (68). The enrichment of PL family genes in Bins 50 and 48, polysaccharide degradation capability in general, and specific genes for degradation of cell wall components, all corroborate previous findings on WS3 as well (68). Bin 50 had 78 annotated peptidases, nearly double that in all other genomes in the study (Bin 48 had 46), which also concurs with metagenomic predictions for WS3 (68). Our genomes differed from WS3 metagenomes principally in the predicted DNRA metabolism and the dramatic expansion of sulfatases. Although sulfatases were observed in WS3 metagenomes (68), they were not present in the numbers associated with Bin 50 (n ϭ 102). A large cadre of sulfatases has been previously reported for Lentisphaera (n ϭ 267) and Pirellula (n ϭ 110) genomes (69,70) and suggests specialization for degradation of sulfate esters to satisfy carbon and/or sulfur requirements.
Although Parcubacteria and Peregrinibacteria occurred in low abundance (Fig. S7) and we detected activity in only a few samples, their recovery in the hypoxic zone is notable because these organisms have generally been associated with anoxic environments. Our predicted genome sizes (ϳ1.5 Mbp) corroborate previous reports of these organisms having small genomes (31,48). We did not observe any genes associated with nitrogen or sulfur redox transitions, although we cannot rule out these capabilities entirely due to incomplete genomes. Regardless, we can hypothesize that Parcubacteria and Peregrinibacteria persist as members of the rare biosphere until they can take advantage of microanoxic niches in the water column where they participate in carbon cycling as obligately fermentative organisms.
Excluding Parcubacteria and Peregrinibacteria, the other uncultivated groups in the nGOM hypoxic zone had one or more genomes that encoded cytochrome c oxidases (and other electron transport chain components) for respiring oxygen, making these taxa likely only facultative anaerobes. Pervasive aerobic metabolism in an oxygen-depleted water column may seem counterintuitive, yet despite DO being as low as 2.2 mol · kg Ϫ1 in the E2A sample, oxygen probably remained high enough to sustain aerobic microbes. As little as ϳ0.3 mol · kg Ϫ1 oxygen inhibited denitrification in OMZ populations by 50% (71), and even Escherichia coli K-12 could grow aerobically at oxygen concentrations as low as 3 nM (72). Thus, for many organisms, active aerobic respiration likely persists even in suboxic waters during nGOM hypoxia.
Nevertheless, our data also suggest pervasive coreduction of alternative terminal electron acceptors (oxygen, nitrate, nitrite, nitrous oxide, and sulfur), sometimes within the same organism (Fig. 3). Coreduction of electron acceptors with different redox potentials across a community could indicate microniches and/or aggregates in the water column where DO concentrations drop below bulk values (40). Alternatively, this can occur with an abundance of electron donor, and overlapping redox processes have been reported in multiple environments, including aquatic ones (50,73). Concurrent expression of genes for multiple terminal electron-accepting processes within a single organism has been proposed as a means of improved readiness for dynamic conditions, albeit at the cost of lower productivity (40). Given that many uncultivated taxa likely perform multiple terminal electron-accepting processes (and possibly do so simultaneously) and we found a comparative cornucopia of genes for degradation of chemoorganoheterotrophic energy sources, we hypothesize that niche differentiation within uncultivated hypoxic zone bacterioplankton occurs predominantly via specialization for different oxidizable substrates rather than for distinct roles in the canonical redox cascade (4,5).
Importantly, many of the active uncultivated taxa also appeared adapted for degradation of complex carbon substrates. Such compounds might comprise the bulk of available organic matter during the later stages of hypoxia after initial oxygen depletion by microorganisms feeding on more labile carbon sources. Selection for chemoorganotrophic microbes adapted to utilize recalcitrant organic matter could also explain why organisms that do not require an exogenous carbon source, such as the chemolithoautotrophic Nitrosopumilus, proliferate during hypoxia (21,22) compared to their levels during spring before DO decreases (74,75). Temporal data on the relative abundance and activity of these nGOM microbial dark matter organisms, and of organic matter composition in the water column, will be critical to more fully understand the relationship of bacterioplankton to the creation, maintenance, and dissolution of nGOM hypoxia.

MATERIALS AND METHODS
Sample selection and nucleic acid processing. Six samples representing hypoxic (n ϭ 4) and oxic (n ϭ 2) dissolved oxygen (DO) concentrations were picked from among those previously reported (21) at stations D1, D2, D3, E2, E2A, and E4 (see Table S1 at http://thethrashlab.com/publications). DO and nutrient collection information is detailed in the study by Gillies et al. (21). Nucleic acids were collected as follows. At these six stations, 10 liters of seawater was collected and filtered with a peristaltic pump. A 2.7 m Whatman GF/D prefilter was used, and samples were concentrated on 0.22 m Sterivex filters (EMD Millipore). Sterivex filters were immediately sparged, filled with RNAlater, and placed at Ϫ20°C, where they were maintained until extraction. DNA and RNA were extracted directly off the filter by placing half of the Sterivex filter in a Lysing matrix E (LME) glass/zirconia/silica beads tube (MP Biomedicals, Santa Ana, CA) using the protocol described by Gillies et al. (21) which combines phenol: chloroform:isoamyl alcohol (25:24:1) and bead beating. Genomic DNA and RNA were stored at Ϫ80°C until purified. DNA and RNA were purified using a Qiagen (Valencia, CA) AllPrep DNA/RNA kit. DNA quantity was determined using a Qubit2.0 fluorometer (Life Technologies, Grand Island, NY). RNA with an RNA integrity number (RIN) (16S/23S rRNA ratio determined with the Agilent TapeStation) of Ն8 (on a scale of 1 to 10, with 1 being degraded and 10 being undegraded RNA) was selected for metatranscriptomic sequencing. Using a Ribo-Zero kit (Illumina), rRNA was subtracted from total RNA. Subsequently, mRNA was reverse transcribed to cDNA as described by Mason et al. (76).
Sequencing, assembly, and binning. DNA and RNA were sequenced separately, six samples per lane, with Illumina HiSeq 2000 chemistry to generate 100 bp, paired-end reads (180 bp insert size) at the Argonne National Laboratory Next Generation Sequencing Facility. The data are available at the NCBI SRA repository under the BioSample accession numbers SAMN05791315 to SAMN05791320 (DNA) and SAMN05791321 to SAMN05791326 (RNA). DNA sequencing resulted in a total of 416,924,120 reads that were quality trimmed to 413,094,662 reads after adaptors were removed using Scythe (https://github .com/vsbuffalo/scythe), and low-quality reads (Q Ͻ 30) were trimmed with Sickle (https://github.com/ najoshi/sickle). Reads with three or more N's or with an average quality score of less than Q20 and a length of Ͻ50 bp were removed. Metagenomic reads from all six samples were pooled, assembled, and binned using previously described methods (77,78). Briefly, quality-filtered reads were assembled with IDBA-UD (79) on a 1TB RAM, 40-core node at the Louisiana State University high-performance computing Decoding Microbes of the Dead Zone ® cluster SuperMikeII, using the following settings: -mink 65 -maxk 105 -step 10 -pre_correction -seed_kmer 55. Initial binning of the assembled fragments was performed using tetranucleotide frequency signatures using 5 kbp fragments of the contigs. Emergent self-organizing maps (ESOM) were manually delineated and curated based on clusters within the map. The primary assembly utilized all reads and produced 28,080 contigs Ն3 kbp totaling 217,715,956 bp. Of these, 303 contigs were over 50 kbp, 72 were over 100 kbp, and the largest contig was just under 495 kbp. Binning produced 76 genomes, of which 20 genomes were assigned to lineages with uncultivated representatives using CheckM, ribosomal protein trees, and 16S rRNA gene sequences (below).
DNA and RNA mapping. Metagenomic and metatranscriptomic sequencing reads from each sample were separately mapped to binned contigs using BWA (80) to compare bin abundance across samples and facilitate bin cleanup (below). Contigs within each bin were concatenated into a single fasta sequence, and BWA was used to map the reads from each sample to all bins. All commands used for these steps are available in the supplemental information at http://thethrashlab.com/publications. Bin quality control. Bins were examined for contamination and completeness with CheckM (35), and we attempted to clean bins with Ͼ10% estimated contamination using a combination of methods. First, the CheckM modify command removed contigs determined to be outliers by GC content, coding density, and tetranucleotide frequency. Next, in bins that still showed Ͼ10% contamination, contigs were separated according to the comparative relative abundance of mean DNA read coverage by sample. Final bins were evaluated with CheckM again to generate the statistics in Table S1 at http://thethrashlab.com/ publications and final bin placements in the CheckM concatenated gene tree (Fig. S2).
Ribosomal protein tree. The concatenated ribosomal protein tree was generated using 16 syntenic genes that have been shown to undergo limited lateral gene transfer (rpL2, 3, 4, 5, 6, 14, 15, 16, 18, 22, and 24 and rpS3, 8, 10, 17, and 19) (81). Ribosomal proteins for each bin were identified with Phylosift (82). Amino acid alignments of the individual ribosomal proteins were generated using MUSCLE (83) and trimmed using BMGE (84) (with the following settings: -m BLOSUM30 -g 0.5). The curated alignments were then concatenated for phylogenetic analyses, and phylogeny was inferred via RAxML v 8.2.8 (85) with 100 bootstrap runs (with the following settings: mpirun -np 4 -npernode 1 raxmlHPC-HYBRID-AVX -f a -m PROTCATLG -T 16 -p 12345 -x 12345 -# 100). Note that this is similar to the number utilized in a previous publication for this tree with automated bootstrapping (34), and required just over 56 h of wall clock time. The alignment is available in the supplemental information at http://thethrashlab.com/ publications.
Taxonomic assignment. Taxonomy for each bin was assigned primarily using the ribosomal protein tree. However, for bins that did not have enough ribosomal proteins to be included in the tree, or for bins for which the placement within the tree was poorly supported, assignments were made based on the concatenated marker gene tree as part of the CheckM analysis (Fig. S2) or via 16S rRNA gene sequences, when available. 16S rRNA genes were identified via CheckM, and these sequences were aligned against the NCBI nr database using BLASTN to corroborate CheckM assignments. In the case of the SAR202 genomes, which did not have representative genomes in either the ribosomal protein tree or the CheckM tree, the 16S rRNA gene sequences for two of the three bins (43-1 and 43-2) were available and aligned with the sequences used to define the SAR202 clade (24) (Fig. S5). Alignment, culling, and inference were completed with MUSCLE (83), Gblocks (87), and FastTree2 (88), respectively, with the FT_pipe script. The script is provided in the supplemental information at http://thethrashlab.com/publications. The 16S rRNA gene tree for subclade assignment of SAR406 (Fig. S4) was assembled by subjecting the four 16S sequences predicted by CheckM to a BLAST search against a local GenBank nucleotide database using blastn (v. 2.2.28ϩ) (89), selecting the top 100 nonredundant hits to each sequence, and manually removing all hits to genome sequences. These sequences were combined with previously defined SAR406 subclade reference sequences (26), fosmid 16S sequences (27), single-cell genome sequences (23), and run through alignment, culling, and inference with FT_pipe. Taxa with identical alignments were removed with RAxML v 8.2.8 (85) using default settings, and the final tree was inferred using FastTree2 (88). For putative candidate phylum (CP) genomes, taxonomy was also evaluated by examining the taxonomic identification for each of the predicted protein sequences after a BLASTP search against the NCBI nr database. After the BLAST search, the number of assignments to the dominant one or two taxonomic names, along with the number of assignments to "uncultured bacterium," was plotted for each genome according to the bit score quartile (Fig. S3). Quartiles were determined in R using the summary function. Bin 56 has two ribosomal protein operons on scaffold_2719/Ga0113622_1153 and scaffold_21777/Ga0113622_1009. In the ribosomal protein tree, the former placed the organism in the Planctomycetaceae, while the latter (which was much smaller) placed the organism in CP WS3. The majority of BLASTP annotations to the nr database matched Planctomycetaceae taxa, as did the 16S rRNA gene sequences found in the genome, so the Bin 56 organism was designated a Planctomycetes and not WS3 and excluded from this study. The 16S rRNA gene from Bin 50 was also used to infer taxonomic identity using an established phylogeny for the WS3 clade (68) and relevant outgroups. The Bin 50 sequence was subjected to a BLAST search against the GreenGenes database (December 2013) with megablast, and since many of the top hits belonged to the PAUC34f clade, these were included with the sequences from Farag et al. (68). Alignment, culling, and inference were completed with FT_pipe. Node labels were constructed with the newick utilities (90) script nw_rename.
Metabolic reconstruction. After binning, genomes were submitted individually to IMG (Integrated Microbial Genomes) (91) for annotation. Genome accession numbers are in Table 1, and all are publically available. Metabolic reconstruction found in Fig. S5 to S7 and in Table S1 and Fig. S11 to S13 at http://thethrashlab.com/publications came from these annotations and inspection with IMG's analysis tools, including Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway assignments and transporter predictions. Transporters highlighted for dissolved organic matter (DOM) uptake were identified based on information at the Transporter Classification Database (92). Carbohydrate-active enzymes (CAZymes) were predicted using the same routines as those in operation for the updates of the carbohydrate-active enzymes database (http://www.cazy.org) (93). RPKM abundance of taxa and genes. Abundance of taxa within the sample was quantified by evaluating mapped reads using RPKM (reads per kilobase per million) normalization (94) according to A ij ϭ (N ij /L i ) ϫ (1/T j ), where A ij is the abundance of bin i in sample j, N ij is the number of reads that map to bin from sample j, L i is the length of bin i in kilobases, and T j is the total number of reads in sample j divided by 10 6 . These values were generated for all bins, with only the data for the 20 uncultivated bins reported here. All contigs within a given bin were artificially concatenated into "supercontigs" prior to mapping. N ij was generated using the samtools (80) idxstats function after mapping with BWA. The data in Fig. S7 were created by summing (N ij /L i ) for groups of taxa defined in Table S1 at http://thethrashlab .com/publications prior to multiplying by (1/T j ). RNA coverage was used to evaluate both bin and gene activity for all bins. Mean coverage for each supercontig was calculated using bedtools (95), and bins were assigned a rank from lowest mean recruitment to highest mean recruitment. Bins with particularly high or low activity (transcript abundance) relative to their abundance (genome abundance) were identified using rank residuals, which were calculated as follows. On a plot of DNA coverage rank versus RNA coverage rank, residuals for each bin or gene were calculated from the identity. As the rank residuals followed a Gaussian distribution, bins with a residual that was Ͼ1 standard deviation (SD) from the rank residual mean were classified as having higher-than-expected transcriptional activity; bins with a residual that was Ͻ1 SD from the mean were classified as having lower-than-expected transcriptional activity. RPKM values were also calculated for every gene in every bin analogously to that for bins, using RNA mapping values extracted with the bedtools multicov function. Sample E2 was omitted from genespecific calculations as only 4,588 transcriptomic reads mapped successfully from this sample, compared to Ͼ100,000 from other samples. Of 140,347 genes, 17,827 had no evidence of expression in any sample and so were removed from further analysis. A total of 3,840 genes recruited reads in all remaining samples. All calculations are available in Table S1 at http://thethrashlab.com/publications or the R markdown document Per.gene.RPKM.Rmd in supplemental information. Table S1 at http://thethrashlab .com/publications includes only analyzed data for the uncultivated bins reported in this study. Note that RPKM values indicate abundance measurements across a small number of samples. While we can evaluate the relative expression of genes for those samples, our data set lacks sufficient power to evaluate estimates of significance in differential expression.
nrfA sequence assessment. Initial annotation of our bins identified putative homologs to the nrfAH genes associated with dissimilatory nitrite reduction to ammonia. Since nrfA-type nitrite reductases can be misannotated due to homology with other nitrite reductases, annotation for these genes was curated with phylogenetic analysis using known nrfA genes (38) obtained via A. Welsh (personal communication). Alignment, culling, and inference were completed with the FT_pipe script. The tree was rooted on the designated outgroup octaheme nitrite reductase sequence from Thioalkalivibrio nitratireducens ONR. Node labels were constructed with the newick utilities (90) script nw_rename. Visualization of the alignment ( Fig. S8B and C) to confirm the presence of the first CXXCK/CXXCH and highly conserved KXQH/KXRH catalytic site was completed with the MSAViewer (96) online using the unculled nrfA alignment as input.
SFam homology searches. To identify group-specific expansions in particular gene families, we performed a homology search of all predicted protein-coding sequences in each bin against the Sifted Families (SFam) database (41) using hmmsearch (HMMER 3.1b [97]) with default settings except for the utilization of 16 CPUs per search.
Accession numbers. Sequence read data are available at the NCBI SRA repository under the BioSample accession numbers SAMN05791315 to SAMN05791320 (DNA) and SAMN05791321 to SAMN05791326 (RNA).
Additional supplemental information. Table S1, scripts, workflows, and key files, including fasta files for each tree, are available at the Thrash Lab website: http://thethrashlab.com/publications.