Metagenomes Reveal Global Distribution of Bacterial Steroid Catabolism in Natural, Engineered, and Host Environments

ABSTRACT Steroids are abundant growth substrates for bacteria in natural, engineered, and host-associated environments. This study analyzed the distribution of the aerobic 9,10-seco steroid degradation pathway in 346 publicly available metagenomes from diverse environments. Our results show that steroid-degrading bacteria are globally distributed and prevalent in particular environments, such as wastewater treatment plants, soil, plant rhizospheres, and the marine environment, including marine sponges. Genomic signature-based sequence binning recovered 45 metagenome-assembled genomes containing a majority of 9,10-seco pathway genes. Only Actinobacteria and Proteobacteria were identified as steroid degraders, but we identified several alpha- and gammaproteobacterial lineages not previously known to degrade steroids. Actino- and proteobacterial steroid degraders coexisted in wastewater, while soil and rhizosphere samples contained mostly actinobacterial ones. Actinobacterial steroid degraders were found in deep ocean samples, while mostly alpha- and gammaproteobacterial ones were found in other marine samples, including sponges. Isolation of steroid-degrading bacteria from sponges confirmed their presence. Phylogenetic analysis of key steroid degradation proteins suggested their biochemical novelty in genomes from sponges and other environments. This study shows that the ecological significance as well as taxonomic and biochemical diversity of bacterial steroid degradation has so far been largely underestimated, especially in the marine environment.

uents in all animal, plant, and fungal cells (1) and are thus likely the most abundant steroids in the environment. The earliest eukaryotic protosterol biosynthesis genes evolved around 2.3 billion years ago, suggesting that protosterol synthesis was an original trait in the earliest eukaryotic life-forms (2). More complex sterol biomarker molecules found in 650- to 540-million-year-old rocks have been attributed to marine sponges (3), indicating that the synthesis of complex sterols is among the oldest biosynthetic pathways in metazoans. Modern animals synthesize a variety of additional steroids, such as estrogenic and androgenic hormones and bile salts, the latter functioning as both dietary emulsifiers and hormonal and semiochemical signaling compounds in vertebrates (4, 5).
These natural steroids are eventually released into the environment, and increasing industrial steroid production and use release additional steroids into the biosphere. Consequently, natural and synthetic steroids have been detected in marine, freshwater, and soil environments (6–9) and in high concentrations in wastewater and feedlot runoff (10, 11). Diverse steroids have been detected in many marine animals, including sponges (12) and corals (13). Concerns about adverse effects of anthropogenic steroids on organisms, including humans, have been raised (14), and research has shown endocrine-disrupting properties for selected steroids, even at very low concentrations (8).
Several aerobic steroid-degrading Actinobacteria and Proteobacteria have been isolated from soil (15–17), freshwater (18), and marine (19) environments, suggesting that steroids can be degraded by bacteria as growth substrates in these environments. This indicates that bacterial steroid degradation is an important process for recycling steroids in the global carbon cycle and for reducing potential adverse effects of environmental steroids. Some intracellular pathogenic bacteria such as Mycobacterium tuberculosis and Rhodococcus equi access cholesterol as a growth substrate directly from their host, and this trait is required for pathogenicity and persistence of these bacteria in the host (20, 21), suggesting an additional function of bacterial steroid degradation in selected host-microbe relations.
Based on the homology of the 9,10-seco pathway across the aforementioned bacterial lineages, we recently conducted a genome-mining analysis of steroid degradation genes in prokaryotic and fungal genomes from the RefSeq database using hidden Markov models (HMMs) and reciprocal BLAST analysis (34). We identified 265 putative steroid degraders mainly from soil, eukaryotic hosts, and marine and freshwater environments, which were limited to the Actinobacteria and Proteobacteria and included 17 genera not previously known to include steroid degraders. Furthermore, our data suggested that only Actinobacteria degrade sterols, while Proteobacteria degrade bile salts and other less complex steroids. Positive growth experiments with nine predicted steroid degraders confirmed that our HMMs are suitable to identify bacterial steroid degradation enzymes and that this genome-mining approach is an effective way to identify steroid degraders.
However, knowledge about microbial steroid-degrading communities and steroid degradation processes in the environment remains limited. To address this, we used HMMs to mine a set of 346 globally distributed, publicly available, preassembled shotgun metagenomes from diverse environments. We aimed to identify ecological niches of steroid-degrading bacteria, hypothesizing that bacterial steroid degradation is a key biochemical process in habitats such as wastewater treatment plants (WWTPs), soil, the marine environment, and eukaryotic hosts. We further predicted that Actinobacteria are the dominant steroid degraders in habitats primarily containing sterols, while Proteobacteria are dominant in habitats containing less complex steroids. Metagenomes with a high potential for steroid degradation were subjected to genome binning to identify potential steroid-degrading organisms, which we hypothesized might include novel uncultured steroid degraders not represented by genome sequences. These results were used to infer information about the evolutionary origin of bacterial steroid degradation and its ecological relevance. Isolation and characterization of steroid-degrading bacteria from marine sponges validated our approach and predictions.

Selection of metagenomes and distribution of steroid degradation genes in environments.
Metagenome sources were classified following the metagenome classification system (35). In a prescreening, statistics of 596 assembled metagenomes were analyzed using MetaQUAST (36), and 346 metagenomes (see Table S1 in the supplemental material) with N50 values higher than 300 bp, containing contigs longer than 600 bp, were selected for further analysis. These metagenomes were screened for steroid degradation genes using 23 hidden Markov models (HMMs) representing 10 steroid degradation protein families. To focus on samples with high steroid degradation potential, we selected 107 metagenomes with HMM hits for all 10 protein families (see Fig. S1 and Table S1A in the supplemental material). These included 60 environmental metagenomes from freshwater, oceans, non-marine saline lakes, thermal springs, and soil, 17 host-associated metagenomes from marine sponges, rhizospheres, insects, and an ant fungal garden, and 30 engineered environment metagenomes from wastewater treatment plants (WWTPs), compost, and hydrocarbon-contaminated sites. The number of genome equivalents within metagenomes was calculated using MicrobeCensus (37). Two soil metagenomes without genome equivalents were not analyzed further. To estimate relative abundances of steroid degradation proteins in metagenomes, HMM hit numbers were normalized by dividing them by the number of genome equivalents within each sample. HMM hit numbers ranged from 0.46 to 18.8 hits per genome equivalent (Fig. 2). The highest normalized hit numbers were found in metagenomes from sponges, rhizosphere, deep ocean, WWTPs, and soil. The 105 analyzed metagenomes accounted for 18,695 HMM hits (see Table S2 in the supplemental material).
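The prescreening and normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the data layout, and the reading of the contig criterion as "the longest contig exceeds 600 bp" are all assumptions made for the example. Only the thresholds (N50 above 300 bp, contigs longer than 600 bp) and the division of hit counts by genome equivalents come from the text.

```python
def n50(contig_lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def passes_prescreen(contig_lengths, min_n50=300, min_longest=600):
    """Assumed reading of the prescreen: N50 above min_n50 and at least
    one contig longer than min_longest (both in bp)."""
    if not contig_lengths:
        return False
    return n50(contig_lengths) > min_n50 and max(contig_lengths) > min_longest


def normalized_hits(hits_per_family, genome_equivalents):
    """Total HMM hits divided by genome equivalents.

    hits_per_family maps a protein family label to its hit count.
    Returns None when no genome equivalents were estimated, mirroring
    the exclusion of such samples from further analysis.
    """
    if genome_equivalents <= 0:
        return None
    return sum(hits_per_family.values()) / genome_equivalents
```

For example, a sample with 50 hits and 25 genome equivalents would be reported as 2.0 hits per genome equivalent, which falls within the 0.46 to 18.8 range observed in the study.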
Taxonomy of steroid degradation proteins. Taxonomy of HMM hits in the 105 selected metagenomes was determined by a lowest common ancestor (LCA) approach using the RefSeq non-redundant protein database. Most hits were affiliated with the Proteobacteria (64%) and Actinobacteria (23%) (Fig. 3; see Table S2 in the supplemental material). Smaller proportions were affiliated with Firmicutes, Bacteroidetes, Chloroflexi, and other phyla, but none of these phyla were found to have genes for all 10 proteins (Fig. S2). Notably, the hsaC gene, encoding a key step in the pathway, was not found in any of the latter phyla. Interactive KRONA charts for the taxonomic assignment of steroid degradation HMM hits for all 105 metagenomes are available online (https://github.com/MohnLab/Steroid_Degradation_Metagenomes_KRONA_charts_2017).
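The idea behind a lowest common ancestor assignment can be illustrated with a short Python sketch. It assumes that each database match for a query protein is available as a ranked lineage (domain, phylum, class, and so on) and assigns the query to the deepest rank on which all matches agree. Which matches enter the comparison (for example, only those close to the best score) is not specified in the text; the function name and input layout here are hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of a list of ranked lineages.

    Each lineage is a list of taxon names ordered from the highest
    rank (domain) to the lowest rank available for that match.
    """
    if not lineages:
        return []
    shared = list(lineages[0])
    for lineage in lineages[1:]:
        agree = 0
        for ours, theirs in zip(shared, lineage):
            if ours != theirs:
                break
            agree += 1
        shared = shared[:agree]  # keep only the ranks that still agree
    return shared


# Two matches that disagree at the class level are assigned to the phylum.
matches = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Cellvibrionales"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rhizobiales"],
]
assignment = lowest_common_ancestor(matches)
```

This conservative behavior is one reason why proteins from poorly represented lineages end up classified only to the phylum or domain level.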
Binning of metagenome-assembled genomes. Tetranucleotide frequency-based genome binning of metagenomes with high steroid degradation potential using MyCC (38) produced 1,332 bins with genome contamination below 10% and completeness of more than 25%. Forty-nine of these metagenome-assembled genomes (MAGs) from 33 metagenomes were predicted to encode steroid degradation based on HMM hits for at least 5 out of the 10 steroid degradation protein families, including at least one KshA or HsaC hit (Table 1). Taxonomic classification using CAT (39) and a custom Python script classified 20 of these MAGs as Actinobacteria, 9 as Proteobacteria, 11 as Alphaproteobacteria, 2 as Betaproteobacteria, and 3 as Gammaproteobacteria. Four MAGs were not classified beyond Bacteria. Sequence files for all 49 MAGs are available online (https://github.com/MohnLab/Steroid_Degradation_Metagenomes_MAGs_2017). MAGs were subsequently searched by best reciprocal BLASTp analysis against the steroid degraders Mycobacterium tuberculosis H37Rv, Rhodococcus jostii RHA1, Comamonas testosteroni CNB-2, Pseudomonas stutzeri Chol1, and Pseudoalteromonas haloplanktis TAC125 to more comprehensively identify steroid catabolism genes and match them to their reference orthologs.
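The criteria used to call a bin a predicted steroid degrader can be written as a simple predicate. The following Python sketch is illustrative only: the `Bin` record, its field names, and the family labels other than KshA and HsaC are hypothetical, while the thresholds (contamination below 10%, completeness above 25%, hits for at least 5 of 10 families, and at least one KshA or HsaC hit) are taken from the text.

```python
from dataclasses import dataclass, field

# The two key families named in the text; other labels are placeholders.
KEY_FAMILIES = {"KshA", "HsaC"}


@dataclass
class Bin:
    name: str
    completeness: float   # percent
    contamination: float  # percent
    family_hits: dict = field(default_factory=dict)  # family label -> hit count


def is_quality_bin(b, max_contamination=10.0, min_completeness=25.0):
    """Contamination below 10% and completeness above 25%."""
    return b.contamination < max_contamination and b.completeness > min_completeness


def is_predicted_degrader(b, min_families=5):
    """Hits for at least min_families families, including KshA or HsaC."""
    present = {fam for fam, count in b.family_hits.items() if count > 0}
    return (
        is_quality_bin(b)
        and len(present) >= min_families
        and bool(present & KEY_FAMILIES)
    )


example = Bin(
    name="bin_001",
    completeness=62.0,
    contamination=3.5,
    family_hits={"KshA": 2, "fam2": 1, "fam3": 1, "fam4": 3, "fam5": 1},
)
```

Requiring a KshA or HsaC hit guards against bins that merely carry widespread homologs of the less specific protein families.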
Steroid degradation potential in engineered environments. (i) Wastewater treatment plants. The majority of predicted steroid degradation proteins from wastewater treatment plant (WWTP) metagenomes were assigned to the Alphaproteobacteria and Betaproteobacteria (Fig. 3). Dominant taxa within the Betaproteobacteria were Burkholderiales and Thauera (Rhodocyclaceae) (see Fig. S3 in the supplemental material and KRONA charts [https://github.com/MohnLab/Steroid_Degradation_Metagenomes_KRONA_charts_2017]), and we identified two MAGs associated with the Rhodocyclaceae and Betaproteobacteria (Table 1) encoding orthologs of Comamonas steroid degradation proteins (see Fig. S4A in the supplemental material). Alphaproteobacterial HMM hits were mostly assigned to the Sphingomonadaceae, and one Sphingomonadaceae MAG encoded orthologs of Pseudoalteromonas steroid degradation proteins (Fig. S4A). In addition, some wastewater metagenomes had HMM hits associated with the Corynebacteriales (Actinobacteria), including the genera Gordonia and Nocardioides. One Nocardioides MAG encoded orthologs of Mycobacterium steroid degradation proteins (Fig. S4B).
Most HMM hits from four metagenomes from hydraulic fracking wastewater inoculated with a microbial mat grown on grass-silage were assigned to the Gammaproteobacteria, mainly to the genus Marinobacterium (Oceanospirillaceae) (Fig. S3).
[Figure caption fragment: see Table S1A for details. Bars are color coded by global environment.]
wastewater metagenome was dominated by HMM hits assigned to the genus Pseudomonas.
(iii) Compost. The taxonomic diversity of predicted steroid degradation proteins from compost metagenomes varied widely among and within samples (Fig. 3). Actinobacteria HMM hits were most similar to proteins from Mycobacterium and Thermomonospora (Fig. S3).

FIG 3
Taxonomic classification of predicted steroid degradation proteins. Shown are class-, phylum-, or domain-level assignments of predicted steroid degradation proteins in 105 analyzed metagenomes. "Other bacterial phyla" includes all phyla assigned to less than 1% of the proteins. Metagenomes are labeled using a three-letter code representing the global environment and a unique metagenome number (see Table S1A for details).
most similar to proteins from the same taxonomic groups, as described above, mainly Sphingomonadaceae, Burkholderiales, Comamonadaceae, and Pseudomonas.
Steroid degradation potential in natural environments. (i) Soil. Predicted steroid degradation proteins from soil metagenomes were largely associated with the Actinobacteria and Alphaproteobacteria (Fig. 3). While most steroid degradation HMM hits from Antarctic Dry Valley soil metagenomes were assigned to the genus Rhodococcus, actinobacterial HMM hits in other soil metagenomes were predominantly assigned to the genus Mycobacterium (Fig. S3; KRONA charts [https://github.com/MohnLab/Steroid_Degradation_Metagenomes_KRONA_charts_2017]). From each of the Antarctic Dry Valley samples, we recovered Rhodococcus MAGs, which encoded orthologs for almost all Rhodococcus cholesterol and cholate degradation proteins (Fig. S4B). One Mycobacterium MAG from temperate forest soil encoded orthologs of Mycobacterium steroid degradation proteins. Several HMM hits within soil samples were assigned to the Rhizobiales (Alphaproteobacteria). One alphaproteobacterial MAG from peat soil encoded orthologs of Pseudoalteromonas steroid degradation proteins (Fig. S4A). Several soil HMM hits were assigned to the Burkholderiales (Betaproteobacteria).
(ii) Marine environments. The overall taxonomic affiliation of HMM hits in marine water column metagenomes differed substantially between deep ocean and other samples (Fig. 3). The vast majority of HMM hits from deep ocean samples were assigned to the Actinobacteria, mainly to Mycobacterium, Rhodococcus, and Nocardioides (Fig. S3; KRONA charts [https://github.com/MohnLab/Steroid_Degradation_Metagenomes_KRONA_charts_2017]). Seven Rhodococcus MAGs and one Nocardioidaceae MAG encoded orthologs of Rhodococcus cholesterol degradation proteins, but not of cholate degradation proteins (Fig. S4B). Only two deep ocean metagenomes had HMM hits predominantly assigned to the Gammaproteobacteria, namely, Spongiibacter and Alteromonadales. All other marine metagenomes had HMM hits predominantly assigned to the Proteobacteria. Open ocean and oxygen minimum zone (OMZ) metagenomes contained mostly HMM hits associated with Rhodobacterales, Sphingomonadales (both Alphaproteobacteria), and Cellvibrionales (Gammaproteobacteria) (Fig. S3; KRONA charts). One Rhodobacteraceae MAG from a marine oil seep and two proteobacterial MAGs from two OMZ samples encoded orthologs of Pseudoalteromonas steroid degradation proteins (Fig. S4A). HMM hits from hydrothermal vent plume samples were predominantly assigned to the Rhizobiales (Alphaproteobacteria) and Alteromonadaceae (Gammaproteobacteria) (Fig. S3; KRONA charts).
Steroid degradation potential in host-associated communities. (i) Marine sponges. Steroid degradation HMM hits in metagenomes from marine sponges were mainly assigned to the Alphaproteobacteria, Gammaproteobacteria, and Actinobacteria (Fig. 3). Taxonomic affiliations of HMM hits in metagenomes from the sponges Aplysina aerophoba, Petrosia ficiformis, and Sarcotragus foetidus were similar to each other and dominated by hits associated with the Rhizobiales (Alphaproteobacteria) and the Actinobacteria (Fig. S3; KRONA charts [https://github.com/MohnLab/Steroid_Degradation_Metagenomes_KRONA_charts_2017]). MAGs from these sponges (Table 1) were only classified to the domain or phylum level. Seven proteobacterial and three bacterial MAGs encoded orthologs of Pseudoalteromonas steroid degradation proteins (Fig. S4). Two actinobacterial MAGs encoded orthologs of actinobacterial steroid degradation proteins. HMM hits within an accompanying seawater sample (MSP_03) were predominantly assigned to the Alphaproteobacteria and Gammaproteobacteria (Fig. 3; Fig. S3; KRONA charts). HMM hits from Cymbastela metagenomes were dominated by either Hellea (Alphaproteobacteria) or Cellvibrionales (Gammaproteobacteria).
(ii) Rhizosphere. Similar to the aforementioned soil metagenomes, most HMM hits from rhizosphere metagenomes were assigned to the Alphaproteobacteria and Actinobacteria (Fig. 3). Within the Alphaproteobacteria, assignments to the Sphingomonadaceae and Rhizobiales dominated (Fig. S3) [...] (Fig. S4A). One Mycobacterium MAG encoded orthologs of Mycobacterium cholesterol degradation proteins. One metagenome from switchgrass rhizosphere contained HMM hits almost exclusively assigned to the genus Pseudomonas (Gammaproteobacteria).
Phylogeny and novelty of KshA and HsaC proteins. The phylogeny of predicted [...] In addition, we analyzed the similarity of KshA and HsaC homologs from MAGs to proteins in the non-redundant RefSeq protein database. Interestingly, the source environment had a strong influence on the similarity of KshA and HsaC sequences to their homologs in RefSeq. Similarity values were lowest (below 60.4%) for homologs from sponges, marine OMZs, the dead zone of a freshwater lake, and rhizospheres (Fig. 4C). In contrast, similarity values for most homologs from all other environments were higher than 80%. In addition, while most homologs with high similarity values were classified as Actinobacteria, most hits classified as Proteobacteria, Bacteroidetes, or Bacteria had lower similarity values. This indicates that predicted KshA and HsaC proteins from sponges and a few other environments are phylogenetically distant from characterized proteins from well-known steroid-degrading Actinobacteria and Proteobacteria and are not well represented in protein databases, suggesting that these proteins might have novel biochemical characteristics and substrate specificity.
Isolation of steroid-degrading bacteria from marine sponges. We attempted to isolate bacteria from six sponge species, not represented in our metagenome data set, using cholesterol as the substrate. Growth and substrate removal occurred in serial liquid enrichment cultures from five sponges. After 10 transfers of liquid cultures, colonies were obtained on cholesterol agar plates. Twenty-four colonies were further purified on either cholesterol or marine broth agar plates. Six isolates were able to grow with cholesterol in liquid culture (see Fig. S5 in the supplemental material). Three cholesterol degraders were classified by 16S rRNA gene sequencing as Cellvibrionales of the BD1-7 clade, one as a member of the Halieaceae family, one as an Alteromonadales Colwellia species, and one as a Mycobacterium species (Table 2). Phylogenetic analysis showed that these isolates are not among sponge-enriched 16S rRNA gene clusters, which represent bacteria found in sponges but rarely in other environments (40) (results not shown).

DISCUSSION
By mining a diverse and extensive metagenome data set for the presence of bacterial steroid degradation genes, we revealed that steroid-degrading bacteria are globally distributed and prevalent in wastewater treatment plants (WWTPs), soil, and plant rhizospheres. Our data further suggest that marine environments, particularly sponges, are favorable for steroid-degrading bacteria. Thus, the ecological significance as well as taxonomic and biochemical diversity of bacterial steroid degradation has so far been largely underestimated.
Taxonomy and novelty of predicted steroid-degrading bacteria. Based on a comprehensive RefSeq genome analysis, we recently reported that steroid-degrading bacteria using the 9,10-seco pathway are restricted to the Actinobacteria and Proteobacteria (34). This conclusion is supported by the results of the present study, since the vast majority of predicted steroid degradation proteins encoded in metagenomes and most MAGs predicted to encode steroid degradation were assigned to these two phyla. Further, within the overall data set, the complete set of all 10 pathway genes was never assigned to any other phylum (Fig. S2). However, this metagenomic analysis provides the first evidence for steroid degradation capacity within the alphaproteobacterial lineages Hyphomonadaceae, Rhizobiales, and Rhodobacteraceae, as well as the gammaproteobacterial lineages Spongiibacteraceae and Halieaceae. Isolation and characterization of steroid-degrading bacteria from sponges confirmed our prediction that members of the Spongiibacteraceae and Halieaceae catabolize steroids. Low 16S rRNA gene identities of some of these isolates to sequences in the SILVA 16S database suggest that they belong to taxonomic groups that have not yet been well studied with regard to their biochemical potential, likely representing novel species or genera within these families. Consistent with this, most steroid degradation proteins and MAGs from Mediterranean sponges were taxonomically classified to only the domain or phylum level. Many of the KshA and HsaC proteins encoded in proteobacterial MAGs from sponge metagenomes and a few other environments are divergent from homologs in the RefSeq database and from characterized homologs, forming distinct phylogenetic clusters and suggesting biochemical novelty in the corresponding steroid degradation pathways. In accordance with this, alternative steroid degradation routes have been suggested for Sphingomonadales (18, 41).
In addition, the discrepancy between the taxonomic affiliation of proteobacterial MAGs and the phylogenetic association of their respective KshA and HsaC homologs strongly suggests that those genes were transferred horizontally.
Ecology of steroid-degrading bacteria. (i) Marine sponges. Marine sponges are sessile filter feeders, which often host dense and diverse microbial communities (42). Sponges nonselectively filter microbes from seawater and digest most of them, but some microbes have evolved mechanisms to avoid phagocytosis and potentially establish a symbiosis (43). Several of our results suggest that particular steroid-degrading bacteria are enriched in some sponges compared to other marine environments. First, steroid degradation genes have a higher relative abundance in sponge versus other marine metagenomes, including a seawater metagenome collected from the vicinity of two Mediterranean sponges (44). Second, many putative steroid degradation MAGs were obtained from sponges, but only one from other pelagic marine environments. Third, closely related steroid-degrading bacteria were readily isolated from several unrelated sponges, indicating that particular steroid-degrading bacteria are present in phylogenetically and geographically diverse sponges. All of these observations are consistent with a symbiosis between sponges and steroid degraders. The dominant steroid degraders from sponges belong to the orders Sphingomonadales, Rhizobiales, and Rhodobacterales (Alphaproteobacteria) and Cellvibrionales (Gammaproteobacteria). Accordingly, we isolated several steroid-degrading Cellvibrionales from five unrelated sponges. We note that the isolated steroid degraders did not belong to taxa previously reported to be specifically associated with sponges (40), and steroid degradation proteins assigned to the same Alphaproteobacteria and Gammaproteobacteria orders were found in non-sponge marine metagenomes. Thus, steroid degraders in these orders appear to be widespread in the marine environment but have a greater relative abundance in sponges.
Generally, sponge-microbe symbioses are thought to be predominantly mutualistic, where the microbes benefit from a constant supply of nutrients, while the sponge benefits from supplemental nutrients and microbial waste removal (45). Sponges acquire steroids by de novo biosynthesis and dietary intake, and, like other animals, are presumably not able to remove excess steroids through their metabolism. Therefore, a feasible scenario is a mutualism in which sponges remove excess steroids by excretion into the mesohyl where steroid-degrading bacteria use them as nutrients. Further investigation is clearly required to confirm such a symbiosis and elucidate its basis.
Sponges produce a remarkable variety of sterols, with more than 250 different structures identified (12). Sponge sterol pools are largely influenced by sponge phylogeny, geographic location, environmental conditions, microbial communities, and diet. The divergence of steroid degradation proteins encoded in sponge metagenomes may reflect their early evolutionary origin. It is even possible that bacterial steroid degradation originated in the sponge microbiome, as sponges are thought to be the earliest-branching metazoans (46), and the first sponge progenitors produced steroid-like compounds (3). Additionally, the highly variable structures of sponge steroids may underlie the divergence of associated steroid degradation proteins. Supporting this notion, Gammaproteobacteria that we isolated from sponges grow with cholesterol, in contrast to previous reports, which suggested that Proteobacteria are unable to degrade sterols (15, 34). Genomic and metabolic analysis of our sterol-degrading isolates will likely provide further insight into the diversity and evolution of steroid degradation pathways.
The capacity of bacteria to degrade steroids may contribute to intracellular survival in sponges. Sponge oocytes developing from amoebocytes store lipids and digested bacteria in large vesicles (47), comprising a rich substrate source for bacteria in this environment. Similarly, Mycobacterium tuberculosis, the causative agent of tuberculosis, utilizes host cholesterol during infection and persistence in macrophages (20), which become loaded with cholesterol-containing lipid droplets. Disruption of the cholesterol degradation pathway in M. tuberculosis decreases its infectivity and persistence. We isolated a steroid-degrading Mycobacterium strain from a sponge, which might provide insight into the origin and evolution of cholesterol degradation as a mechanism of pathogenesis.
Interestingly, we did not find considerable numbers of steroid degradation proteins in metagenomes from other marine filter feeders like tunicates or corals. None of eight metagenomes from two tunicate species dominated by either Cyanobacteria or Proteobacteria (48,49) had any HMM hits for steroid degradation proteins. Only one metagenome from the coral Orbicella had a low frequency of steroid degradation HMM hits.
(ii) Free-living marine steroid degraders. Analysis of steroid degradation genes and steroid degrader MAGs in marine metagenomes revealed a distinct taxonomic division of steroid degraders between deep ocean environments and other marine environments. In the deep ocean, the predominant steroid degraders appear to be Corynebacteriales, particularly Mycobacterium, Rhodococcus, and Nocardioides. Organic matter in deep oceans contains significant amounts of sterol- and hopanoid-like structures (50), constituting a potential growth substrate for these Corynebacteriales. Interestingly, KshA and HsaC sequences from Corynebacteriales MAGs from different sites in the Atlantic and Pacific deep oceans have high sequence similarities to each other, comprising distinct clusters within the KshA and HsaC phylogenies. This suggests a distinct steroid degradation pathway in Corynebacteriales in the deep oceans. Notably, none of the seven Rhodococcus MAGs from the deep ocean encoded homologues of the cholate degradation gene cluster from RHA1, which we recently proposed to be part of the core genome of the genus Rhodococcus (34). This suggests that Rhodococcus spp. in the deep ocean are deeply divergent from those in terrestrial and freshwater environments, with key differences in catabolic capacities. Recently, a steroid degradation pathway was proposed for the deep ocean Chloroflexi clade SAR202 (51), which entails an alternative ring degradation progression with several steps similar to the 9,10-seco pathway, but experimental evidence for steroid degradation capability in this clade is still missing.
In pelagic zones, oxygen minimum zones, and hydrothermal vents, the predominant steroid degraders appear to be Alphaproteobacteria and Gammaproteobacteria. These mainly include taxa not previously known to contain steroid degraders, including Rhizobiales, Hyphomonadaceae, Rhodobacteraceae, Halieaceae, Spongiibacteraceae, and Alteromonadales. However, these also include the genera Sphingomonas, Novosphingobium, and Pseudoalteromonas, previously shown to degrade steroids (18, 52). Two MAGs from oxygen minimum zones were classified only to the phylum level (Proteobacteria), indicating that the respective organisms belong to novel taxonomic lineages. Altogether, our results suggest that the marine environment contains diverse steroid degraders that are taxonomically and biochemically novel, with strong potential to yield new insights into bacterial steroid degradation.
(iii) Wastewater treatment. Biological removal of steroids is well known in wastewater treatment plants (53,54), but little is known about the bacteria involved. Our results indicate that members of the families Rhodocyclaceae, mainly Thauera, and Sphingomonadaceae and members of the genus Gordonia represent key steroid degraders in activated sludge of municipal and industrial WWTPs. Supporting the validity of our findings, steroid-degrading Sphingomonas (55), Novosphingobium (56), Comamonas (57), Pseudomonas (58), and Gordonia (59) strains were isolated from a variety of WWTP samples. In addition, Thauera as well as Comamonas and Pseudomonas were the major testosterone degraders in anaerobic and aerobic enrichment cultures from a municipal WWTP, respectively (32,60). Accordingly, we recovered MAGs of predicted steroid degraders from most of these taxonomic lineages. Based on the fact that many characterized steroid-degrading bacteria exhibit narrow steroid substrate ranges (15,34), it is likely that Actinobacteria and Proteobacteria degrade different classes of steroids occurring in wastewater, such as steroid hormones, bile acids, and sterols. Nevertheless, further research is required to establish steroid removal activities for these bacteria and their functional importance.
Steroid-degrading Rhodocyclaceae, such as Thauera, are known for their ability to degrade steroids under anaerobic conditions, and Thauera was shown to use the alternative 2,3-seco pathway under anaerobic conditions (32). Nevertheless, a representative genome of Thauera also encodes an HsaC homologue, suggesting that this organism can use both the 2,3- and 9,10-seco pathways for steroid degradation. This is in agreement with our finding of KshA and HsaC sequences in WWTP metagenomes and MAGs affiliated with this genus.
(iv) Soil and rhizosphere environments. Based on its abundance of plant material and microbial eukaryotes, soil is likely to contain large amounts of steroids, particularly sterols. Supporting this, several soil metagenomes had abundant HMM hits for all 10 steroid degradation protein families and yielded several predicted steroid degrader MAGs.
Corynebacteriales, predominantly Mycobacterium, appear to be the dominant steroid degraders in desert, grassland, and temperate and tropical forest soils, as well as in rhizospheres. Mycobacteria are generally abundant in many soil types (61), and we recently reported that all sequenced Mycobacterium genomes, except for that of M. leprae, encode steroid degradation pathways (34). Some soil Mycobacteria infect and persist in soil-dwelling protozoa and amoebae (62), and it has been shown that sterol degradation is one of the central virulence mechanisms of M. marinum infecting amoebae (63). Our results confirm that soil-dwelling Mycobacteria harbor the genetic potential for steroid degradation. Phylogenetic analysis showed that an HsaC protein encoded in a Mycobacterium MAG from a temperate forest soil was divergent from HsaC proteins from other characterized steroid-degrading Mycobacteria. Thus, further research into soil Mycobacteria could provide insight into the evolution of steroid degradation pathways in Mycobacteria as a crucial pathogenicity trait.
Rhodococcus spp., also Corynebacteriales, appear to be the dominant group of steroid degraders in Antarctic Dry Valley soils. This group has both cholesterol and cholate degradation pathways. It is possible that seal and penguin carcasses and excrement, which regularly occur in Antarctic Dry Valleys (64), are a source of steroid substrates in this otherwise oligotrophic environment. Actinobacteria represented around 20% of the microbial soil community under a seal carcass in one such valley (65).
Plants secrete sterols and sterol-like saponins to the rhizosphere as growth promoters and antifungal compounds (66). These exudates may be important substrates for some rhizosphere bacteria. Accordingly, we found evidence for substantial populations of steroid-degrading Mycobacterium and Sphingomonadales in the rhizosphere. Some plant exudates function in plant-plant and plant-microbe communication (67). It is not known if steroidal exudates have such a function or how steroid degraders might impact such communication. Further research is required to characterize steroid degradation and its ecological importance in the rhizosphere.
Other environments and limitations of the present study. For some environments, such as freshwater, saline lakes, thermal springs, the deep subsurface, an ant fungal garden, and the digestive tracts of insects, we identified bacterial steroid degradation potential in only a small fraction of samples. This suggests that steroid degradation is not a major process in these environments but that they may nevertheless act as reservoirs for steroid-degrading bacteria. Most of the respective HMM hits in those samples were assigned to taxa previously known to include steroid degraders.
Aerobic steroid degradation genes were largely absent from anaerobic environments, such as anaerobic bioreactors, kimchi, kefir, and the digestive systems of vertebrates (including humans) and insects. Genes encoding KshA or HsaC homologues occurred occasionally in these environments (Tables S1B and S2), but as expected, we did not find evidence for a complete 9,10-seco pathway in them. Due to limited knowledge of the 2,3-seco pathway, we could not include HMMs for its key enzymes in our study, which presumably precluded detection of anaerobic steroid degraders such as Sterolibacterium denitrificans (68), which do not encode homologues of steroid-degradation oxygenases from the 9,10-seco pathway (69). However, the genome of Sterolibacterium denitrificans encodes several homologues of aerobic steroid side-chain degradation proteins (70), suggesting partial horizontal gene transfer between the aerobic and anaerobic pathways. Further investigation is clearly required to identify the ecological importance of anaerobic steroid degradation pathways. Steroid modification is known to occur and be important in gut systems (71). However, there is no evidence for steroid ring catabolism in gut environments.
The present study aimed to identify ecological niches for steroid-degrading bacteria by analyzing a large set of assembled, publicly available metagenomes from diverse environments. A caveat of this approach is that the methods for DNA extraction, sequencing, quality filtering, and metagenome assembly impact the results. Importantly, the absence of steroid degradation proteins from individual metagenomes does not unequivocally exclude the presence of steroid-degrading bacteria in the corresponding samples. Accordingly, some metagenomes from environments we identified to be niches for steroid-degrading bacteria, such as soil, sponges, and the deep ocean, did not have sufficient HMM hit numbers to pass our analysis filter. We were not able to determine if this was caused by insufficient sequencing depth, poor metagenome assembly, or actual absence of steroid-degrading bacteria. However, our results demonstrate that our untargeted, pathway-centric approach allowed the identification of bacterial steroid degradation potential in metagenomes and recovery of draft genomes of steroid degraders. Notably, the approach found steroid degradation pathways and taxa quite divergent from previously known ones and found ecologically interpretable distribution patterns of pathways.
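The caveat above, that a metagenome failing the hit filter is not the same as one lacking steroid degraders, can be illustrated with a minimal sketch. This is not the filter used in this study (the actual criteria are given in Text S1). The threshold of six out of 10 families, the function name, and all family labels other than KshA and HsaC are assumptions made purely for illustration.

```python
# Illustrative sketch only; not the analysis filter used in the study.
# Only KshA and HsaC are named in the text. The other eight labels are
# hypothetical placeholders for the remaining HMM protein families.
FAMILIES = ["KshA", "HsaC"] + [f"family_{i}" for i in range(3, 11)]


def classify(hits, min_families=6):
    """Classify one metagenome or MAG from its per-family HMM hit counts.

    hits: dict mapping family name to number of HMM hits.
    min_families: assumed cutoff (a majority of the 10 families).
    """
    present = {f for f in FAMILIES if hits.get(f, 0) > 0}
    if len(present) >= min_families:
        return "candidate"  # enough families to suggest a 9,10-seco pathway
    if present:
        return "partial"    # isolated homologues, e.g. KshA or HsaC alone
    # Deliberately not labelled "absent": low sequencing depth or poor
    # assembly could also produce zero hits.
    return "no_hits"
```

Under these assumptions, a sample with only KshA and HsaC hits would be reported as "partial", whereas a sample with no hits is reported as "no_hits" and left uninterpreted rather than counted as evidence of absence.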

MATERIALS AND METHODS
Materials and methods are referred to in the Results section. Detailed materials and methods are available in Text S1 in the supplemental material.