mBio
Research Article | Applied and Environmental Science

Unusual Metabolism and Hypervariation in the Genome of a Gracilibacterium (BD1-5) from an Oil-Degrading Community

Christian M. K. Sieber, Blair G. Paul, Cindy J. Castelle, Ping Hu, Susannah G. Tringe, David L. Valentine, Gary L. Andersen, Jillian F. Banfield
David A. Relman, Editor
Christian M. K. Sieber
Department of Earth and Planetary Science, University of California, Berkeley, California, USA; Department of Energy Joint Genome Institute, Walnut Creek, California, USA
Blair G. Paul
Marine Science Institute, University of California, Santa Barbara, California, USA
Cindy J. Castelle
Department of Earth and Planetary Science, University of California, Berkeley, California, USA
Ping Hu
Ecology Department, Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA; Department of Biology, St. Mary’s College of California, Moraga, California, USA
Susannah G. Tringe
Department of Energy Joint Genome Institute, Walnut Creek, California, USA
David L. Valentine
Marine Science Institute, University of California, Santa Barbara, California, USA
Gary L. Andersen
Ecology Department, Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA; Department of Environmental Science, Policy and Management, University of California, Berkeley, California, USA
Jillian F. Banfield
Department of Earth and Planetary Science, University of California, Berkeley, California, USA; Department of Environmental Science, Policy and Management, University of California, Berkeley, California, USA
David A. Relman
VA Palo Alto Health Care System
DOI: 10.1128/mBio.02128-19

ABSTRACT

The candidate phyla radiation (CPR) comprises a large monophyletic group of bacterial lineages known almost exclusively from genomes obtained using cultivation-independent methods. Within the CPR, Gracilibacteria (BD1-5) are particularly poorly understood due to undersampling and the inherently fragmented nature of available genomes. Here, we report the first closed, curated genome of a gracilibacterium from an enrichment experiment inoculated from the Gulf of Mexico and designed to investigate hydrocarbon degradation. The gracilibacterium rose in abundance after the community switched to dominance by Colwellia. Notably, we predict that this gracilibacterium completely lacks glycolysis and the pentose phosphate and Entner-Doudoroff pathways. It appears to acquire pyruvate, acetyl coenzyme A (acetyl-CoA), and oxaloacetate via degradation of externally derived citrate, malate, and amino acids and may use compound interconversion and oxidoreductases to generate and recycle reducing power. The initial genome assembly was fragmented in an unusual gene that is hypervariable within a repeat region. Such extreme local variation is rare but characteristic of genes that confer traits under pressure to diversify within a population. Notably, the four major repeated 9-mer nucleotide sequences all encode a proline-threonine-aspartic acid (PTD) repeat. The genome of an abundant Colwellia psychrerythraea population encodes a large extracellular protein that also contains the repeated PTD motif. Although we do not know the host for the BD1-5 cell, the high relative abundance of the C. psychrerythraea population and the shared surface protein repeat may indicate an association between these bacteria.

IMPORTANCE CPR bacteria are generally predicted to be symbionts due to their extensive biosynthetic deficits. Although monophyletic, they are not monolithic in terms of their lifestyles. The organism described here appears to have evolved an unusual metabolic platform not reliant on glucose or pentose sugars. Its biology appears to be centered around bacterial host-derived compounds and/or cell detritus. Amino acids likely provide building blocks for nucleic acids, peptidoglycan, and protein synthesis. We resolved an unusual repeat region that would be invisible without genome curation. The nucleotide sequence is apparently under strong diversifying selection, but the amino acid sequence is under stabilizing selection. The amino acid repeat also occurs in a surface protein of a coexisting bacterium, suggesting colocation and possibly interdependence.

INTRODUCTION

Metagenomic data, the DNA sequences from microbial communities, can be used to reconstruct genomes from uncultivated organisms and provide insight into biological processes shaping their ecosystems. The approach has led to the discovery of numerous previously unknown phyla, many of them belonging to the candidate phyla radiation (CPR), which now appears to constitute a major part of the bacterial domain (1, 2). The candidate phylum BD1-5 was first genomically sampled from an acetate-amended aquifer (Rifle, CO) (3). The organisms were suggested to have limited metabolism and predicted to be symbionts (possibly episymbionts), but the nature of their associations with other organisms remains a mystery. Wrighton et al. (3) predicted that BD1-5 bacteria use an alternative genetic code in which the stop codon UGA encodes an amino acid. Following sampling by single-cell genomics, BD1-5 members were named Gracilibacteria (4). The prediction that UGA codes for glycine in Gracilibacteria was experimentally validated by Hanke et al. (5) through proteomic analysis of a sediment enrichment culture. However, the lack of very-high-quality genomes has limited detailed analysis of the lifestyle of Gracilibacteria and complicated predictions regarding the presence and absence of key metabolic pathways.

Here, we used metagenomic data from a previously performed experiment intended to simulate the Deepwater Horizon (DWH) oil spill (6) to reconstruct the first closed, circular genome (1.34 Mbp) for a Gracilibacteria population. The experiment was inoculated using a water sample collected from the Gulf of Mexico, and Gracilibacteria were detected at moderate abundance 64 days after oil droplet addition (see Materials and Methods). The genome encodes numerous proteins that could not be assigned a potential function, but genes and pathways that are present were easily recognizable. Notably, one hypervariable gene is inferred to encode a protein under strong stabilizing selection, thus likely important for survival. Even for a CPR bacterium, we note an unusual lack of core carbon compound metabolic pathways, including the complete absence of glycolysis and the pentose phosphate pathway. Glycolysis is the major pathway for sugar utilization and is present even in the very small genomes of Buchnera and “Candidatus Blochmannia,” bacteria that are obligate insect endosymbionts (7), and at least a partial pathway is present in many other symbionts. These observations raise interesting questions regarding how central carbon currencies are acquired and how reducing power is generated and recycled.

RESULTS AND DISCUSSION

Genome assembly and curation reveal a hypervariable gene. The draft Gracilibacteria (BD1-5) genome binned by Hu et al. (6) from sample BD02T64, taken 64 days after the start of the laboratory experiment (see Materials and Methods), was selected for further curation as it comprised just 6 scaffolds. We verified that these scaffolds cluster tightly together on a tetranucleotide emergent self-organizing map (see Fig. S1 in the supplemental material), supporting their derivation from a single genome (8). Protein prediction for all six scaffolds required an alternative code in which the UGA stop codon is translated as an amino acid. Consistent with prior studies of Gracilibacteria, genes were predicted using genetic code 25 (UGA translated as glycine [9]). Two main ideas have been proposed to explain how alternative coding arises. The first invokes the low GC content of some (but not all) of the genomes in which it occurs; the genome described here fits this pattern (28.87% G+C). Alternatively, McCutcheon and Moran (10) invoke loss of peptide chain release factor 2 (encoded by prfB), which recognizes UGA codons, to explain the reassignment of the UGA stop codon to tryptophan (code no. 4) in insect symbionts. Consistent with the hypothesis of McCutcheon and Moran (10), prfB was not detected in the genome of the gracilibacterium studied here or in any other available BD1-5 genome. However, the gene for peptide chain release factor 1 (prfA) was detected, and it is widely identified across the CPR.
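The practical effect of genetic code 25 on protein prediction can be illustrated with a minimal, standard-library Python sketch. It is not the gene caller used in the study; it simply builds the standard codon table, reassigns TGA (UGA in RNA) to glycine as code 25 specifies, and reports G+C content. The example open reading frame is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard code (table 1): amino acids listed in TCAG x TCAG x TCAG codon order.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(code: int = 1) -> dict[str, str]:
    """Return a DNA codon -> one-letter amino acid map ('*' marks a stop)."""
    table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
    if code == 25:
        table["TGA"] = "G"  # code 25: UGA is read as glycine instead of a stop
    elif code != 1:
        raise ValueError("this sketch only covers codes 1 and 25")
    return table


def translate(seq: str, code: int = 1) -> str:
    """Translate complete codons in frame 0; codons with ambiguous bases become 'X'."""
    table = codon_table(code)
    seq = seq.upper().replace("U", "T")
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def gc_percent(seq: str) -> float:
    """G+C as a percentage of unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


orf = "ATGTGAAAATGA"  # hypothetical open reading frame
print(translate(orf, code=1))   # M*K*  (internal TGA read as a stop)
print(translate(orf, code=25))  # MGKG  (TGA read as glycine)
print(gc_percent(orf))          # 25.0
```

Under code 1 the internal TGA would truncate the predicted protein, whereas under code 25 the same sequence reads through, which is why gene prediction on these scaffolds depends on choosing the right code.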

FIG S1

Emergent self-organizing map showing clear clustering of genome segments (red dots) of the scaffolds assigned to the BD1-5 bin from sample BD02T64 based on tetranucleotide frequencies.
Copyright © 2019 Sieber et al.

This content is distributed under the terms of the Creative Commons Attribution 4.0 International license.
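The input to a tetranucleotide ESOM is one frequency vector per genome segment. The sketch below computes such vectors in plain Python, merging each 4-mer with its reverse complement (136 canonical 4-mers) and splitting scaffolds into fixed-size windows. The 5-kb window and 2.5-kb minimum are assumed values, not parameters taken from the study, and training the self-organizing map itself is not shown.

```python
from itertools import product

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


# A 4-mer and its reverse complement describe the same double-stranded site,
# so both are counted under the lexicographically smaller string.
CANONICAL = sorted({min(k, revcomp(k)) for k in ("".join(p) for p in product("ACGT", repeat=4))})


def tetra_profile(seq: str) -> list[float]:
    """Relative frequencies of the 136 canonical 4-mers; 4-mers containing non-ACGT bases are skipped."""
    seq = seq.upper()
    counts = dict.fromkeys(CANONICAL, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[min(kmer, revcomp(kmer))] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in CANONICAL]


def windows(seq: str, size: int = 5000, min_size: int = 2500):
    """Yield consecutive segments; a short trailing segment is kept only if it is >= min_size."""
    for start in range(0, len(seq), size):
        chunk = seq[start:start + size]
        if len(chunk) >= min_size:
            yield chunk
```

Segments from one genome would be expected to give profiles that lie close together in this 136-dimensional space, which is the property the clustering in Fig. S1 relies on.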

Prior to read-based curation, the six scaffolds were tentatively condensed into two based on perfect overlaps at scaffold ends. Local assembly errors were removed by curation, and unplaced paired reads were used to close gaps. Reads mapped to the scaffolds were visualized in Geneious (11). Notably, a region where two scaffolds had been joined based on end overlaps was identified as incorrectly assembled because it lacked perfect read support. Inaccurate read placements were associated with a hypervariable repeat locus. By manual step-by-step repositioning of paired reads (first placing reads anchored into the repeat locus boundaries, then considering paired read distances and repeat composition), it was possible to generate a representation of the sequence through the locus (Fig. 1). Because the locus is large compared to the paired read distance, it is impossible to determine the exact number of repeats or whether the number of repeats per locus varies from cell to cell. However, based on the average sequencing depth, the approximated locus is probably of about the correct length and not highly variable in terms of the number of repeated sequences.
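The tentative joining of scaffolds by perfect end overlaps can be sketched as a search for the longest suffix of one scaffold that exactly matches a prefix of another. The helpers below are hypothetical and for illustration only (the study does not name the tool it used), and, as the curation described above showed, a perfect overlap is only a candidate join that still needs read support.

```python
def longest_end_overlap(left: str, right: str, min_len: int = 20) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right` (0 if shorter than min_len)."""
    floor = max(min_len, 1)  # guard: a zero-length slice would trivially "match"
    for k in range(min(len(left), len(right)), floor - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_on_overlap(left: str, right: str, min_len: int = 20):
    """Concatenate two scaffolds across a perfect end overlap, or return None if none is found."""
    k = longest_end_overlap(left, right, min_len)
    return left + right[k:] if k else None


a = "ACGT" * 10 + "TTTTGGGGCCCCAAAA"    # hypothetical scaffold end
b = "TTTTGGGGCCCCAAAA" + "GATTACA" * 3  # hypothetical scaffold start
print(longest_end_overlap(a, b, min_len=10))  # 16
```

In a hypervariable repeat, reads from different cells can support different paths through the same overlap, so an exact end match on its own says nothing about which path is correct.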

FIG 1

Repeat locus from the BD1-5 genome. Colored arrows represent repeated sequence blocks, the sequences for which are shown in the “Repeats” insert. Sets of arrows represent reads, and reads linked within this region to paired reads are indicated by a thin connecting line.

We verified the final genome path by calculating the cumulative GC skew of the closed chromosome sequence and identified the pattern expected for normal bacterial bidirectional replication (Fig. 2). The final assembly comprises 1.34 Mb, 1,243 protein-coding genes, 33 tRNA genes, and one set of rRNA genes (Table 1). According to an RAxML tree based on 16S rRNA genes, the closest relative of our organism was sampled from deep-sea sediments; other closely related sequences are from marine environments (Fig. 3).
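Cumulative GC skew can be computed with a few lines of Python. The sketch assumes non-overlapping 1-kb windows (an arbitrary choice) and follows the common convention that, read along one strand of a circular chromosome, the cumulative (G - C)/(G + C) curve tends to reach its minimum near the origin of replication and its maximum near the terminus. The synthetic sequence is only a toy.

```python
from itertools import accumulate


def gc_skew(seq: str, window: int = 1000) -> list[float]:
    """(G - C) / (G + C) for consecutive, non-overlapping windows (a trailing partial window is dropped)."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window]
        g, c = w.count("G"), w.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Running sum of the per-window skew along the sequence."""
    return list(accumulate(gc_skew(seq, window)))


def origin_terminus_windows(cum: list[float]) -> tuple[int, int]:
    """Window indices of the cumulative minimum (putative origin) and maximum (putative terminus)."""
    low = min(range(len(cum)), key=cum.__getitem__)
    high = max(range(len(cum)), key=cum.__getitem__)
    return low, high


# Toy circular chromosome: G-rich first half, C-rich second half of the same strand.
toy = "GGGC" * 1250 + "CCCG" * 1250
print(origin_terminus_windows(cumulative_gc_skew(toy)))  # (9, 4)
```

In the toy example the minimum falls in the last window, which on a circular chromosome sits next to window 0, and the maximum falls where the G-rich half ends. A misjoined assembly would typically break this single rise-and-fall shape.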

FIG 2

Diagram showing the GC skew (gray dots) and calculated cumulative GC skew (green line) across the finished BD1-5 genome. The pattern is typical of a correctly assembled genome of a bacterium that undergoes bidirectional replication from origin to terminus.

TABLE 1

General information about the Gracilibacteria genome

FIG 3

Phylogenetic placement of the Gracilibacteria genome from sample BD02T64 reported here. The 16S rRNA tree was constructed using the maximum likelihood method RAxML. The small black circles indicate nodes with values of >70% bootstrap support. 16S rRNA genes retrieved from genomes are indicated by green circles. Dotted circles represent published draft genomes, and the full circle indicates the finished and curated genome from this study. Colored circles indicate the type of ecosystem from which sequences were obtained. The full tree file is provided in Data Set S1.

DATA SET S1

Newick version of the 16S rRNA tree with full sequence identifiers.

The predicted amino acid sequence of the BD1-5 gene containing the repeat region is shown in Text S1, part A, in the supplemental material. Some repeat types occur in blocks and some alternate, but overall the most striking feature of the locus is the high level of apparent cell-to-cell heterogeneity (Fig. 1). Excluding substitutions observed on only a single read, variant calling in reads mapped to the full-length gene identified 17 synonymous single-nucleotide substitutions and no nonsynonymous substitutions. In fact, this gene has the highest proportion of synonymous substitutions of any gene in the BD1-5 genome.
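The synonymous/nonsynonymous classification of single-nucleotide variants can be sketched as follows under genetic code 25. The 9-mer used here is hypothetical (it encodes P-T-D but is not taken from the BD1-5 locus), variants are given as (position, alternative base, supporting reads), and, mirroring the filtering described above, variants seen on only one read are ignored via `min_reads=2`. Real variant callers also handle indels, multiple variants per codon, and base qualities, which this sketch does not.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE25 = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
CODE25["TGA"] = "G"  # genetic code 25: UGA is read as glycine


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based `pos` of an in-frame coding sequence."""
    offset = pos % 3
    ref_codon = cds[pos - offset:pos - offset + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    return "synonymous" if CODE25[ref_codon] == CODE25[alt_codon] else "nonsynonymous"


def tally(cds: str, variants: list[tuple[int, str, int]], min_reads: int = 2) -> Counter:
    """Count variant classes, skipping variants supported by fewer than `min_reads` reads."""
    counts = Counter()
    for pos, alt, n_reads in variants:
        if n_reads >= min_reads:
            counts[classify_snv(cds, pos, alt)] += 1
    return counts


unit = "CCAACAGAT"  # hypothetical 9-mer: CCA (P) ACA (T) GAT (D)
variants = [(2, "T", 5), (8, "C", 3), (6, "A", 1), (5, "G", 4)]
counts = tally(unit, variants)
print(counts["synonymous"], counts["nonsynonymous"])  # 3 0 (the single-read G->A is dropped)
```

Here CCA to CCT, GAT to GAC, and ACA to ACG all keep P, D, and T, whereas the single-read GAT to AAT change would have produced N and is excluded by the read filter.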

TEXT S1

(A) Sequence of the Gracilibacteria repeat protein. The sequence is approximate in terms of repeat number and read arrangement due to the limitation of the read length (see Fig. 1). Interestingly, “Candidatus Gracilibacteria bacterium” HOT-871 (RefSeq GCF_002761215.1) does not have this protein. (B) Curated 3,459-aa sequence of the large Colwellia surface protein that also contains the PDT repeat. The sequence is predicted to have two carbohydrate-binding domains and a central right-handed beta helix region. The repeat region appears just after the second carbohydrate-binding domain.

Within the repeat region, the incidence of synonymous versus nonsynonymous substitutions is shown in Fig. 1 (insert). The four main repeat nucleotide sequence variants are indicated in orange, green, yellow, and blue, along with their translated sequences. Single-incidence sequences are indicated by white bars. Notably, the nucleotide sequences of the four major repeat variations all translate to the tripeptide amino acid motif PTD. Given that the Gracilibacteria population cells share near-identical nucleotide sequences genome-wide, except within this specific locus, we infer that the repeat-bearing protein may be under strong pressure to evolve at the nucleotide level. If cells acquire nonsynonymous substitutions in the repeat protein, they are apparently strongly selected against.

The PTD repeat motif is found in hypothetical proteins and predicted surface proteins of a few other organisms, including some that are eukaryotic sporozoite surface protein 2-like. A secondary structure prediction of the BD1-5 protein suggests only β sheet and coils, with the repeat motif in a coil region. We predict a single N-terminal transmembrane domain and extracellular localization of the remainder of the protein sequence, including the repeat region (see Fig. S2 in the supplemental material).

FIG S2

Tertiary structure prediction of BD1-5 repeat protein by I-TASSER showing variable repeat region in purple, transmembrane region in orange and noncytoplasmic region in turquoise.

We investigated codon usage in the repeat gene. The codons for D are GAC and GAT, with usage of 6.2:6.7 in the repeat gene, whereas the expected genome-wide incidence is GAC:GAT = 0.83:4.79. Synonymous substitutions within the repeats could cause ribosome pausing and modulate rates of protein folding (12, 13). While we considered that atypical codon use in this region might indicate selection for translational pausing, the corresponding 5′ tRNA anticodon position (G) enables a wobble pair to recruit the same tRNA to either GAC or GAT. P is encoded by CCA, CCG, CCT, and CCC, with an expected incidence of 1.3:0.13:0.89:0.13, whereas the repeat gene has an incidence of 11.82:0.25:1.23:0. Thus, there is evidence for strong selection for the CCA codon (considered further below). Inosine is the only tRNA-proline 5′ anticodon base that could accommodate all synonymous variants. The more prevalent codons, CCA and CCT, would recruit two different tRNAs. Intriguingly, the tRNA-proline gene carried by the gracilibacterium genome corresponds to CCA or CCG. In the third position of the tripeptide, T can be encoded by ACT, ACC, ACA, or ACG, with an expected incidence of 2.1:0.41:2.4:0.36. However, within the repeat gene, ACT, ACC, ACA, and ACG occur with an incidence of 4.19:0:6.9:6.4. Again, with the exception of anticodon inosine wobble pairing, reliance on rare codons may indicate selection for translational pausing.
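Codon incidences like those quoted above can be tabulated with a short Python function. The normalization here (each codon as a percentage of all in-frame codons) is an assumption, since the text does not state how incidence was scaled, and the repeat sequence is hypothetical. Comparing observed with expected values highlights enrichment; for example, CCA at 11.82 in the repeat gene versus 1.3 genome-wide is roughly a ninefold enrichment.

```python
from collections import Counter


def codon_incidence(cds: str, codons: list[str]) -> dict[str, float]:
    """Percentage of all in-frame codons (frame 0) accounted for by each requested codon."""
    cds = cds.upper().replace("U", "T")
    in_frame = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(in_frame)
    n = len(in_frame)
    return {c: (100.0 * counts[c] / n if n else 0.0) for c in codons}


def enrichment(observed: dict[str, float], expected: dict[str, float]) -> dict[str, float]:
    """Observed/expected ratio per codon; codons with zero or missing expected incidence are skipped."""
    return {c: observed[c] / expected[c] for c in observed if expected.get(c)}


# Hypothetical repeat: four P-T-D units with varying synonymous codons.
repeat = "CCAACAGAT" + "CCAACGGAC" + "CCAACAGAT" + "CCTACTGAT"
inc = codon_incidence(repeat, ["CCA", "CCT", "GAT", "GAC"])
print(inc["CCA"], inc["GAT"])  # 25.0 25.0
print(round(enrichment({"CCA": 11.82}, {"CCA": 1.3})["CCA"], 1))  # 9.1
```

Because incidence is expressed relative to all codons, short repeat-rich genes can show large values for a few codons; the ratio to the genome-wide value is what signals selection.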

If it is advantageous for coexisting cells to have highly variable rates of translation, one might expect the sequences to make maximal use of the available codons. Counter to this, we see reduced codon diversity. Thus, we considered whether variation in the secondary structure of the repeat-array RNA may be under selection. In the secondary structure prediction for the repeat region, we note the periodic alternation of stems, comprising mostly Watson-Crick base pairs, and loops (see Fig. S3 in the supplemental material). Notably, the CCA codon (specifically the first C) is at the base of the bubbles and closes them, paired to G’s from either the first base of the first codon or the last base of the third codon. Stem-loops affect RNA folding, can stabilize mRNA, and provide recognition sites for RNA-binding proteins. We speculate that nucleotide variation may affect the translation rate of this gene and lead to variation in the fitness of different population members.
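The kind of stem-and-loop pattern described above can be explored with a base-pair-maximization (Nussinov) dynamic program. This is a deliberately simpler technique than the energy-based folding programs commonly used for RNA secondary structure prediction, and it is not the method behind Fig. S3; it only counts the largest set of nested Watson-Crick and G-U pairs, with a minimum hairpin loop of 3 nt as an assumed parameter.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Nussinov-style maximum number of nested base pairs (DNA input is read as RNA)."""
    s = rna.upper().replace("T", "U")
    n = len(s)
    # dp[i][j] = best pair count within s[i..j]; spans too short to close a hairpin stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: base i pairs with base k
                if (s[i], s[k]) in PAIRS:
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, 1 + dp[i + 1][k - 1] + right)
            dp[i][j] = best
    return dp[0][n - 1] if n else 0


print(max_base_pairs("GGGAAAUCCC"))  # 3
```

Pair counting ignores stacking energies, so it can rank structures differently from a thermodynamic model; it is useful mainly to see how a single substitution can add or remove a possible pair.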

FIG S3

Secondary structure of repeat locus. (A) Colored arrows represent repeated sequence blocks, the sequences for which are shown in the repeats insert. Sets of arrows represent reads. Substitutions are indicated with arrows and numbers. (B) Secondary structure of repeat locus. (C and D) The consequences of substitutions 1 (C) and 2 (D) for the secondary structure are shown for local regions.

We searched the genomic region flanking this gene but did not identify a known mechanism for site-directed mutagenesis within the repeat locus. Genes with functions linked to DNA repair and recombination are found in close downstream proximity (uvrC excinuclease [5,798 bp downstream], an exodeoxyribonuclease III gene [8,336 bp downstream], and DNA recombination-mediator protein gene, dprA [30,798 bp downstream]). Perhaps this organism possesses a DNA mutator, which mediates targeted diversification in the repeat locus. It is unlikely that the organism is deficient in repair enzymes, as sequence variation is not elevated elsewhere in the genome. Perhaps nucleotide heterogeneity arose due to suppressed proofreading in this region, but we have no explanation for how this might have occurred.

In the current study, it is difficult to evaluate locus length variation because read lengths are short compared to the length of the repeat arrays. Locus length variation is expected, given the presence of perfect repeat arrays. In bacterial genomes, repeat regions may expand and contract due to either recombination or slipped-strand mispairing (SSM [14, 15]), resulting in population variability in terms of tripeptide motifs that may impact three-dimensional protein structure and ligand binding. The relationship between microsatellite length and point mutation has been described elsewhere and generally predicts that as a locus expands, base substitutions accumulate and suppress further SSM (16, 17). If SSM is undesirable, it is advantageous to include nucleotide variants that offset repeat pairing and thus prevent slippage.
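Why interspersed variant units should reduce slippage can be made concrete by measuring the longest run of identical, adjacent repeat units. This is a crude proxy introduced here for illustration (it assumes the array is in frame with a fixed 9-nt period and ignores partial units), and the two 9-mers are hypothetical P-T-D codings, not the sequences shown in Fig. 1.

```python
def longest_identical_run(seq: str, period: int = 9) -> int:
    """Longest number of consecutive identical `period`-length units, read in frame from position 0."""
    units = [seq[i:i + period] for i in range(0, len(seq) - period + 1, period)]
    best = run = 0
    prev = None
    for unit in units:
        run = run + 1 if unit == prev else 1
        best = max(best, run)
        prev = unit
    return best


u1, u2 = "CCAACAGAT", "CCAACGGAC"  # hypothetical 9-mers, both encoding P-T-D
print(longest_identical_run(u1 * 11))        # 11: a perfect array, prone to slippage
print(longest_identical_run((u1 + u2) * 5))  # 1: alternating variants break perfect runs
print(longest_identical_run(u1 * 3 + u2 * 2))  # 3
```

Both example arrays encode the same peptide, so synonymous variants can shorten perfect runs without changing the protein, which is consistent with the argument above.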

Examination of the 5′- and 3′-untranslated regions flanking the repeat gene uncovered two sequences capable of forming stem-loops with notably long stems (14 to 15 bp) and 4- to 6-bp loops (see Fig. S4 in the supplemental material). As DNA or RNA structures, these stem-loops may play a role in recombination or as transcriptional regulation signals for the repeat-containing gene, respectively.

FIG S4

Position and structure of stem-loops located in the untranslated regions (UTRs) of the BD1-5 repeat protein gene. The 3′ stem-loop is 5 bp from the gene end to 31 bp downstream (ATTAAAAAAAGAGATTCGTATATCTCTTTTTTTAAT [ΔG = −10.75 kcal/mol]). The 5′ stem-loop is 62 to 94 bp upstream (AAAAAACAACTCATTTTTATGAGTTGTTTTTT [ΔG = −12.49 kcal/mol]). A highly similar (88% identical) stem-loop sequence was identified elsewhere in the genome within the 5′ end of a putative type II/IV secretion system gene (AAAAAACACTGATAAAAATCAGTGTTTTTT; coordinates 167813 to 167842 [ΔG = −11.49 kcal/mol]).
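The stem and loop lengths quoted for the UTR stem-loops can be checked by pairing bases inward from both ends of each listed sequence. The sketch below does this with Watson-Crick rules only and an assumed minimum loop of 3 nt; it does not compute ΔG, which requires nearest-neighbor thermodynamic parameters.

```python
_COMP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminal_hairpin(seq: str, min_loop: int = 3) -> tuple[int, int]:
    """Pair the outermost bases inward while they are Watson-Crick complements.

    Returns (stem length in bp, loop length in nt)."""
    s = seq.upper()
    i, j = 0, len(s) - 1
    # Close another pair only if at least `min_loop` unpaired bases would remain inside it.
    while j - i - 1 >= min_loop and _COMP.get(s[i]) == s[j]:
        i += 1
        j -= 1
    return i, j - i + 1


# Sequences as given in the Fig. S4 legend.
three_prime = "ATTAAAAAAAGAGATTCGTATATCTCTTTTTTTAAT"
five_prime = "AAAAAACAACTCATTTTTATGAGTTGTTTTTT"
print(terminal_hairpin(three_prime))  # (15, 6)
print(terminal_hairpin(five_prime))   # (14, 4)
```

The results match the 14- to 15-bp stems and 4- to 6-nt loops described in the text.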

Some reads mapped to the BD1-5 repeat region had paired reads that were not placed in that genome. Comparison of the nonrepeat regions of these reads and the sequences of their unplaced paired reads to the genomes of other community members revealed 100% nucleotide matches to a region on BD02T64_scaffold_179, part of a draft Colwellia psychrerythraea genome (BD02T64_Colwellia_psychrerythraea_38_180_partial). Thus, we concluded that a region within the genome of this abundant population encodes the same PTD repeat as the Gracilibacteria protein (Text S1, part B). After curation of the region, the C. psychrerythraea protein is predicted to be 3,459 amino acids in length, with a signal peptide and extracellular localization, possible galactose/carbohydrate-binding domains, pectin lyase/virulence domains, and parallel β-helix repeats. The repeat occurs within a structure that otherwise consists of a mixture of α helices and β sheets, but the repeat itself lies in neither. Apart from the repeat region, the C. psychrerythraea protein shares negligible sequence identity (<10%) with the Gracilibacteria protein. Also, the TGA codon, which is repurposed in Gracilibacteria, is not used in either of the repeat regions.

Within the Colwellia protein, reads carry up to 11 repeats (see Fig. S5 in the supplemental material). As for BD1-5, it is impossible to detect variation in repeat number in each cell due to the read length limitation, but one read has only five repeats. In virtually all cases, the nucleotide repeat is encoded by a single 9-mer (yellow); this 9-mer is prominent toward the end of the Gracilibacteria repeat region. The loci in both genomes terminate with the same 9-mer (orange in Fig. S5). The essentially perfect repeated sequence would make this region prone to replication slippage, leading to cell-to-cell variation in the number of tripeptides in the protein.

FIG S5

Small subset of the Colwellia genes predicted to include a PTD repeat. Repeats code for the tripeptide PDT. The black bar indicates a deletion in one variant. The sequence of yellow and orange 9-mers is provided in Fig. 1.

Interestingly, several of the Gracilibacteria proteins encoded immediately adjacent to the variable PTD protein have the highest similarity to proteins in organisms that are not part of the CPR. One is most similar to a protein from Colwellia psychrerythraea, although the percentage of amino acid identity is low (∼53%). To rule out chimeric assembly of sequence from another bacterium in this genomic region, we confirmed the expected alternative coding throughout (and paired-read placements were verified during the main curation phase). Thus, the region encoding the Gracilibacteria variable repeat gene and adjacent genes may have been acquired from a bacterium related to Colwellia psychrerythraea.

Metabolic analysis. The biosynthetic pathways easily recognizable in the genome are for ribosome-based protein synthesis, nucleic acid synthesis and interconversion, DNA repair, peptidoglycan production, secretion, pilus production, and cell division. However, as for other members of the CPR, this gracilibacterium appears to lack the ability to synthesize lipids needed for construction of the cytoplasmic membrane (and there is no pathway for synthesis of lipid A, which is required for a Gram-negative cell envelope). Thus, these cells are predicted to be either symbionts or closely dependent on other community members for key building blocks. The genome lacks a CRISPR-Cas system for phage defense but has a restriction-modification system that may serve this purpose (18, 19). Almost all pathways for amino acid synthesis are absent, leading us to conclude that amino acids needed for protein biosynthesis are derived through breakdown of externally derived peptides. Many different types of peptidases and proteases are encoded that could carry out this process.

For nucleic acid synthesis, the genome encodes the steps required to interconvert nucleotides. We also identified most of the genes required for biosynthesis of purines and pyrimidines from glutamine and aspartate; these genes are relatively uncommon in CPR. IMP can be converted to AMP, ADP, and ATP and incorporated into RNA and DNA. Enzymes were also identified to interconvert forms of GDP and GTP. Genes of the one-carbon pool by folate pathway were identified, enabling transfer of C1 groups during nucleotide metabolism, but genes for folate biosynthesis were not identified.

Perhaps the most surprising feature of this bacterium is the complete lack of genes for glycolysis and the pentose phosphate pathway, which makes this genome distinct from other Gracilibacteria, and possibly even from all other bacteria. At least partial pathways are present in other Gracilibacteria, and the first reported genomes have full pathways to convert glucose to pyruvate and fermentation-based metabolisms (3). More broadly, at least parts of these pathways are present in the most minimal CPR genomes. However, this is the first genome from a major subgroup within Gracilibacteria (Fig. 3), so it remains to be seen whether this is a common trait. The absence of these pathways raises two questions: (i) the nature of central carbon metabolism in these organisms and (ii) how ATP, NADH, NADPH, and ferredoxin are reduced and recycled.
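Presence/absence calls like these are often expressed as the fraction of pathway steps with at least one annotated enzyme. The sketch below uses simplified step lists with standard EC numbers for glycolysis and the pentose phosphate pathway; the step definitions are our simplification (curated module definitions such as KEGG modules include more alternatives), and the annotation set in the example is hypothetical, not this genome's.

```python
# Each step is satisfied if any one of its EC numbers is annotated.
GLYCOLYSIS = {
    "hexokinase/glucokinase": {"2.7.1.1", "2.7.1.2"},
    "glucose-6-phosphate isomerase": {"5.3.1.9"},
    "phosphofructokinase": {"2.7.1.11", "2.7.1.90"},
    "fructose-bisphosphate aldolase": {"4.1.2.13"},
    "triose-phosphate isomerase": {"5.3.1.1"},
    "glyceraldehyde-3-phosphate dehydrogenase": {"1.2.1.12"},
    "phosphoglycerate kinase": {"2.7.2.3"},
    "phosphoglycerate mutase": {"5.4.2.11", "5.4.2.12"},
    "enolase": {"4.2.1.11"},
    "pyruvate kinase": {"2.7.1.40"},
}
PENTOSE_PHOSPHATE = {
    "glucose-6-phosphate dehydrogenase": {"1.1.1.49"},
    "6-phosphogluconolactonase": {"3.1.1.31"},
    "6-phosphogluconate dehydrogenase": {"1.1.1.44"},
    "ribose-5-phosphate isomerase": {"5.3.1.6"},
    "ribulose-phosphate 3-epimerase": {"5.1.3.1"},
    "transketolase": {"2.2.1.1"},
    "transaldolase": {"2.2.1.2"},
}


def completeness(pathway: dict[str, set[str]], annotated: set[str]) -> tuple[float, list[str]]:
    """Fraction of steps covered by `annotated` EC numbers, plus the names of missing steps."""
    missing = [step for step, ecs in pathway.items() if not ecs & annotated]
    present = len(pathway) - len(missing)
    return present / len(pathway), missing


hypothetical = {"2.7.1.40", "4.2.1.11", "5.3.1.1"}  # illustrative annotation set only
frac, missing = completeness(GLYCOLYSIS, hypothetical)
print(frac, len(missing))                                # 0.3 7
print(completeness(PENTOSE_PHOSPHATE, hypothetical)[0])  # 0.0
```

Reporting the missing step names, not just the fraction, makes it easier to distinguish a truly absent pathway from one with a single unannotated or divergent enzyme.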

Potentially addressing the first question, we identified a variety of pathways for production of central carbon currencies. We identified a putative two-subunit ATP citrate (pro-S)-lyase (EC 2.3.3.8) (genes 1051 and 1052), a complex rarely detected in CPR. This annotation (versus citrate synthase) was supported by HMM homology and the presence of the active site residues GHAGA (20). Via this complex, citrate can be converted to acetyl-CoA and oxaloacetate. Citrate may be obtained from external sources via two putative citrate transporters. Intriguingly, both ATP citrate (pro-S)-lyase subunits are most similar to predicted proteins in archaea, suggesting their acquisition via lateral gene transfer. We predict that oxaloacetate derived from breakdown of citrate is converted to pyruvate via a 2-oxoacid ferredoxin oxidoreductase (OFOR). Pyruvate can also be produced from phosphoenolpyruvate via a pyruvate kinase, and from malate (EC 1.1.1.38) and serine (EC 4.3.1.19). Overall, amino acids scavenged from the environment appear to feature prominently in the metabolism of this gracilibacterium, and some are converted into the nitrogen and carbon storage compound cyanophycin.
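For reference, the overall reaction catalyzed by ATP citrate (pro-S)-lyase (EC 2.3.3.8) is ATP dependent, which distinguishes it from the ATP-independent citrate lyase (EC 4.1.3.6); acetyl-CoA and oxaloacetate are its carbon products:

```latex
\[
\text{citrate} + \text{ATP} + \text{CoA} \longrightarrow \text{acetyl-CoA} + \text{oxaloacetate} + \text{ADP} + \text{P}_\text{i}
\]
```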

Addressing the second question, we identified many reactions that oxidize or reduce energy currencies via transformation of small carbon compounds. Specifically, OFOR-catalyzed oxidative decarboxylation of pyruvate to acetyl-CoA reduces ferredoxin. The reduced ferredoxin may be reoxidized via either a cytoplasmic or a membrane-bound ferredoxin reductase (FNR) that also converts NADP+ to NADPH. NADH may be regenerated in the production of pyruvate from serine or malate. Like citrate, malate may be obtained from external sources. Other reactions, such as those involved in peptidoglycan synthesis and interconversion of tetrahydrofolate compounds, also interconvert energy currencies. Enzymes involved in the oxidative stress response may also provide electron sinks.
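The redox couplings described above can be written with their textbook stoichiometries. These are general biochemistry and were not measured for this organism.

```latex
\begin{align*}
\text{pyruvate} + \text{CoA} + 2\,\text{Fd}_{\text{ox}} &\rightarrow \text{acetyl-CoA} + \text{CO}_2 + 2\,\text{Fd}_{\text{red}} + \text{H}^+
  && \text{(OFOR)} \\
2\,\text{Fd}_{\text{red}} + \text{NADP}^+ + \text{H}^+ &\rightarrow 2\,\text{Fd}_{\text{ox}} + \text{NADPH}
  && \text{(FNR)} \\
\text{(S)-malate} + \text{NAD}^+ &\rightarrow \text{pyruvate} + \text{CO}_2 + \text{NADH}
  && \text{(malic enzyme, EC 1.1.1.38)}
\end{align*}
```

Summing the first two reactions cancels the ferredoxin and the protons, giving pyruvate + CoA + NADP+ → acetyl-CoA + CO2 + NADPH. A ferredoxin shuttle of this kind would therefore couple pyruvate oxidation to NADPH production.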

Many CPR bacteria generate ATP via substrate-level phosphorylation reactions that produce compounds such as acetate, but genes for production of these short-chain fatty acids were not identified. ATP required for DNA and RNA biosynthesis may be formed via the F-type ATP synthase complex (complex V). Given the lack of an electron transport chain that could pump protons, proton motive force (PMF) could be stolen from attached host cells if tight junctions are formed (19, 21). Such close physical associations have been reported for another CPR bacterial group, Saccharibacteria (TM7), whose members attach to the cell surfaces of host Actinobacteria (22). Alternatively, PMF could be generated by cytoplasmic drawdown of H+ via reactions involved in breakdown of amino acids and other compounds, Na+/H+ antiport, or consumption of H+ by superoxide dismutase. The ATP synthase may also operate in reverse to generate PMF (as suggested by Wrighton et al. [3]), but no complexes were identified that could make use of the generated PMF. Specifically, there is no indication of hydrogenases, which occur in some other CPR members. Also lacking are other electron transport chain components, such as NADH dehydrogenase, succinate dehydrogenase, and cytochrome c reductase/oxidase, as well as most steps of the tricarboxylic acid cycle (Fig. 4).
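To see why cytoplasmic H+ drawdown could contribute to PMF, use the textbook relation Δp = Δψ − (ln(10)·RT/F)·ΔpH with Mitchell's sign convention. Raising the internal pH widens ΔpH and makes Δp more negative (stronger), even if the membrane potential is unchanged. The sketch below is a minimal illustration. The membrane potential and pH values are hypothetical, not measurements for this organism, and the 4°C default in the usage line only mirrors the incubation temperature.

```python
import math

R = 8.314      # gas constant, J mol^-1 K^-1
F = 96485.0    # Faraday constant, C mol^-1


def proton_motive_force_mv(delta_psi_mv: float, ph_in: float, ph_out: float,
                           temp_c: float = 25.0) -> float:
    """Proton motive force in mV: delta_p = delta_psi - (ln(10) RT/F) * delta_pH.

    Mitchell's sign convention: delta_psi = psi_in - psi_out and
    delta_pH = pH_in - pH_out, so an energized cell has delta_p < 0.
    """
    temp_k = temp_c + 273.15
    mv_per_ph = math.log(10) * R * temp_k / F * 1000.0  # ~59 mV per pH unit at 25 °C
    return delta_psi_mv - mv_per_ph * (ph_in - ph_out)


# Illustrative numbers only: fixed membrane potential, 4 °C, internal pH raised by H+ consumption.
baseline = proton_motive_force_mv(-120.0, ph_in=7.4, ph_out=7.0, temp_c=4.0)
drawdown = proton_motive_force_mv(-120.0, ph_in=7.8, ph_out=7.0, temp_c=4.0)
print(f"{baseline:.1f} mV -> {drawdown:.1f} mV")
```

The second value is more negative than the first. This shows, under these assumed numbers, how consuming cytoplasmic protons alone would strengthen the PMF available to the ATP synthase.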

FIG 4

Cell cartoon depicting a reconstruction of the metabolism of the gracilibacterium. Bold text indicates prominent functions; blue text indicates resources inferred to be externally derived. An asterisk (*) indicates that reactions for biosynthesis of cofactors require a precursor compound. Abbreviations: PEP, phosphoenolpyruvate; UDP-GlcNAc, UDP-N-acetyl-α-d-glucosamine; OFOR, 2-oxoacid ferredoxin oxidoreductase; 3PG, 3-phospho-d-glycerate; 3-PoxyP, 3-phosphonooxypyruvate; 2-oxoglu, 2-oxoglutarate; FNR, ferredoxin reductase; PPPi, PPi, and Pi: phosphate compounds interconverted by inorganic pyrophosphatase; Mk-n, menaquinone; Mk-l, menaquinol; Succ. semiald., succinate semialdehyde; l-Glu, l-glutamate; R-COOH, a carboxylic acid; CPG, cyanophycin; FG, N-formyl-l-glutamate. PTD is a tripeptide repeat.

A variety of transporter types were predicted, presumably addressing the need to acquire compounds from other cells or detritus. There are many hypothetical membrane-associated proteins with multiple transmembrane (TM) domains that also may serve a transport role. Overall, 122 transmembrane proteins (more than three transmembrane domains) and 80 transporter proteins were identified (Table 1; see Table S1 in the supplemental material). The genome encodes an intriguing 990-aa protein predicted to contain 32 transmembrane domains (gene 860). A large-scale analysis of TM-rich proteins in the NCBI nr database revealed that very few have 32 or more TM domains, and only a few related proteins are known (mostly in other Gracilibacteria). The function of this enigmatic protein is uncertain, as the only domain predicted is DUF2339 (hypothetical membrane protein).
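Counting TM-rich proteins from per-protein helix predictions can be sketched as follows. The sketch assumes TMHMM 2.0 "short" output, with one tab-separated line per protein and the helix count in a `PredHel=` field. The protein IDs and values in the example are hypothetical, and the threshold of four helices encodes the ">3 transmembrane domains" criterion.

```python
import re
from typing import Dict, Iterable, List

# Assumed input: TMHMM 2.0 "short" output, one tab-separated line per protein
# with the predicted helix count in a field such as "PredHel=4".
PREDHEL = re.compile(r"PredHel=(\d+)")


def helix_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Map each protein ID (first column) to its number of predicted TM helices."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        match = PREDHEL.search(line)
        if match:
            counts[line.split()[0]] = int(match.group(1))
    return counts


def tm_rich(counts: Dict[str, int], min_helices: int = 4) -> List[str]:
    """IDs of proteins with at least min_helices predicted helices (">3 TM domains")."""
    return sorted(pid for pid, n in counts.items() if n >= min_helices)


# Hypothetical IDs and values, formatted like TMHMM short output.
example = [
    "gene_0860\tlen=990\tExpAA=690.12\tFirst60=22.40\tPredHel=32\tTopology=i5-27o",
    "gene_0001\tlen=310\tExpAA=0.02\tFirst60=0.01\tPredHel=0\tTopology=o",
    "gene_0002\tlen=452\tExpAA=91.30\tFirst60=20.10\tPredHel=4\tTopology=i12-34o",
]
counts = helix_counts(example)
print(tm_rich(counts))
```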

TABLE S1

Prediction of transporter proteins, secreted proteins, and transmembrane proteins. Table S1, XLSX file, 0.1 MB.
Copyright © 2019 Sieber et al.

This content is distributed under the terms of the Creative Commons Attribution 4.0 International license.

A notable feature of the Gracilibacteria genome is the prominence of secretion mechanisms and secreted proteins. Using a combination of three methods to predict signal peptide-mediated export, we identified 66 such proteins, of which only 24 are shorter than 300 amino acids. A further 104 proteins are predicted to be secreted via nonclassical pathways that do not use a signal peptide; of these, 43 are larger than 300 amino acids. In addition to a sortase (typically found in Gram-positive bacteria and common in CPR), we identified genes of the type II and IV secretion pathways that are generally associated with Gram-negative bacteria, including multiple copies of SecA, -D, -F, -Y, -E, and -G. SecYEG forms the central translocase across the inner membrane, SecA is the ATPase that guides proteins to the translocase channel, and SecF promotes release of the mature peptide into the periplasm. Thus, the identified components provide the functions required for secretion in non-Gram-negative bacteria. Intriguingly, 15 general secretion pathway protein G proteins (GspG, alternatively PulG) are predicted, as well as GspE. These are large proteins, on average 528 aa in length. GspG is the major pseudopilin of the pseudopilus, and GspE is an ATPase involved in pseudopilus assembly. In addition, we identified around 12 type IV pilus assembly protein subunits, some in multiple copies, representing PilV, -C, -B, and -W. Type IV pili allow the transfer of genetic material and are involved in twitching motility (the genome also has two pilT genes). PilD (leader peptidase) was also identified. We did not identify PilQ, consistent with the lack of an outer membrane. Pili may be involved in attachment and interorganism interactions, as well as uptake of DNA. Competence genes were also identified (19, 21, 23).

From the perspective of the cell envelope, the peptidoglycan biosynthesis pathway is complete, although the precursor UDP-N-acetylglucosamine is predicted to be obtained from external sources. Genes are predicted for converting a phosphorylated isoprenoid into a precursor for peptidoglycan, but the genome lacks both the archaeal mevalonate pathway and the bacterial MEP (2-C-methyl-d-erythritol 4-phosphate) pathway. It encodes a geranylgeranyl diphosphate synthase, for reasons that are unclear. In addition, we identified three genes for degradation of l-lysine and d-glutamate that may feed intermediates into two different steps of the peptidoglycan biosynthesis pathway. The genome contains many genes for polysaccharide synthesis (e.g., genes 444 to 460) and for proteins with S-layer domains. Thus, we anticipate a peptidoglycan-containing cell wall with a periodic surface layer, many and potentially diverse pili, and a variety of large extracellular proteins and polymeric substances (Fig. 4). Interestingly, some S-layer proteins may have toxin domains (e.g., gene 1226, predicted to have polycystin-1, lipoxygenase, and alpha-toxin domains). Other large proteins have annotations suggestive of hostile interactions with other organisms (e.g., insecticidal toxin complex protein [TccC]), and one large protein in the genome is predicted to contain an invasin domain.

In terms of the ability to respond to environmental conditions, the genome encodes at least four RelA/SpoT domain proteins, three of them encoded in tandem and one larger multidomain protein encoded elsewhere. These may function in the response to nutrient limitation. Also identified are two 8-oxo-dGTP diphosphatase genes, which prevent misincorporation of oxidized purine nucleoside triphosphates into DNA, and genes for proteins with antioxidant functions, including superoxide reductase and enzymes that reduce oxidized methionine.

We conclude that the inferred, putatively symbiotic lifestyle of Gracilibacteria differs in notable ways from those of other obligate host-associated organisms. The genome is large compared with those of most obligate host-associated organisms (usually <1 Mbp in length [24]). Host-associated bacteria that have experienced moderate genome reduction retain genes for synthesis of fatty acids and peptidoglycan (but not for lipopolysaccharide [LPS] or phospholipids), whereas those that have undergone extreme genome reduction have essentially no genes for cell envelope biosynthesis (10). In contrast, the gracilibacterium retains peptidoglycan biosynthesis but seems to rely entirely on externally derived fatty acids. It retains genes for regulation of gene expression (e.g., two-component systems and various transcriptional regulators), DNA repair, and homologous recombination, whereas these genes are often lost in symbionts (7). Overall, the genomic features of this gracilibacterium overlap only partially with those of host-associated bacteria, which have experienced rapid genome decay.

Conclusions. Among the most intriguing aspects of the Gracilibacteria genome studied here is the variable nucleotide locus that encodes a conserved tandem PTD tripeptide repeat protein. The gene appears to be under selective pressure to preserve this amino acid sequence, as nucleotide variation is localized to this repeat locus and is almost exclusively synonymous. We infer that the protein has a function strongly tied to the fitness of this organism. The PTD repeat sequence also occurs in a coexisting Colwellia that became abundant late in the experiment (6), when the gracilibacterium was detected (see Fig. S6 in the supplemental material). The co-occurrence of the repeat is unlikely to be a coincidence, as this sequence is relatively uncommon, even in public databases. However, we cannot provide a definitive explanation for the shared amino acid repeat sequence in both genomes. If horizontal gene transfer was involved, only the repeat part was transferred, as the remaining sequences show no sequence identity. Therefore, we consider it at least equally likely that this phenomenon resulted from convergent evolution, probably with selection for an amino acid sequence with certain adhesion properties.

FIG S6

Taxonomic composition of oil spill simulation samples (6) based on relative abundance of ribosomal protein S3 genes. Abundance of Colwellia with repeat protein is indicated by stars. Abundance of Gracilibacteria is shown in red. FIG S6, TIF file, 1.6 MB.

Given this, and its likely function as an extracellular protein potentially involved in attachment, we speculate that either (case A) the same repeat sequence in two cell surface proteins adheres to the same substrate (which seems very reasonable, given that adhesion is mediated by the properties of the amino acid sequences) or (case B) the proteins adhere to each other at the repeat interface, where molecular Velcro-like binding may occur, as has been shown for other self-associating proteins (25–27). This could result in close proximity in case A or direct cell surface adhesion in case B. As case A seems highly probable based on chemical arguments and case B is harder to establish, we favored case A in the cell cartoon in Fig. 4. However, this interaction remains speculative, and enrichment experiments targeting Colwellia are required to determine whether cocultivation with Gracilibacteria can be achieved.

The gracilibacterium studied here is also fascinating in terms of its unusual metabolic platform. Based on its predicted gene inventory, it is inferred to adopt the lifestyle of a scavenger or symbiont of some type (possibly a parasite). It clearly requires an external source of building blocks, including lipids, amino acids, citrate, and malate. In the enrichment experiment designed to simulate the Deepwater Horizon oil spill, neither glucose-based compounds nor amino acids are expected to be highly abundant. There is no indication that the gracilibacterium can metabolize complex oil-derived compounds. Thus, we predict that the relevant resources are bacterial compounds released by cell lysis (e.g., amino acids, small organic molecules, lipids, and cofactors or cofactor precursors) and those that leak from cells of coexisting oil-degrading bacteria (e.g., alcohols and aldehydes). These resources may be processed by this gracilibacterium and the by-products excreted, providing the associated organisms with compounds such as acetyl-CoA, fumarate, succinate, or acetolactate. Based on its inferred lifestyle and its phylogenetic placement within a major distinct clade (Fig. 3), we propose the name Gracilibacteria (phylum), Gracilibacter (class), Detritibacteriales (order), Detritibacteriaceae (family), Detritibacteria (genus) gulfii (species), reflecting its likely dependence on detritus and its enrichment in a sample simulating the Gulf oil spill.

MATERIALS AND METHODS

Genome assembly and annotation. The original study of Hu et al. (6) involved seawater samples collected from a depth of 1,100 to 1,200 m in the Gulf of Mexico in 2014. The sample was derived from a region impacted by the Deepwater Horizon oil spill in 2010, but no oil-spill-derived hydrocarbons were detected at the time of sampling. However, hydrocarbon seeps occur naturally in the general area. The in situ cell density was estimated at ∼5.0 × 10⁵ cells/ml. A volume of 630 liters was brought to the surface, amended with unweathered Macondo oil (MASS oil 072610-03) at a concentration of 0.2 ppm to sustain microbial activity, and maintained in the dark at 5°C while the sample was transported to the laboratory. In the experiment described previously, samples were incubated for up to 64 days in 2-liter bottles at 4°C in the dark at 0.75 rpm on a rotating carousel system. Macondo crude oil was added to the seawater as 10-μm droplets to a final concentration of 2 ppm, together with 0.02 ppm Corexit EC9500A dispersant (Nalco). Replicate oil-amended bottles were destructively sampled at 6, 18, and 64 days of incubation for metagenomics.

The methods for the metagenomic assembly of the BD1-5 genome described here, as well as of the draft Colwellia genome, are reported by Hu et al. (6). In the current study, genome curation was conducted in Geneious (11). Curation involved visualization and validation of paired-read placements throughout. Local assembly errors were identified as regions lacking perfect read support. Gaps were inserted in these regions, and unplaced paired reads were used to fill the gaps. In repeat regions, some reads were improperly placed and paired reads were missing. These regions were curated in the same way as local assembly errors, except that reads had to be relocated manually to achieve the most parsimonious path. The same approach was used to curate the Colwellia genomic region that shared the same repeat sequences. After completion, the assembly was checked for repeats longer than the paired-read distance using GC skew and cumulative GC skew, calculated by previously published methods (28).
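The GC skew check can be sketched as follows. This is a minimal illustration, not the published implementation of reference 28. It assumes linear (non-wrapping) windows, and the window and step sizes are illustrative. In a correctly assembled circular bacterial genome, cumulative GC skew typically has a single minimum near the replication origin and a single maximum near the terminus, so additional inflections can flag misassembled regions.

```python
from itertools import accumulate
from typing import List, Tuple


def gc_skew(seq: str, window: int = 1000, step: int = 10) -> Tuple[List[int], List[float]]:
    """Sliding-window GC skew, (G - C) / (G + C), along a linear sequence.

    Windows containing no G or C are assigned a skew of 0.
    """
    seq = seq.upper()
    starts: List[int] = []
    skews: List[float] = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        starts.append(start)
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return starts, skews


def cumulative_gc_skew(skews: List[float]) -> List[float]:
    """Running sum of window skews; in a sound assembly its extremes mark origin and terminus."""
    return list(accumulate(skews))


starts, skews = gc_skew("GGGGCCCCATAT", window=4, step=4)
print(starts, skews, cumulative_gc_skew(skews))
```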

Genes of the curated, circularized BD1-5 genome were repredicted using Prodigal (29) with genetic code 25 (-g 25). Functional annotations were done using the ggKbase annotation pipeline (http://ggkbase.berkeley.edu), which searches for homologs of predicted genes in the KEGG (30), UniRef (31), and UniProt (32) databases using USEARCH (33). Amino acid sequences of genes without a significant hit were further annotated using HHblits (34) and the UniProt20 (32) database. In addition, individual genes were interrogated using HMMER (35), HHpred (36), InterProScan (37), SWISS-MODEL (38), and blastp domain analysis. Transmembrane proteins were identified by TMHMM (43). We predicted secreted proteins using PSORTb (40), SignalP (39), and PrediSi (41), each with both Gram-negative and Gram-positive prediction models. From the resulting six predictions, we selected proteins that were identified as secreted by at least three predictions coming from at least two independent methods. We applied SecretomeP (42) to predict nonclassically secreted proteins without a signal peptide. Additionally, we removed proteins with more than one transmembrane domain predicted by TMHMM (43). We predicted transporters with TrSSP (51) and selected proteins with at least four transmembrane domains from the resulting set.
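The consensus and filtering rules above can be expressed as set operations, sketched below. The tool/model labels and protein IDs are hypothetical, and no predictor is actually invoked. The text does not state whether the TM filter applied to the signal-peptide set, the SecretomeP set, or both, so here it is a separate step that can be applied to either.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Prediction = Tuple[str, str]  # (tool, model), e.g. ("SignalP", "gram-"); labels are illustrative


def consensus_secreted(predictions: Dict[Prediction, Set[str]],
                       min_votes: int = 3, min_tools: int = 2) -> Set[str]:
    """Proteins called secreted by >= min_votes predictions from >= min_tools tools."""
    votes: Dict[str, Set[Prediction]] = defaultdict(set)
    for prediction, proteins in predictions.items():
        for pid in proteins:
            votes[pid].add(prediction)
    return {
        pid for pid, preds in votes.items()
        if len(preds) >= min_votes and len({tool for tool, _ in preds}) >= min_tools
    }


def drop_multi_tm(proteins: Set[str], tm_helices: Dict[str, int], max_tm: int = 1) -> Set[str]:
    """Remove proteins with more than max_tm TMHMM-predicted helices."""
    return {pid for pid in proteins if tm_helices.get(pid, 0) <= max_tm}


def candidate_transporters(trssp_hits: Set[str], tm_helices: Dict[str, int],
                           min_tm: int = 4) -> Set[str]:
    """Keep transporter predictions with at least min_tm TM helices."""
    return {pid for pid in trssp_hits if tm_helices.get(pid, 0) >= min_tm}


# Hypothetical protein IDs: three tools x two models = six predictions.
predictions = {
    ("PSORTb", "gram-"): {"p1", "p2"}, ("PSORTb", "gram+"): {"p1"},
    ("SignalP", "gram-"): {"p1", "p3"}, ("SignalP", "gram+"): {"p3"},
    ("PrediSi", "gram-"): {"p3"}, ("PrediSi", "gram+"): set(),
}
tm = {"p3": 2, "t1": 6, "t2": 1}
secreted = drop_multi_tm(consensus_secreted(predictions), tm)
print(sorted(secreted), sorted(candidate_transporters({"t1", "t2"}, tm)))
```

With two models per tool, any protein with three votes necessarily spans at least two tools. The explicit `min_tools` check therefore matters only if more models per tool are added.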

Secondary structure of the protein encoded by the repeat locus was predicted using YASPIN (44), and DNA secondary structure was predicted using Mfold (45) for putative stem-loops flanking the BD1-5 repeat gene. Tertiary structure prediction of the BD1-5 repeat protein was performed using I-TASSER (46).

Phylogenetic tree. 16S rRNA gene sequences were aligned using SSU-align (47) and trimmed manually. We calculated the phylogenetic tree using the maximum likelihood algorithm RAxML (48) on the CIPRES (49) web server, choosing the GTRGAMMA model and the autoMRE option to determine the number of bootstrap replicates automatically.

Nucleotide variation and codon usage analysis. We determined single nucleotide variants using VarScan (50) with the following parameters: c10, q30, and fr0.05. This set of nucleotide variants was then assessed to distinguish nonsynonymous from synonymous substitutions within each coding region of the BD1-5 genome. For each gene, codon positions at which a variant results in an amino acid substitution under genetic code no. 25 (Gracilibacteria) were counted as nonsynonymous, and those resulting in no amino acid change were counted as synonymous. A codon usage profile was generated in Python (v.2.7.3) using the Biopython SeqUtils package. Synonymous codon usage in the repeat-rich gene was compared with the average codon usage of all genes in the Gracilibacteria genome. Synonymous codons were then compared with predicted tRNA gene anticodons to address potential 5′ anticodon wobble pairing.
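The synonymous/nonsynonymous call under genetic code 25 can be sketched in pure Python rather than with Biopython, so the sketch is self-contained. Code 25 is the standard bacterial code except that TGA (UGA) encodes glycine instead of a stop (5). Parsing VarScan output is not shown. The sketch assumes each variant has already been mapped to its codon and position on the coding strand. The example sequence is a hypothetical PTD-encoding stretch.

```python
from collections import Counter
from typing import Dict

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (standard/bacterial code, "*" = stop).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# Genetic code 25 differs only at TGA (index 14), which encodes glycine instead of stop.
CODE_25: Dict[str, str] = dict(zip(CODONS, STANDARD[:14] + "G" + STANDARD[15:]))


def classify_snv(codon: str, position: int, alt: str, table: Dict[str, str] = CODE_25) -> str:
    """Classify a single-base change at position 0-2 of a coding-strand codon."""
    codon = codon.upper()
    mutant = codon[:position] + alt.upper() + codon[position + 1:]
    return "synonymous" if table[codon] == table[mutant] else "nonsynonymous"


def codon_usage(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence (a trailing partial codon is ignored)."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


print(classify_snv("GAT", 2, "C"))  # Asp GAT -> GAC
print(classify_snv("GGA", 0, "T"))  # Gly GGA -> TGA (Gly under code 25)
print(classify_snv("TGG", 2, "A"))  # Trp TGG -> TGA
print(codon_usage("CCTACTGATCCAACCGAC"))
```

The second call shows why the code table matters: GGA to TGA would be scored as a nonsense change under the standard code but is synonymous under code 25.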

Taxonomic composition of oil spill samples. We estimated the relative abundance of taxa in the samples of the oil spill simulation of Hu et al. (6) by mapping reads to contigs carrying a ribosomal protein S3 gene. Annotation of ribosomal proteins and taxonomic classification of contigs were done using ggKbase (http://ggkbase.berkeley.edu).
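One way to turn read mapping into relative abundance is sketched below. The text does not specify the normalization. This sketch assumes coverage is computed as mapped reads × read length / contig length and then scaled so that all taxa sum to 1. The default 150-bp read length and the taxon counts are hypothetical.

```python
from typing import Dict, Tuple


def relative_abundance(mapped: Dict[str, Tuple[int, int]],
                       read_length: int = 150) -> Dict[str, float]:
    """Relative abundance of taxa from reads mapped to their rpS3-bearing contigs.

    mapped: taxon -> (number of mapped reads, contig length in bp).
    Coverage = reads * read_length / contig length; values are then scaled to sum to 1.
    """
    coverage = {taxon: reads * read_length / length
                for taxon, (reads, length) in mapped.items()}
    total = sum(coverage.values())
    if total == 0:
        return {taxon: 0.0 for taxon in coverage}
    return {taxon: cov / total for taxon, cov in coverage.items()}


# Hypothetical counts for one sample (not data from the study).
sample = {"Colwellia": (3000, 10000), "Gracilibacteria": (1000, 10000)}
print(relative_abundance(sample, read_length=100))
```

Normalizing by contig length keeps taxa with longer rpS3-bearing contigs from appearing more abundant simply because they recruit more reads.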

Data availability. The genome, with functional annotation, can be accessed at https://ggkbase.berkeley.edu/BD02T64/organisms/60439. The genome sequence has been deposited in GenBank under accession no. CP042461.

ACKNOWLEDGMENTS

Support for this research was provided by the Chan Zuckerberg Biohub (to J.F.B.), the Emerging Technologies Opportunity Program of the U.S. Department of Energy (DOE) Joint Genome Institute, a DOE Office of Science User Facility, supported under contract no. DE-AC02-05CH11231. Support was provided by DOE grant no. DOE-SC10010566 and National Institutes of Health grant no. 5R01AI092531. B.G.P. was supported by the Center for Dark Energy Biosphere Investigations (C-DEBI).

We thank Brian Thomas for bioinformatics support and Spencer Diamond and David Low for helpful suggestions.

The previously reported cultivation experiment was designed by G.A. and P.H., and genome assembly and binning were done by P.H. and C.S. The BD1-5 manual genome curation and repeat locus resolution were performed by J.F.B. Genome-wide inventory and protein localization analyses were done by C.S. Metabolic analyses were conducted by J.F.B., C.C., and C.S., with input from B.P. and D.V. Phylogenetic analysis was carried out by C.S. Codon and repeat region analyses were performed by B.P. Secondary structure analyses were done by B.P. and J.F.B. J.F.B. and C.C. wrote the paper, with input from C.S., B.P., and D.V. All authors read and commented on the paper.

FOOTNOTES

    • Received 29 August 2019
    • Accepted 8 October 2019
    • Published 12 November 2019
  • Copyright © 2019 Sieber et al.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.

REFERENCES

  1. Brown CT, Hug LA, Thomas BC, Sharon I, Castelle CJ, Singh A, Wilkins MJ, Wrighton KC, Williams KH, Banfield JF. 2015. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523:208–211. doi:10.1038/nature14486.
  2. Hug LA, Baker BJ, Anantharaman K, Brown CT, Probst AJ, Castelle CJ, Butterfield CN, Hernsdorf AW, Amano Y, Ise K, Suzuki Y, Dudek N, Relman DA, Finstad KM, Amundson R, Thomas BC, Banfield JF. 2016. A new view of the tree of life. Nat Microbiol 1:16048. doi:10.1038/nmicrobiol.2016.48.
  3. Wrighton KC, Thomas BC, Sharon I, Miller CS, Castelle CJ, VerBerkmoes NC, Wilkins MJ, Hettich RL, Lipton MS, Williams KH, Long PE, Banfield JF. 2012. Fermentation, hydrogen, and sulfur metabolism in multiple uncultivated bacterial phyla. Science 337:1661–1665. doi:10.1126/science.1224041.
  4. Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ, Cheng J-F, Darling A, Malfatti S, Swan BK, Gies EA, Dodsworth JA, Hedlund BP, Tsiamis G, Sievert SM, Liu W-T, Eisen JA, Hallam SJ, Kyrpides NC, Stepanauskas R, Rubin EM, Hugenholtz P, Woyke T. 2013. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499:431–437. doi:10.1038/nature12352.
  5. Hanke A, Hamann E, Sharma R, Geelhoed JS, Hargesheimer T, Kraft B, Meyer V, Lenk S, Osmers H, Wu R, Makinwa K, Hettich RL, Banfield JF, Tegetmeyer HE, Strous M. 2014. Recoding of the stop codon UGA to glycine by a BD1-5/SN-2 bacterium and niche partitioning between Alpha- and Gammaproteobacteria in a tidal sediment microbial community naturally selected in a laboratory chemostat. Front Microbiol 5:231. doi:10.3389/fmicb.2014.00231.
  6. Hu P, Dubinsky EA, Probst AJ, Wang J, Sieber CMK, Tom LM, Gardinali PR, Banfield JF, Atlas RM, Andersen GL. 2017. Simulation of Deepwater Horizon oil plume reveals substrate specialization within a complex community of hydrocarbon degraders. Proc Natl Acad Sci U S A 114:7432–7437. doi:10.1073/pnas.1703424114.
  7. Zientz E, Dandekar T, Gross R. 2004. Metabolic interdependence of obligate intracellular bacteria and their insect hosts. Microbiol Mol Biol Rev 68:745–770. doi:10.1128/MMBR.68.4.745-770.2004.
  8. Dick GJ, Andersson AF, Baker BJ, Simmons SL, Thomas BC, Yelton AP, Banfield JF. 2009. Community-wide analysis of microbial genome sequence signatures. Genome Biol 10:R85. doi:10.1186/gb-2009-10-8-r85.
  9. Ivanova NN, Schwientek P, Tripp HJ, Rinke C, Pati A, Huntemann M, Visel A, Woyke T, Kyrpides NC, Rubin EM. 2014. Stop codon reassignments in the wild. Science 344:909–913. doi:10.1126/science.1250691.
  10. McCutcheon JP, McDonald BR, Moran NA. 2009. Origin of an alternative genetic code in the extremely small and GC-rich genome of a bacterial symbiont. PLoS Genet 5:e1000565. doi:10.1371/journal.pgen.1000565.
  11. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, Thierer T, Ashton B, Meintjes P, Drummond A. 2012. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28:1647–1649. doi:10.1093/bioinformatics/bts199.
  12. Tsai C-J, Sauna ZE, Kimchi-Sarfaty C, Ambudkar SV, Gottesman MM, Nussinov R. 2008. Synonymous mutations and ribosome stalling can lead to altered folding pathways and distinct minima. J Mol Biol 383:281–291. doi:10.1016/j.jmb.2008.08.012.
  13. Li G-W, Oh E, Weissman JS. 2012. The anti-Shine-Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature 484:538–541. doi:10.1038/nature10965.
  14. Levinson G, Gutman GA. 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol Biol Evol 4:203–221. doi:10.1093/oxfordjournals.molbev.a040442.
  15. van der Woude MW. 2011. Phase variation: how to create and coordinate population diversity. Curr Opin Microbiol 14:205–211. doi:10.1016/j.mib.2011.01.002.
  16. Ellegren H. 2002. Mismatch repair and mutational bias in microsatellite DNA. Trends Genet 18:552. doi:10.1016/s0168-9525(02)02804-4.
  17. Li Y-C, Korol AB, Fahima T, Nevo E. 2004. Microsatellites within genes: structure, function, and evolution. Mol Biol Evol 21:991–1007. doi:10.1093/molbev/msh073.
  18. Burstein D, Sun CL, Brown CT, Sharon I, Anantharaman K, Probst AJ, Thomas BC, Banfield JF. 2016. Major bacterial lineages are essentially devoid of CRISPR-Cas viral defence systems. Nat Commun 7:10613. doi:10.1038/ncomms10613.
  19. Castelle CJ, Brown CT, Anantharaman K, Probst AJ, Huang RH, Banfield JF. 2018. Biosynthetic capacity, metabolic variety and unusual biology in the CPR and DPANN radiations. Nat Rev Microbiol 16:629–645. doi:10.1038/s41579-018-0076-2.
  20. Kanao T, Fukui T, Atomi H, Imanaka T. 2001. ATP-citrate lyase from the green sulfur bacterium Chlorobium limicola is a heteromeric enzyme composed of two distinct gene products. Eur J Biochem 268:1670–1678. doi:10.1046/j.1432-1327.2001.02034.x.
  21. Castelle CJ, Banfield JF. 2018. Major new microbial groups expand diversity and alter our understanding of the tree of life. Cell 172:1181–1197. doi:10.1016/j.cell.2018.02.016.
  22. He X, McLean JS, Edlund A, Yooseph S, Hall AP, Liu S-Y, Dorrestein PC, Esquenazi E, Hunter RC, Cheng G, Nelson KE, Lux R, Shi W. 2015. Cultivation of a human-associated TM7 phylotype reveals a reduced genome and epibiotic parasitic lifestyle. Proc Natl Acad Sci U S A 112:244–249. doi:10.1073/pnas.1419038112.
  23. Kantor RS, Wrighton KC, Handley KM, Sharon I, Hug LA, Castelle CJ, Thomas BC, Banfield JF. 2013. Small genomes and sparse metabolisms of sediment-associated bacteria from four candidate phyla. mBio 4:e00708-13. doi:10.1128/mBio.00708-13.
  24. McCutcheon JP, Moran NA. 2010. Functional convergence in reduced genomes of bacterial symbionts spanning 200 My of evolution. Genome Biol Evol 2:708–718. doi:10.1093/gbe/evq055.
  25. Heras B, Totsika M, Peters KM, Paxman JJ, Gee CL, Jarrott RJ, Perugini MA, Whitten AE, Schembri MA. 2014. The antigen 43 structure reveals a molecular Velcro-like mechanism of autotransporter-mediated bacterial clumping. Proc Natl Acad Sci U S A 111:457–462. doi:10.1073/pnas.1311592111.
  26. Muiznieks LD, Keeley FW. 2010. Proline periodicity modulates the self-assembly properties of elastin-like polypeptides. J Biol Chem 285:39779–39789. doi:10.1074/jbc.M110.164467.
  27. Aguirre KM, McCormick RJ, Schwarzbauer JE. 1994. Fibronectin self-association is mediated by complementary sites within the amino-terminal one-third of the molecule. J Biol Chem 269:27863–27868.
  28. Brown CT, Olm MR, Thomas BC, Banfield JF. 2016. Measurement of bacterial replication rates in microbial communities. Nat Biotechnol 34:1256–1263. doi:10.1038/nbt.3704.
  29. Hyatt D, LoCascio PF, Hauser LJ, Uberbacher EC. 2012. Gene and translation initiation site prediction in metagenomic sequences. Bioinformatics 28:2223–2230. doi:10.1093/bioinformatics/bts429.
  30. Kanehisa M, Goto S. 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 28:27–30. doi:10.1093/nar/28.1.27.
  31. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH. 2007. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23:1282–1288. doi:10.1093/bioinformatics/btm098.
  32. UniProt Consortium. 2015. UniProt: a hub for protein information. Nucleic Acids Res 43(Database issue):D204–D212. doi:10.1093/nar/gku989.
  33. Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461. doi:10.1093/bioinformatics/btq461.
  34. Remmert M, Biegert A, Hauser A, Söding J. 2011. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9:173–175. doi:10.1038/nmeth.1818.
  35. Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comput Biol 7:e1002195. doi:10.1371/journal.pcbi.1002195.
  36. Hildebrand A, Remmert M, Biegert A, Söding J. 2009. Fast and accurate automatic structure prediction with HHpred. Proteins 77(Suppl 9):128–132. doi:10.1002/prot.22499.
  37. Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240. doi:10.1093/bioinformatics/btu031.
  38. Bienert S, Waterhouse A, de Beer TAP, Tauriello G, Studer G, Bordoli L, Schwede T. 2017. The SWISS-MODEL Repository—new features and functionality. Nucleic Acids Res 45:D313–D319. doi:10.1093/nar/gkw1132.
  39. Almagro Armenteros JJ, Tsirigos KD, Sønderby CK, Petersen TN, Winther O, Brunak S, von Heijne G, Nielsen H. 2019. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol 37:420–423. doi:10.1038/s41587-019-0036-z.
  40. Yu NY, Wagner JR, Laird MR, Melli G, Rey S, Lo R, Dao P, Sahinalp SC, Ester M, Foster LJ, Brinkman F. 2010. PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 26:1608–1615. doi:10.1093/bioinformatics/btq249.
  41. Hiller K, Grote A, Scheer M, Münch R, Jahn D. 2004. PrediSi: prediction of signal peptides and their cleavage positions. Nucleic Acids Res 32(Web server issue):W375–W379. doi:10.1093/nar/gkh378.
  42. Bendtsen JD, Jensen LJ, Blom N, von Heijne G, Brunak S. 2004. Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Des Sel 17:349–356. doi:10.1093/protein/gzh037.
  43. Krogh A, Larsson B, von Heijne G, Sonnhammer E. 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305:567–580. doi:10.1006/jmbi.2000.4315.
  44. Lin K, Simossis VA, Taylor WR, Heringa J. 2005. A simple and fast secondary structure prediction method using hidden neural networks. Bioinformatics 21:152–159. doi:10.1093/bioinformatics/bth487.
  45. Zuker M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31:3406–3415. doi:10.1093/nar/gkg595.
  46. Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y. 2015. The I-TASSER Suite: protein structure and function prediction. Nat Methods 12:7–8. doi:10.1038/nmeth.3213.
  47. Nawrocki EP. 2009. Structural RNA homology search and alignment using covariance models. PhD thesis. Washington University, St. Louis, MO.
  48. Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313. doi:10.1093/bioinformatics/btu033.
  49. Miller MA, Pfeiffer W, Schwartz T. 2010. Creating the CIPRES Science Gateway for inference of large phylogenetic trees, p 1–8. In Gateway Computing Environments Workshop (GCE 2010), New Orleans, LA, 14 November 2010.
  50. Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, Weinstock GM, Wilson RK, Ding L. 2009. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25:2283–2285. doi:10.1093/bioinformatics/btp373.
  51. Mishra NK, Chang J, Zhao PX. 2014. Prediction of membrane transport proteins and their substrate specificities using primary sequence information. PLoS One 9:e100278. doi:10.1371/journal.pone.0100278.
Unusual Metabolism and Hypervariation in the Genome of a Gracilibacterium (BD1-5) from an Oil-Degrading Community
Christian M. K. Sieber, Blair G. Paul, Cindy J. Castelle, Ping Hu, Susannah G. Tringe, David L. Valentine, Gary L. Andersen, Jillian F. Banfield
mBio Nov 2019, 10 (6) e02128-19; DOI: 10.1128/mBio.02128-19

KEYWORDS

BD1-5
CPR
candidate phyla radiation
genomes from metagenomes
gracilibacteria
surface proteins

Copyright © 2019 American Society for Microbiology

Online ISSN: 2150-7511