RNA-Dependent Cysteine Biosynthesis in Bacteria and Archaea

ABSTRACT The diversity of the genetic code systems used by microbes on earth is yet to be elucidated. It is known that certain methanogenic archaea employ an alternative system for cysteine (Cys) biosynthesis and encoding; tRNACys is first acylated with phosphoserine (Sep) by O-phosphoseryl-tRNA synthetase (SepRS) and then converted to Cys-tRNACys by Sep-tRNA:Cys-tRNA synthase (SepCysS). In this study, we searched all genomic and metagenomic protein sequence data in the Integrated Microbial Genomes (IMG) system and at the NCBI to reveal new clades of SepRS and SepCysS proteins belonging to diverse archaea in the four major groups (DPANN, Euryarchaeota, TACK, and Asgard) and two groups of bacteria (“Candidatus Parcubacteria” and Chloroflexi). Bacterial SepRS and SepCysS charged bacterial tRNACys species with cysteine in vitro. Homologs of SepCysE, a scaffold protein facilitating SepRS⋅SepCysS complex assembly in Euryarchaeota class I methanogens, are found in a few groups of TACK and Asgard archaea, whereas the C-terminally truncated homologs exist fused or genetically coupled with diverse SepCysS species. Investigation of the selenocysteine (Sec)- and pyrrolysine (Pyl)-utilizing traits in SepRS-utilizing archaea and bacteria revealed that the archaea carrying full-length SepCysE employ Sec and that SepRS is often found in Pyl-utilizing archaea and Chloroflexi bacteria. We discuss possible contributions of the SepRS-SepCysS system for sulfur assimilation, methanogenesis, and other metabolic processes requiring large amounts of iron-sulfur enzymes or Pyl-containing enzymes.

Two minor genetic code systems were discovered in methanogenic archaea a decade ago (1-3). In most organisms, Cys biosynthesis and Cys-tRNACys formation are carried out separately by a cysteine synthase and cysteinyl-tRNA synthetase (CysRS), respectively. However, methanogens employ a tRNACys-dependent Cys biosynthesis pathway (3). In these archaea, Cys-tRNACys is formed in a two-step process: first, O-phosphoserine (Sep) is acylated to tRNACys by SepRS, and then Sep-tRNACys is converted by Sep-tRNA:Cys-tRNA synthase (SepCysS) to Cys-tRNACys (3-6). An additional component, SepCysE, stabilizes the SepRS·SepCysS·tRNACys ternary complex, but it is known to be present in class I methanogens only (4,7). The class I methanogens are also exceptional among methanogens in that they encode selenocysteine (Sec), the 21st genetically encoded amino acid used in some archaea and many bacteria and eukaryotes (8,9). The coupled biosynthesis and coding of Cys are considered the original mechanism of Cys-tRNACys formation in the last common ancestor of archaea (3,4) because archaeal CysRS genes appear to have multiple bacterial origins (10,11) and bacterial CysRS is a highly evolved Cys-specific enzyme using a zinc atom to ensure specificity (12,13). However, our knowledge is confined to well-studied lineages of cultured archaea, and it remains unclear whether the SepRS-SepCysS pathway is present outside the major Euryarchaeota clade, which includes class I, II, and III methanogens, methanotrophic archaea 1 (ANME-1), and Archaeoglobi (4,14-16).
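The two-step route described above can be summarized as a reaction scheme. This is only a sketch: the ATP/AMP/PP_i stoichiometry of the first step is assumed from general aminoacyl-tRNA synthetase chemistry rather than stated in the text, the sulfur donor is written generically as [S] because its identity is not specified here, and the release of inorganic phosphate in the second step is likewise an assumption.

```latex
\begin{align*}
\text{Sep} + \text{ATP} + \text{tRNA}^{\text{Cys}}
  &\xrightarrow{\ \text{SepRS}\ }
  \text{Sep-tRNA}^{\text{Cys}} + \text{AMP} + \text{PP}_i \\
\text{Sep-tRNA}^{\text{Cys}} + [\text{S}]
  &\xrightarrow{\ \text{SepCysS}\ }
  \text{Cys-tRNA}^{\text{Cys}} + \text{P}_i
\end{align*}
```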
Pyrrolysine (Pyl), the 22nd genetically encoded amino acid, is charged to tRNAPyl by pyrrolysyl-tRNA synthetase (PylRS) (17,18), which is specific for this unusual amino acid. PylRS is present in diverse bacteria and a few archaeal groups (19). PylRS is encoded by a single pylS gene in the Methanosarcinaceae, by the pylSn and pylSc genes, encoding the N- or C-terminal part, respectively, of PylRS in some anaerobic bacteria and "Candidatus Methanomethylicus sp. V1," or by pylSc only in Methanomassiliicoccales (1,19-21). The evolutionary pathways of the three types of PylRS remain unclear (19). Pyrrolysine biosynthesis genes (pylBCD), a tRNAPyl gene (pylT), and Pyl-utilizing methylamine methyltransferase genes (mtxBC) usually form a single gene cluster with the PylRS gene, which may have facilitated the horizontal gene transfer (HGT) of a Pyl-encoding system (22). The Pyl-utilizing methylamine:corrinoid methyltransferases (MtxB) transfer a methyl group from methylamines to their corrinoid protein partners (MtxC). The methyl group is then transferred to coenzyme M (CoM) in methanogens and possibly to CoM or tetrahydrofolate (THF) in bacteria (8,23). Finally, the methyl group is released as methane by methyl-CoM reductase in methanogens and probably fuels anaerobic respiration in bacteria (8,23).
In the last few years, analyses of genomic and metagenomic sequences have identified large numbers of novel bacterial and archaeal lineages. Some of these archaea are methanogens (21,24-27). Most importantly, single-cell genomics and the composite-genome approach have dissected microbial dark matter (MDM) (28), the candidate phylum radiation (CPR) (29,30), and the Asgard archaeal superphylum (31) by detecting and classifying uncultivated microbes (28,29,31-34). Progress in DNA sequencing and de novo assembly technologies has led to the generation of larger genomic and metagenomic contigs encoding proteins. Phylogenetic studies of organisms based on protein sequences challenge traditional phylogenies based solely on rRNA sequences (29).
In this study, we assumed that the SepRS-SepCysS-SepCysE system might exist in diverse organisms whose genomic sequences were not available several years ago. We addressed (i) the distribution of the genes for SepRS, SepCysS, and SepCysE homologs outside the major Euryarchaeota groups and (ii) any relationships between RNA-dependent Cys biosynthesis and Sec- or Pyl-utilizing traits. In addition to investigating the genomic data, we investigated metagenomic protein sequence data, whose usage has been limited due to the low reliability of the data and the difficulty of inferring firm phylogenetic results. To overcome these problems, we performed a comprehensive survey of all metagenomic data sets in the IMG system (35) and at the NCBI rather than using an individual data set.
by our contig binning. They may belong to the rapidly evolving groups of Euryarchaeota (Fig. 1A) (27,43), which explains the extent of divergence.
(ii) Occurrence of SepCysS. Classification of the collected SepCysS sequences revealed that, in addition to the three known SepCysS clades (here, clades I, III, and VII) (10), four other clades exist (here, clades II, IV, V, and VI) (Fig. 1B; Fig. S1). Residues critical for the Sep-to-Cys conversion are conserved in all SepCysS species, with only two exceptions. While residues involved in pyridoxal-phosphate (PLP) binding are conserved in all collected SepCysSs, the Cys residues involved in persulfide formation (6) are missing in these two cases, thereby suggesting that these SepCysS proteins might employ a different mechanism for sulfur transfer (see below).
The SepCysS phylogeny shows a very low correlation with the SepRS phylogeny. There are two plausible explanations for this. (i) Some archaea have two copies of SepCysS genes (of the same clade or different clades) that are shared within the same subgroup of archaea (Fig. 1B; Fig. S1 and S2) (10,44). Likewise, it is possible that a second SepCysS gene copy was excluded from our analysis due to incomplete genome sequencing and contig binning. Importantly, in our genome and metagenome analyses, no additional copies of SepRS genes were identified, nor were any SepCysS genes found in the complete genomes lacking SepRS. (ii) Because SepCysS shows less tRNA specificity than SepRS (45), SepCysS genes may be more prone to HGT than SepRS genes. The occurrence of a SepCysS gene duplication in some Euryarchaeota methanogens (Fig. S1) implies that an additional gene copy may enhance RNA-dependent cysteine biosynthesis under certain conditions.
Some archaeal genomes contain homologs of the N-terminal helix-turn-helix domain of SepCysE (Fig. 1B and D). This homolog is present as an additional domain fused to SepCysS (some of the clade VII SepCysSs) or encoded as a split gene in front of SepCysS genes (clade VII SepCysSs and a few clade I and VI SepCysS genes) (Fig. 1B and D; Fig. S1). This SepCysE homolog was named "SepCysSn" when encoded by a separate gene or "SepCysSN" when fused to SepCysS (Fig. 1D).
The genetic loci of SepRS and SepCysS. The genetic loci and genes accompanying SepRS and SepCysS genes support the protein sequence-based phylogenies (Fig. S2). (i) Bacteria (and an archaeon) share the SepRS-SepCysS operon. In two cases, bacterial tRNACys is found in an operon with either SepRS-ΔC or the second copy of SepCysS (Fig. S2). (ii) "Ca. Bathyarchaeota" archaeon BA2 has a compact operon encoding tRNACys, SepCysSn, SepCysS, and SepRS. This operon is widespread among marine sediment archaea, possibly because it is so amenable to HGT. (iii) Clade VII SepCysS genes are often associated with a tRNACys gene, a few sulfur metabolism genes, and a small gene which was annotated to encode tRNA-Thr-editing domain (ED). As shown in Fig. S4, tRNA-Thr-ED is a homolog of the editing domain of archaeal threonyl-tRNA synthetase (ThrRS-R) (51) and the editing domain of archaeal transediting ThrRS-ED protein (52) (see Text S1 in the supplemental material). The tRNA-Thr-ED proteins of Euryarchaeota methanogens form a clade distinct from those of MSBL1/Hadesarchaea (Fig. S4B and C). The SepCysS proteins associated with these tRNA-Thr-ED proteins are distributed in the same manner (methanogens' clade and MSBL1/Hadesarchaea clade) in our clade VII SepCysS phylogeny (Fig. 1B).
Idiosyncrasies of bacterial SepRS and SepCysS. Bacterial SepRSs are highly distant from the well-studied methanogen SepRS (Fig. 1A). Bacterial SepCysSs, on the other hand, have a close evolutionary relationship with methanogens' SepCysSs (Fig. 1B). Therefore, bacterial SepRS and SepCysS genes may have different archaeal origins and formed an operon after the branching of class I and class II and III methanogens. It is apparent from the multiple alignments of SepRS sequences that bacterial SepRS lacks a small motif involved in archaeal tRNACys recognition (53,54) (Fig. 2A). As this motif binds methylated guanosine 37, an identity determinant in methanogen SepRS systems (53-55), it appears that the N1-methyl modification of G37 does not contribute to bacterial SepRS·tRNACys recognition. This is consistent with the fact that bacteria lack methyltransferase Trm5, which catalyzes m1G37 formation in archaeal tRNACys species (54,56-58). Our structural models of bacterial SepRSs based on the Archaeoglobus fulgidus SepRS·tRNACys (PDB accession no. 2du3) crystal structure (59) show that a hydrophilic residue (mostly Asp) replaces the hydrophobic Ile444 within the enzyme's anticodon binding domain. In A. fulgidus and methanogen SepRSs, Ile444 might be involved in m1G37 recognition (Fig. 2A). In addition, the vicinal helix of Ile444 is replaced with a short loop in bacterial SepRSs (Fig. 2A).
(ii) Bacterial SepCysSs catalyze the tRNA-dependent Sep-to-Cys conversion. Although the exact mechanism of the SepCysS-catalyzed reaction has not yet been fully elucidated, the Sep-to-Cys conversion most likely proceeds through a PLP-dependent generation of a dehydroalanyl-tRNACys intermediate, which is subsequently attacked by a persulfide group to form Cys-tRNACys (60,61). Both "Ca. Parcubacteria" and Chloroflexi SepCysS possess residues involved in PLP binding (59), while only Chloroflexi SepCysS harbors conserved Cys residues implicated in persulfide (60,61) and Fe-S cluster formation (6). "Ca. Parcubacteria" SepCysS is one of the two exceptional SepCysSs that lack 2 out of 3 conserved cysteines (see above).
In analyzing the metagenomic data sets, we focused on particular metagenomes because a whole Pyl-encoding gene cluster is rarely contained in a single metagenomic contig, which hampers the phylogenetic inference of PylRS genes. We chose data sets from deep marine and hot spring environments because of the abundance of archaeal species in these niches, the presence of SepRS genes, and high-quality data that are provided by Microbial Dark Matter, phase II. Invaluable information was obtained from the metagenome of a deep-oceanic, basalt-hosted subsurface ecosystem from Juan de Fuca Ridge flank, Pacific Ocean (CORK borehole 1362A_J2.573). Three dominant archaeal species, Methermicoccaceae, marine benthic group E (MBG-E), and "Ca. Bathyarchaeota" (64), possess both SepRS and PylRS genes as well as the genes for Pyl-utilizing MtmB/MtbB/MttB enzymes (Fig. 3). The PylRS species of the Methermicoccaceae and MBG-E archaea divide the bacterial PylRS clade into two (Acetohalobium and others) (Fig. 3), indicating the occurrence of horizontal gene transfer of a Pyl-encoding system between bacteria and archaea (65). The "Ca. Bathyarchaeota" PylRS (PylSc) forms a new PylRS clade (Fig. 3) together with the V1 PylSc and some PylSc species found in the Deep Marine Sediments White Oak River (WOR) estuary metagenomes (data not shown). Pyl-encoding systems are also present in the hot spring metagenomes, although their metagenomic bins are less reliable due to the complex composition of the prokaryotic communities (Fig. 3). In the sulfidic Washburn Spring metagenome, one Archaeoglobus-type and several Crenarchaeota-type SepRS genes were also found. Thus, it is tempting to assume that a few subgroups of Archaeoglobus and TMCG possess both SepRS and PylRS.

DISCUSSION
In this study, we searched all the genomic and metagenomic protein sequence data in the public databases for the RNA-dependent cysteine biosynthesis pathway. Previous studies used only genome sequences and a single metagenomic data set to search for a particular aminoacyl-tRNA synthesis system, in part due to the low reliability and accessibility of metagenomic sequence data. We encountered a similar problem with the "Ca. Parcubacteria" DG_74_2 bin, which is apparently composed of a few different genomes, including two "Ca. Parcubacteria" species. Our contig binning was greatly facilitated by the fact that minor genetic code systems rely on multiple components, which are, in turn, frequently dispersed on different metagenomic contigs. This approach eventually led us to the detection of rRNA and protein genes useful for phylogenetic inference. This work and other recent studies (66-68) will lead future studies of gene evolution in uncultured microbes.
Our phylogenetic analyses demonstrate that (i) the well-investigated class I/II/III methanogen and Archaeoglobi SepRSs constitute only a terminal branch of one of the clades, (ii) the TRAX-SepRS genes, SepCysE, the Sec-encoding system, and the four selenoproteins are shared by Euryarchaeota and TACK/Asgard, (iii) a few groups of proteins accompany SepCysS genes within the genetic loci, (iv) new PylRS types occur in nature and represent a missing link between the three known clades of PylRS, and (v) modern archaea may have fused the adapter peptides PylSn and SepCysSn to PylSc and SepCysS, respectively. In addition, our biochemical analyses confirm that bacterial SepRS and SepCysS species from uncultured "Ca. Parcubacteria" and Chloroflexi bacteria possess canonical activity. It is still not clear whether the SepRS system was present in the last common archaeal ancestor (3,4), because the SepRS system was rarely found in the DPANN group, which was predicted (29) to have diverged first among archaea. It is also unclear why the SepRS system is absent or sparsely distributed in many branches of archaea. Was it gradually replaced by CysRS in each branch or horizontally transferred from another branch?
There may be diverse mechanisms for RNA-dependent cysteine biosynthesis in nature. The composite genome of a bacterium, ADurb.Bin236 (BioSample accession no. SAMN05004151), encodes a noncanonical SepRS homolog (GenBank accession no. OQA87054.1) and a SepCysSn-SepCysS operon (GenBank accession no. OQA83877.1 and OQA83876.1). Their protein sequences are highly diverged and may have archaeal origins. Surprisingly, this SepRS homolog has an additional N-terminal domain corresponding to the serine-editing domain of archaeal ThrRS (ThrRS-R). This is consistent with the genetic coupling of some clade VII SepCysS and tRNA-Thr-ED genes in Euryarchaeota. Although further study and validation are required, one may hypothesize that some SepRS species might possess a serine-editing activity in cis or in trans, because dephosphorylation of Sep-tRNACys produces Ser-tRNACys, which may translate cysteine codons as serine.
The presence of the SepRS and SepCysS system may correlate with a high demand for iron-sulfur proteins, one of which is SepCysS (6), in obligate anaerobes for methanogenesis and for other metabolisms (14,69,70). For example, organohalide respiration in Dehalococcoides relies on iron-sulfur proteins (71). The coexisting tendency of PylRS and SepRS in archaea and bacteria may be partially explained by the facts that methylamine metabolism by Pyl-utilizing enzymes requires an iron-sulfur protein, RamA (8,72), and that methylornithine synthase (PylB) is an iron-sulfur enzyme (73). Apart from assisting iron-sulfur proteins, the SepRS system may be useful for extreme thermophiles, because free phosphoserine is stable even at an extremely high temperature (74).
It has been shown that genes characteristic of methanogen-type sulfur assimilation and mobilization exist in some deltaproteobacteria and Chloroflexi (75). These genes encode proteins involved in methanogen-idiosyncratic homocysteine synthesis and facilitate growth when sulfide is provided as the sole sulfur source (5,61,75). As predicted, these genes cooccur with SepCysS and are present in SepRS-carrying Chloroflexi Dehalococcoidia bacterium CG2_30_46_9, Dehalococcoidia bacterium CG2_30_46_19, and Chloroflexi bacterium RBG_13_51_36 (see Fig. S1 in the supplemental material). Because of the vast abundance of Chloroflexi in deep sediments, their metabolic traits have a direct impact on sulfur cycling within the marine subsurface.

Bioinformatics.
A BLAST search was performed by using three public Web servers, JGI IMG/MER (35), NCBI BLAST, and NCBI SRA BLAST. Some of the SepCysSn and tRNA sequences were manually identified. The SepRS sequences of "Ca. Verstraetearchaeota" were obtained by using tBLASTn from accession no. PRJNA321438. Multiple-alignment analyses of protein sequences were performed using Clustal X 2.1 (40), followed by manual curation based on the reported structure-based alignment analyses of SepRS (59), SepCysS (44), SepCysE (4), the ThrRS editing domain (76), and PylRS (77) using SeaView (78). The phylogeny reconstruction analyses of the alignment files were performed by using MEGA 7 (79) with the default settings (maximum likelihood, Jones-Taylor-Thornton [JTT] model, uniform rates, use all gaps/missing sites). Protein structure models were made with PyMOL 1.7.6.0 (Schrödinger, LLC). The sequence and alignment data used in this study are provided in the supplemental material (see Data Set S1).
Binning of metagenomic contigs was performed based on GC contents and read depths. Some of the WOR metagenomic contigs lack the read depth information. For the binning of metagenomic contigs of the AK8/W8A-19 group archaea, contaminating "Ca. Bathyarchaeota" contigs were removed. For the binning of metagenomic contigs of the BOG (Asgard) archaeon, each contig was confirmed to harbor Asgard-like protein genes, and contaminating Methanocella contigs were removed. The automatic annotation pipelines of the NCBI and JGI databases and our manual annotation/curation identified or predicted the host archaea and bacteria of these metagenomic contigs (Table S1). Our 16S rRNA phylogeny revealed that unclassified "LHC4-2-B" archaea JGI MDM2 LHC4sed-1-M8 and N8 belong to the pSL50 group and that unclassified "LHC4-2-B" archaeon JGI MDM2 LHC4sed-1-M18 belongs to the pJP 33 group. It was revealed that W8A-19 archaea, which were annotated to belong to the Korarchaeota (46), have ribosomal protein operons very similar to those of AK8 archaea (Fig. S3).
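As an illustration of binning on GC content and read depth, the following Python sketch groups contigs into coarse (GC, depth) grid cells. It is not the procedure used in this study, whose details are not given here: the grid approach, the cell widths, and the names `gc_content` and `bin_contigs` are all assumptions made for the example. Contigs lacking read depth information, as with some of the WOR contigs, are grouped on GC content alone.

```python
from collections import defaultdict


def gc_content(seq):
    """Return the G+C fraction of a sequence, counting only A, C, G, and T."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def bin_contigs(contigs, gc_step=0.1, depth_step=10.0):
    """Group contigs into (GC cell, depth cell) bins.

    contigs: iterable of (name, sequence, read_depth), where read_depth
    may be None when no coverage information is available.
    gc_step and depth_step are hypothetical cell widths, not published values.
    """
    bins = defaultdict(list)
    for name, seq, depth in contigs:
        gc_key = int(gc_content(seq) / gc_step)
        # Contigs without read depth get a None depth key (GC-only grouping).
        depth_key = None if depth is None else int(depth / depth_step)
        bins[(gc_key, depth_key)].append(name)
    return dict(bins)


if __name__ == "__main__":
    example = [
        ("contig_a", "GGGCCCAT", 42.0),
        ("contig_b", "GCGCGCTA", 45.0),
        ("contig_c", "ATATATGC", 42.0),
        ("contig_d", "GGCCGCAT", None),
    ]
    for key, names in sorted(bin_contigs(example).items(), key=str):
        print(key, names)
```

A fixed grid is the simplest choice for a sketch; candidate bins produced this way would still need the kind of manual checks described above, such as confirming lineage-specific protein genes on each contig and removing contaminating contigs.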
In vitro and in vivo assays of bacterial SepRS and SepCysS. Assays were performed using traditional methods (45,80,81). Detailed materials and methods are provided in Text S2 in the supplemental material.