A New Family of Capsule Polymerases Generates Teichoic Acid-Like Capsule Polymers in Gram-Negative Pathogens

ABSTRACT Group 2 capsule polymers represent crucial virulence factors of Gram-negative pathogenic bacteria. They are synthesized by enzymes called capsule polymerases. In this report, we describe a new family of polymerases that combine glycosyltransferase and hexose- and polyol-phosphate transferase activity to generate complex poly(oligosaccharide phosphate) and poly(glycosylpolyol phosphate) polymers, the latter of which display similarity to wall teichoic acid (WTA), a cell wall component of Gram-positive bacteria. Using modeling and multiple-sequence alignment, we showed homology between the predicted polymerase domains and WTA type I biosynthesis enzymes, creating a link between Gram-negative and Gram-positive cell wall biosynthesis processes. The polymerases of the new family are highly abundant and found in a variety of capsule-expressing pathogens such as Neisseria meningitidis, Actinobacillus pleuropneumoniae, Haemophilus influenzae, Bibersteinia trehalosi, and Escherichia coli with both human and animal hosts. Five representative candidates were purified, their activities were confirmed using nuclear magnetic resonance (NMR) spectroscopy, and their predicted folds were validated by site-directed mutagenesis.

On the basis of the genetic and chemical properties of different E. coli strains, capsules are divided into four groups (4). Group 2 capsules consist of linear polymers displaying a high negative-charge density that is introduced by either negatively charged sugar residues (sialic acid or glucuronic acid) or phosphate groups (1).
Genes required for biosynthesis and export of group 2 capsules are located in the so-called capsule gene cluster (1, 4-10). It is structured as three regions, of which regions 1 and 3 are conserved and encode proteins responsible for initiating capsule biosynthesis and transporting the polymers to the cell surface (see Fig. S1 in the supplemental material). Region 2 contains the serogroup-specific capsule polymerases (referred to here as "polymerases") that assemble the respective capsule polymers (Fig. 1a). (Capsule polymers are often referred to as "capsule polysaccharides," especially when they consist exclusively of saccharide units.) The polymerases described so far can be classified on the basis of their catalytic activity as glycosyltransferases (GTs) and hexose-1-phosphate transferases. Polymerases with GT activity generate glycosidic linkages in polymers consisting exclusively of saccharide units. On the structural level, the catalytic domains of these enzymes adopt one of the two most abundant GT folds, i.e., either GT-A or GT-B. The characteristic elements of both folds are two Rossmann-like domains that are either tightly associated, forming a central, continuous β-sheet (GT-A), or opposed to each other, forming a deep cleft that contains the catalytic center (GT-B) (11,12). Polymerases with GT activity are either single-domain enzymes acting alone to synthesize a homopolymer (13-16) or, when generating a heteropolymer, acting in concert with another domain of the same GT fold type, which is either independently expressed (1,17) or part of the same polypeptide (18-21).
Polymerases with hexose-1-phosphate transferase activity assemble a polymer in which monosaccharides are bridged by phosphodiester linkages (22-24). No structural and little mechanistic data (25-27) are available for these polymerases yet, but they are believed to be single-domain enzymes displaying sequence motifs that are characteristic of the members of a protein family called stealth (28).
With enzyme CslB of N. meningitidis serogroup L, we recently described for the first time a polymerase which is able to introduce both linkage types into one polymer, generating a structure consisting of trisaccharides connected by phosphodiester linkages (29). Despite its phosphotransferase activity, CslB shows no similarity to the hexose-1-phosphate transferases of the members of the stealth protein family. Instead, it exhibits a so far unique bipartite architecture consisting of an N-terminal GT-A fold and a C-terminal GT-B-like fold (29).
In the current study, we searched for homologues of CslB and identified a set of multidomain polymerases in a variety of animal and human pathogens. One subset of these polymerases was found to synthesize, like CslB, a poly(oligosaccharide phosphate) capsule. A second subset, however, was shown to assemble a poly(glycosylpolyol phosphate) polymer that displays considerable similarity to wall teichoic acid (WTA) type II, a negatively charged polymer and major constituent of the Gram-positive cell envelope. (Note that the terms "polyol" and "alditol" are often used synonymously in literature describing WTA.) In line with this finding, bioinformatics analysis of all identified primary protein sequences revealed as a common denominator a domain that showed sequence similarity to TagF, the wall teichoic acid synthase from Staphylococcus epidermidis and the best-characterized member of the so-called TagF-like protein family. Using site-directed mutagenesis, high-performance liquid chromatography (HPLC), and nuclear magnetic resonance (NMR) spectroscopy, we confirmed the modeling data and the hypothesized enzymatic activity.

RESULTS
Identification of the TagF-like polymerase family. It is known that glycosyltransferases with similar activities can show low sequence identity (30,31). Consequently, we searched for homologues of CslB not only by performing protein BLAST (32) searches using the CslB amino acid sequence (AEQ62070.1) but also by manually searching the literature for complex, phosphate-containing group 2 capsules with structures similar to that of the poly(oligosaccharide phosphate) polymer expressed by N. meningitidis serogroup L (Fig. 1b; see also Fig. S2 in the supplemental material). The best score (99% query cover, 58% sequence identity) in the protein BLAST search was obtained for putative glycosyltransferase Cps12B (AAS77491.1) from A. pleuropneumoniae serotype 12, a pathogen expressing a capsule identical to the polymer generated by CslB (29,33). Also among the hits was Ccs2 (AEC50903.1; 97% query cover, 33% sequence identity) from H. influenzae serotype c, which expresses a capsule consisting of disaccharide repeating units connected through phosphodiester linkages (34). Literature searches revealed that similar dimeric units are expressed by A. pleuropneumoniae serotypes 1 (35) and 4 (36) and by H. influenzae serotype f (37,38). Moreover, in a multitude of bacterial species, one of the hexoses of these dimeric repeating units is replaced by a glycerol, creating a structure similar to that of WTA type II (39,40) [49]) serotypes T3 (50), T4 (51), and T15 (52). Following the hypothesis that all of the linear polymer backbones mentioned above are generated by homologues of CslB (note that modifications of the linear backbone as shown in Fig. 1b [see also Fig. S2] are usually introduced by separate enzymes [1]), we analyzed the DNA sequence information from the corresponding strains and identified putative polymerases in the capsule gene cluster by their unusual length of more than 2,500 bp per open reading frame (ORF) (Fig. 1a; see also Fig. S1).
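The length-based screen for putative polymerases can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the procedure used in the study: the ORF names and lengths are hypothetical placeholders, and only the 2,500-bp threshold is taken from the text.

```python
# Threshold stated in the text: putative polymerase ORFs exceed 2,500 bp.
MIN_POLYMERASE_ORF_BP = 2500


def candidate_polymerases(orf_lengths_bp, min_bp=MIN_POLYMERASE_ORF_BP):
    """Return the names of ORFs longer than min_bp, longest first."""
    hits = [(name, length) for name, length in orf_lengths_bp.items()
            if length > min_bp]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in hits]


# Hypothetical ORF lengths (bp) for one capsule gene cluster;
# placeholder values for illustration, not data from this study.
example_cluster = {"orfA": 1140, "orfB": 3741, "orfC": 960, "orfD": 2610}

print(candidate_polymerases(example_cluster))  # ['orfB', 'orfD']
```

A length filter of this kind only nominates candidates; in the study, the assignment rests on the homology modeling and functional assays described next.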
Homology modeling, performed for each polypeptide sequence using the structure prediction software PHYRE2 (53), revealed that all putative polymerases, like CslB, contained as a common denominator a domain that was modeled with 100% confidence onto the crystal structure of TagF, the wall teichoic acid (type I) synthase of the Gram-positive bacterium Staphylococcus epidermidis. The corresponding domain is referred to here as TagF-like domain (see Fig. 1c to e; see also  (54). TagF itself is the most extensively studied member of the TagF-like protein family, which is characterized by five active-site primary sequence motifs (54,55) and has so far been biochemically investigated using only Gram-positive sources. Despite their Gram-negative origin and the highly differing percentages of sequence identity with respect to their TagF-like domains (15% to 93%; see Fig. S4), all polymerases identified here contained the five active-site motifs as shown by multiple-sequence alignment using the Clustal Omega algorithm (56) (Fig. S5). It thus seems reasonable to allocate them to the TagF-like protein family and classify them as group 2 TagF-like polymerases.
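Percentages of sequence identity such as those quoted above are computed from aligned sequences. The following minimal sketch shows one common convention (identical residues divided by the number of columns in which both sequences carry a residue). It is an assumption that this convention matches the one used for Fig. S4; other tools normalize by alignment length or by the shorter sequence. The toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must be rows taken from the same alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment: 4 identical residues out of 5 compared columns.
print(percent_identity("ACD-EF", "ACDGEY"))  # 80.0
```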
In addition to the TagF-like domain, all models were predicted to have either a GT-A folded domain at the N terminus or a GT-B folded domain C-terminally flanking the TagF-like domain (Fig. 1c to e; see also Fig. S3). It is of note that the GT-A fold, modeled onto the β-glycosyltransferase domain of capsule polymerase K4CP from E. coli K4 (57), was present only in strains expressing a capsule with β-glycosidic linkage, whereas the GT-B fold, modeled onto the teichoic acid modifying α-glycosyltransferase TarM from Staphylococcus aureus (58), was found only in strains expressing a polymer with α-glycosidic linkage (Fig. 1b to e; domains and residues are highlighted in magenta and violet). Despite the considerably differing percentages of sequence identity (30% to 70% for GT-A and 40% to 90% for GT-B; see Fig. S6), all GT domains were modeled by PHYRE2 with 100% confidence. The two resulting archetypal polymerase architectures are referred to here as GT-A/TagF-like and TagF-like/GT-B architectures (see Fig. 1c and d).

Functional testing of recombinant capsule polymerases.
To corroborate the predicted activity of the TagF-like polymerase family, five representative candidates were selected for further analyses. Cps12B was chosen due to its high similarity to CslB (29) (Fig. 1b and c). Cps1B and Ccs2 were additionally selected from the GT-A/TagF-like subgroup since they generate dimeric instead of trimeric repeating units (Fig. 1b and c), and Cps3D and Cps7D were selected from the TagF-like/GT-B subgroup as candidates synthesizing WTA-like products (Fig. 1b and d). The ORFs coding for the putative polymerases were amplified from bacterial lysates, cloned with different N- and C-terminal tags (maltose binding protein [MBP] and/or hexahistidine [His₆], respectively), and expressed in E. coli. For each polymerase, the constructs yielding the best expression levels (namely, MBP-Cps12B-His₆, MBP-Cps1B-His₆, MBP-Cps3D-His₆, MBP-Cps7D-His₆, and Ccs2-His₆; for concise presentation of the data, all constructs are displayed here without tags) were purified by affinity chromatography and subsequent size exclusion chromatography (SEC) (Fig. S7). Enzymatic activity was tested by incubating each enzyme with its putative donor substrates (nucleotide-activated monosaccharides or alditols, e.g., UDP-GlcNAc [UDP N-acetylglucosamine] or CDP-glycerol) in reaction buffer (29) supplemented with magnesium chloride. After 3 h, the reactions were analyzed using a high-performance liquid chromatography-anion exchange chromatography (HPLC-AEC) assay. In all reactions, the consumption of the donor substrates and the simultaneous formation of the nucleotide products (UMP, CMP, UDP) could be observed in the 280-nm channel (compare control and reaction data in Fig. 2).
Simultaneously, the formation of polymer in Cps12B, Cps1B, and Ccs2 reactions was detected in the 214-nm channel, whereas polymers generated by Cps3D and Cps7D, lacking the UV-active N-acetyl group, were visualized by alcian blue/silver-stained polyacrylamide gel electrophoresis (PAGE) (Fig. 2). It is of note that all reactions took place in the absence of priming acceptor substrates (e.g., hydrolyzed polymer), demonstrating that all polymerases were able to initiate polymer synthesis de novo. To evaluate the relevance of Mg²⁺ for the activity of the TagF-like polymerases, the reactions described above were repeated in the presence of the Mg²⁺-chelating agent ethylenediaminetetraacetate (EDTA). In agreement with the findings indicating that GT-A folded proteins are dependent on divalent cations for the coordination of the negatively charged donor substrates (11) whereas basic amino acids assume this function in GT-B folded proteins (12), polymerases with the TagF-like/GT-B architecture were active even in the presence of EDTA, while the activity of polymerases adopting the GT-A/TagF-like architecture strictly depended on the presence of Mg²⁺ (Fig. S8).
With the aim of analyzing the structure of the in vitro-synthesized polymers by NMR spectroscopy, the enzymatic reactions presented in Fig. 2 were upscaled and preparative AEC was used for polymer purification.
NMR analysis confirms the predicted activity of Cps1B, Cps12B, and Ccs2. The identity between the polymer generated by Cps12B and the previously characterized polymer generated by CslB (29) could be readily demonstrated by ¹H NMR spectroscopy (Fig. 3a), confirming that Cps12B is the polymerase of A. pleuropneumoniae serotype 12. ¹H,¹³C heteronuclear single-quantum correlation (HSQC) experiments were performed to characterize the products generated by Cps1B and Ccs2 and demonstrated that the obtained ¹³C chemical shift values (Fig. 3b and c; Table 1) were in perfect agreement with ¹³C spectra of the native, de-O-acetylated capsule polymer of A. pleuropneumoniae serotype 1 (35) and H. influenzae serotype c (34), respectively.
NMR analysis confirms the predicted activity of Cps3D and Cps7D. Similarly, previously reported ¹³C and ¹H spectra of the capsule polymers harvested from bacterial cultures of A. pleuropneumoniae serotype 7 (46) and serotype 3 (45) as well as from N. meningitidis serogroup H (41), which expresses the same polymer backbone as A. pleuropneumoniae serotype 3 (see Fig. S2), exactly match the ¹H-¹³C correlations obtained by ¹H,¹³C HSQC for the polymers generated by Cps7D (Fig. 3d) and Cps3D (Fig. 3e), proving that both polymerases possessed their predicted activity.
Nevertheless, an additional set of signals slightly deviating from the published reference spectra was detected for both C1 and C1′ in the Cps7D product (Fig. 3d, red arrows), and, in addition to the major anomeric signal (76%), two minor anomeric signals (14% and 10%) could be detected in the spectrum of the Cps3D product (Fig. 3e, black, red, and magenta arrows, respectively).
We hypothesized that the additional signals resulted from the fact that the commercially available CDP-glycerol used in the Cps7D and Cps3D reactions was a racemic mixture containing both sn-glycerol-1-phosphate (C2′ of glycerol has S chirality) and sn-glycerol-3-phosphate (C2′ has R chirality; for nomenclature, see reference 40).

FIG 3 (e) ¹H,¹³C HSQC of the polymer generated by Cps3D. The signal-to-noise ratio was adjusted to show only the resonances belonging to the dominant spin system containing the anomeric signal with the highest (76%) intensity (black arrow), which is in agreement with the previously published structure of the A. pleuropneumoniae serotype 3 capsule polymer. The two additional anomeric signals (14% and 10%) resulting from the incorporation of the nonnatural substrate sn-glycerol-1-phosphate are indicated by red and magenta arrows. Signals labeled with a crosshatch symbol (#) are due to process-related impurities which were absent in a second batch of Cps3D product (data not shown).

Clearly, both enantiomers are substrates for the enzymes as shown by the complete consumption of CDP-glycerol in the HPLC assay (Fig. 2e and f). This is noteworthy, since, in vivo, both CDP-glycerol and the A. pleuropneumoniae serotype 7 capsule polymer have been reported to be enantiopure, consisting exclusively of sn-glycerol-3-phosphate (40,46) (the stereochemistry of the A. pleuropneumoniae serotype 3 capsule polymer has not been investigated yet).
Consequently, we performed a comprehensive two-dimensional (2D) NMR analysis of the glycosidic linkage generated by Cps7D and Cps3D to elucidate if the nonnatural enantiomer (sn-glycerol-1-phosphate) is incorporated into the polymer and yields the observed additional signals.
Stereochemistry of the polymer generated by Cps7D. As expected, the 2D total correlation spectroscopy (TOCSY) spectrum of the Cps7D polymer (Fig. 4a) showed almost indistinguishable correlations for the two anomeric signals (5.04 ppm/101.5 ppm and 5.03 ppm/101.1 ppm) observed in the ¹H,¹³C HSQC (Fig. 3d, see C1 and red arrow), but their correlations in a 2D nuclear Overhauser effect spectroscopy (NOESY) spectrum were different (Fig. 4b). This confirmed our hypothesis that the galactose (Gal) spin systems were similar or identical, whereas the glycerol spin systems to which they are linked were different.
In the 2D NOESY analysis (Fig. 4b), the spin system with black labels showed NOEs to 3.65 ppm (strong) and 3.82 ppm (weak), which were assigned to H1a′ and H1b′ of sn-glycerol-3-phosphate (Fig. 4c, black structure), matching the values previously reported for the natural polymer (46). Their corresponding correlations were isolated in the ¹³C-HSQC spectrum at 3.83 ppm/71.1 ppm and 3.65 ppm/71.1 ppm (Fig. 4d). The NOE cross-peak corresponding to 3.65 ppm was much more intense and could be assigned to the pro-S proton at C1′. The chirality at C2′ was found to be R, based on the scalar couplings observed for the isolated pro-S and pro-R protons attached to C1′ of glycerol. The coupling constant ³J(H1′(pro-R),H2′) was found to be 6.0 Hz, larger than the 3.5 Hz determined for ³J(H1′(pro-S),H2′) (Fig. 4e), which fits with the conformation shown in Fig. 4c (black structure), in which H1a′ (pro-R) is trans to H2′. This agrees with Beynon et al., who determined for the R chirality at C2′ that ³J(H1′(pro-R),H2′) must be equal to or larger than ³J(H1′(pro-S),H2′), considering various rotamers (46,50).
The second anomeric signal of galactose, indicated with red labels, shows NOE correlations to 3.56 ppm (strong) and 3.87 ppm (weak) (Fig. 4b), which were assigned to H3a′ and H3b′ of sn-glycerol-1-phosphate, which differs from sn-glycerol-3-phosphate solely in the chirality at C2′ (Fig. 4c, red structure). By definition, the carbon numbering is inverted. Thus, their correlations in the ¹³C-HSQC spectrum at 3.56 ppm/71.4 ppm and 3.88 ppm/71.4 ppm are in the same region as the H1a′ and H1b′ signals of the sn-glycerol-3-phosphate (Fig. 4d). The stronger NOE to 3.56 ppm defines the pro-S proton at C3′. The fact that the coupling constant ³J(H3′(pro-R),H2′) = 3.9 Hz is smaller than ³J(H3′(pro-S),H2′) = 7.5 Hz (Fig. 4e) fits with the conformation with S chirality at C2′ shown in Fig. 4c (red structure), in which the H1b′ (pro-S) is in trans to H2′, again agreeing with Beynon et al., who determined that ³J(H3′(pro-R),H2′) must be equal to or smaller than ³J(H3′(pro-S),H2′) (46,50).

TABLE 1 ¹³C chemical shifts (ppm) of the polymers generated by Cps1B, Ccs2, Cps7D, and Cps3D, derived from the ¹H,¹³C HSQC experiment represented in Fig. 3 (column groups: Product; Glucosamine; Galactose; table body not recovered)

FIG 4 (b) The 2D NOESY correlations indicate differences between the two spin systems regarding the linkage to glycerol. (c) Chemical structures of the Cps7D products containing sn-glycerol-3-phosphate (black structure; C2′ has R chirality) and sn-glycerol-1-phosphate (red structure; C2′ has S chirality). The black structure is identical to the backbone of the capsule polymer expressed by A. pleuropneumoniae serotype 7. (d) ¹H,¹³C HSQC. Correlations corresponding to the CH and CH₂ groups are shown in blue and in green, respectively. (e) The stereochemistry of the glycerol was determined from a ¹H NMR spectrum according to a method described by Beynon et al., who determined for the R chirality at C2′ that ³J(H1′(pro-R),H2′) must be as large as or larger than ³J(H1′(pro-S),H2′), whereas for the S chirality at C2′, ³J(H3′(pro-R),H2′) must be as small as or smaller than ³J(H3′(pro-S),H2′).
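The coupling-constant criterion applied to both glycerol spin systems can be summarized as a small decision function. This is a minimal sketch of the rule as paraphrased in the text (attributed there to Beynon et al.), not a general stereochemical assignment tool. The function name is hypothetical, and treating equal coupling constants as undetermined is an assumption made here, because the rule as stated permits equality for both configurations.

```python
def assign_c2_chirality(j_pro_r_hz, j_pro_s_hz):
    """Assign the chirality at glycerol C2' from two vicinal couplings.

    j_pro_r_hz: 3J between the pro-R proton of the glycosylated CH2 and H2'
    j_pro_s_hz: 3J between the pro-S proton of the glycosylated CH2 and H2'

    Rule as paraphrased in the text: R requires J(pro-R) >= J(pro-S),
    S requires J(pro-R) <= J(pro-S). Equal values satisfy both conditions,
    so they are reported as undetermined (an assumption of this sketch).
    """
    if j_pro_r_hz > j_pro_s_hz:
        return "R"
    if j_pro_r_hz < j_pro_s_hz:
        return "S"
    return "undetermined"


# Values reported in the text for the Cps7D product (Fig. 4e).
print(assign_c2_chirality(6.0, 3.5))  # R  (black spin system)
print(assign_c2_chirality(3.9, 7.5))  # S  (red spin system)
```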
The two spin systems of the glycerol are separated in the 2D TOCSY spectrum (Fig. 4f), which shows that their methylene group (H1′ for R chirality and H3′ for S chirality) signals are distinguishable, whereas their resonances with respect to H2′ and the other CH₂ group with the phosphate attached are almost identical. The two signals of phosphate-attached CH₂ of glycerol at ~4.04 ppm/69.2 ppm and ~3.98 ppm/69.2 ppm in the ¹³C-HSQC spectrum contain overlapping signals of H3a′ and H3b′ for the R chirality (black label) and H1a′ and H1b′ for the S chirality (red label). The 2D TOCSY data reveal that H3a′ and H3b′ of the R form correspond to chemical shifts of 4.04 and 4.00 ppm, whereas H1a′ and H1b′ of the S form correspond to chemical shifts of 4.03 and 3.98 ppm. Both of these methylene groups show correlations to the phosphate in the ¹H-³¹P heteronuclear multiple bond correlation (HMBC) spectrum (Fig. 4g), confirming the structures shown in Fig. 4c. In summary, the detailed NMR analysis of the Cps7D product confirmed that the observed second set of signals exclusively resulted from the incorporation of the nonnatural substrate sn-glycerol-1-phosphate.
Stereochemistry of the polymer generated by Cps3D. For the Cps3D product, our first aim was to clearly confirm the assignment shown for the dominant spin system in Fig. 3e. The strong anomeric signal at 5.23 ppm belonging to this spin system (black arrow in Fig. 3e, black structure in Fig. 5a) shows correlations in a 2D TOCSY spectrum to five other protons (H2, H3, H4, H5, and H6 of galactose), among which one was significantly shifted downfield (Fig. 5b, black labels). Establishing the H1-H2, H2-H3, and H3-H4 correlations in a 2D COSY spectrum (data not shown) showed that the downfield-shifted resonance can be assigned to H4, which showed a correlation to ³¹P in a ¹H-³¹P HMBC spectrum, indicating a linkage between phosphate and C4 (Fig. 5c). A strong NOE between the anomeric proton and H2′ confirmed the linkage to C2′ of glycerol (Fig. 5d) and thus the identity to the backbone of the capsule polymer expressed by A. pleuropneumoniae serotype 3 (45) (Fig. 5a, black structure). The chirality at the glycerol carbon C2′, not determined previously (45), is derived in the following sections.
At lower contour levels, signals of two other galactose spin systems became visible (Fig. 5e, marked with red and magenta labels). The TOCSY correlations are practically identical to those seen with the dominant spin system (Fig. 5b), including the downfield-shifted H4 resonance, suggesting that they all belong to a galactose with a phosphate substitution at C4.
The difference between the black and the magenta spin systems becomes apparent by visualizing the glycosidic linkage as shown in Fig. 5a. Assuming that both repeating units adopt the most likely populated conformation, in which the exo-anomeric effect governs the phi-psi angles of the glycosidic linkage, the C3′ points toward the reader. The strong NOE between galactose H1 and H2′ observed for both spin systems (Fig. 5a and d) suggests that H2′ is orientated upwards, coming close to galactose C1 and O5. The chirality at C2′ is determined by the phosphate; if the phosphate is attached to C3′ in front, C2′ has R chirality (sn-glycerol-3-phosphate, black structure), and if the phosphate is attached to C1′, C2′ has S chirality (sn-glycerol-1-phosphate, magenta structure). The glycerol CH₂ group with the shortest distance to galactose H1 is the one in front at C3′. Consistent with these considerations, the strongest NOEs with respect to 4.13 and 4.06 ppm in the black spin system resulted from H3a′ and H3b′, which also showed correlations to ³¹P (Fig. 5c), confirming that the black spin system had R chirality at C2′ and contained sn-glycerol-3-phosphate. This is further supported by the effect of the phosphate on the ¹³C and ¹H chemical shifts; C3′ was shifted farther downfield than C1′ (Fig. 5e), and both H3a′ and H3b′ appeared to have been shifted farther downfield than H1′ (Fig. 5d). In the case of the magenta spin system, the strongest NOEs were seen in the upfield resonances of 3.82 and 3.77 ppm, which in turn did not show correlations to ³¹P, indicating S chirality. Only a weak NOE was seen for the H1′ signal, which was shifted downfield and showed correlations to ³¹P and thus is linked to phosphate, confirming a C2′ with S chirality (sn-glycerol-1-phosphate).
While the black-labeled and magenta-labeled spin systems showed a very strong NOE at 3.98/3.99 ppm (corresponding to H2′ of the glycerol and thus indicating a linkage to C2′), the red spin system showed a strong NOE at 3.58 ppm (Fig. 5a and d), which originated from a CH₂ group (Fig. 5e). Interestingly, the resonances of the red spin system showed striking similarity to those seen with the sn-glycerol-1-phosphate-containing repeating units generated by Cps7D (Fig. 5 and 4, red labels). Thus, the resonance at 3.58 ppm was assigned to the pro-S proton at C3′ of sn-glycerol-1-phosphate. This assignment suggests that the inverted stereochemistry of the nonnatural substrate sn-glycerol-1-phosphate forces Cps3D to misplace the glycosidic linkage and connect C1 of galactose to C1′ of glycerol instead of to C2′. However, as shown by 2D TOCSY (Fig. 5b), the phosphodiester bond is still correctly placed at C4 of the galactose rather than at C3 as reported for the product of Cps7D (compare Fig. 5a and 4a, red structures). In summary, the NMR study data presented above clearly demonstrate that the observed heterogeneity in the Cps3D product exclusively originated from incorporation of the nonnatural substrate sn-glycerol-1-phosphate.
Role of the tetratricopeptide domain. Many TagF-like polymerases were predicted to contain a domain that can be modeled onto templates rich in tetratricopeptide repeats (TPRs) (Fig. 1c and d; see also Fig. S3). Since TPRs are known for mediating protein-protein interactions (59), we hypothesized that the predicted TPR domains do not participate in the catalytic activity of the polymerases. To test this hypothesis, we truncated the TPR domain in Cps1B and Cps7D as representative candidates for the GT-A/TagF-like and the TagF-like/GT-B architectures, respectively. Unfortunately, the resulting Cps7D construct showed low expression levels and could not be purified (data not shown). Since Cps1B, like many group 2 capsule polymerases (23,29), was shown to be expressed as an N-terminal truncation lacking the first 31 amino acids (Fig. S7), the TPR truncation was introduced in Cps1B₃₂₋₁₂₄₆, yielding Cps1B₃₂₋₈₅₈. Cps1B₃₂₋₈₅₈ could be expressed and purified and maintained its activity in the HPLC-AEC assay (data not shown), and the ¹H NMR spectra obtained for the polymer synthesized by Cps1B₃₂₋₁₂₄₆ and Cps1B₃₂₋₈₅₈ were identical (Fig. 6a and b). However, it is of note that in analytic size exclusion chromatography, Cps1B₃₂₋₁₂₄₆ eluted with an apparent molecular mass corresponding to a dimeric to trimeric assembly, whereas Cps1B₃₂₋₈₅₈-His₆ appeared to be monomeric, indicating that the TPR domain mediates oligomerization of Cps1B (Fig. 6c).
Mutational studies of TagF-like polymerases. The correctness of the PHYRE2 modeling was corroborated by the fact that amino acids known to have catalytic functions in TagF (H444, H584) (54), K4CP (D519xD521 motif) (57), and TarM (R326, K331) (58) (all residues involved in coordinating the pyrophosphate of the donor substrate) could be superimposed with identical residues in the predicted TagF-like, GT-A folded, and GT-B folded domains, respectively (Fig. 1e), even though the percentage of sequence identity observed between the predicted domains and their templates was low (Fig. S4 and S6). Moreover, multiple-sequence alignments performed using Clustal Omega (56) demonstrated these residues to be part of larger sequence motifs that appeared to be conserved in both the predicted domains and their corresponding templates (Fig. S5, S9, and S10). Focusing on Cps1B (expressed as Cps1B₃₂₋₈₅₈-His₆) and Cps7D (expressed as MBP-Cps7D-His₆), all conserved positions were mutated to alanine to give the single-domain mutants Cps1B(D133A/D135A), Cps1B(H587A), and Cps1B(H717A) (Fig. 7a), as well as Cps7D(H612A), Cps7D(H743A), Cps7D(R1123A), and Cps7D(K1132A) (Fig. 7b). Mutant constructs were expressed in E. coli and subsequently purified at levels comparable to the wild-type levels. As expected, when activity was assayed using the HPLC-AEC assay (Fig. 7c to f) and PAGE (Fig. 7g and h), no polymer synthesis could be observed after 3 h of incubation (Fig. 7g and h) and the level of UMP/CMP detected in the reactions was comparable to those seen with the negative controls and could thus be attributed to spontaneous hydrolysis of the donor substrates (Fig. 7e and f, 280-nm channel). Interestingly, considerable levels of UDP, most likely due to enzyme-facilitated hydrolysis of UDP-galactose (UDP-Gal), could be detected in the presence of TagF-like domain mutants Cps7D(H612A) and Cps7D(H743A) (Fig. 7f, constructs 2 and 3), indicating that the GT-B folded domain was unaffected by the mutation in the TagF-like domain and corroborating its GT activity (hexose-1-phosphate transferase activity would release UMP). Assuming that each single-domain mutant should still contain one domain that remained functional, we performed trans-complementation reactions, combining the GT-A domain mutant and the TagF-like domain mutants of Cps1B as well as the TagF-like domain mutants and the GT-B domain mutants of Cps7D. Indeed, donor substrate consumption (Fig. 7e and f) and polymer synthesis (Fig. 7g and h) were restored to wild-type levels in all trans-complementation reactions, indicating that the two remaining unmodified domains were able to catalyze the reaction in trans.

DISCUSSION
This report presents the identification of a novel family of bacterial multidomain enzymes termed "TagF-like capsule polymerases" that generate complex phosphate-containing group 2 capsule polymers. PHYRE2 (53) homology modeling, complemented by experimental data, suggests that the TagF-like domain, conserved in all TagF-like polymerases, catalyzes the transfer of hexose-phosphate and glycerol-phosphate residues, whereas adjacent GT-A and GT-B folded domains transfer hexose residues with inverted stereochemistry (β-glycosidic linkages) or retained stereochemistry (α-glycosidic linkages), respectively. Mutations in the catalytic core, selected based on the template structures of TagF (54), K4CP (GT-A) (57) and TarM (GT-B) (58), abolished activity, strongly indicating the correctness of the modeling results.
Litschko et al.

Importantly, trans-complementation restored activity to wild-type levels, corroborating that each domain is independent from the other with regard to activity and folding. Using size exclusion chromatography and ¹H NMR, we showed that the TPR (tetratricopeptide repeat) domain, predicted at the N terminus or C terminus of the majority of TagF-like polymerases, mediates oligomerization in Cps1B but is not required for enzymatic activity. Unfortunately, attempts to truncate the N-terminal TPR domain in Cps7D resulted in abolishment of expression, suggesting that the TPR domain may play a more important role with regard to protein stability and/or folding in polymerases with TagF-like/GT-B architecture. Given the considerable size (~400 amino acids) of the predicted TPR domains, it is likely that they adopt other functions in the in vivo context, such as mediating interactions with the ABC transporter of the capsule biosynthesis complex, with polymer-modifying enzymes, or with sugar/glycerol-activating enzymes (1).
Bacteria expressing TagF-like polymerases were largely identified through analysis of published capsule structures (only Cps12B, Cps4B, and Ccs2 could be identified by BLAST searches performed using the CslB sequence as the query). Those bacterial species are as diverse as the meningitis-causing agent N. meningitidis and uropathogenic E. coli K2 (both human pathogens) and the animal respiratory pathogens A. pleuropneumoniae and B. trehalosi. Consequently, the polymer structures generated by TagF-like polymerases as well as the sequence identity between the enzymes can vary considerably. The neisserial polymerases CslB and Cshc, which generate a trimeric poly(oligosaccharide phosphate) and a dimeric poly(glycosylpolyol phosphate) repeating unit (29,41), respectively, share only 14% sequence identity, whereas enzymes generating identical structures can share up to 70% sequence identity. Consistent with the fact that the TagF-like domain is most ambiguous (it is able to transfer both glycerol-phosphate and hexose-phosphate residues), the percentages of sequence identity between TagF-like domains from different strains span a wide range (from 15% to 93%), whereas the GT-A and GT-B folded domains share 30% to 70% and 40% to 90% identity, respectively. To retrospectively assess the distribution of TagF-like polymerases, BLAST searches were performed using the amino acid sequences of newly identified Cps1B and Cps7D, yielding homologues with sequence identities ranging from 35% to 65% in a variety of pathogens such as Actinobacillus suis, Campylobacter jejuni, Campylobacter coli, Mannheimia varigena, Neisseria mucosa, and Yersinia enterocolitica. This finding suggests that TagF-like polymerases also play a role in a multitude of bacterial species whose capsule structures have not been investigated yet.
Capsule polymers are utilized to produce efficient glycoconjugate vaccines, in which polymer fragments are covalently coupled to carrier proteins to elicit T-cell responses and immunologic memory (60). Immunizations with glycoconjugates containing capsule polymers from, e.g., H. influenzae, S. pneumoniae, and N. meningitidis, have been proven highly successful in preventing infectious diseases in humans (61), and experimental glycoconjugate vaccines against animal pathogens, e.g., A. pleuropneumoniae serotype 1, have been documented to produce strong immune responses (62,63). Unfortunately, the biohazards and costs associated with glycoconjugate production are key barriers for broad market launch (64). Current protocols require isolation of capsule polymers from large-scale fermentation of pathogenic bacteria. This step essentially depends on the high-tech infrastructure of modern production plants, which require the highest biosafety standards (61), making glycoconjugate vaccines expensive; not sufficiently accessible to low-income countries; and, with regard to animal husbandry, where they could reduce or avoid the excessive use of antibiotics, not sufficiently cost-effective (65,66). We recently showed that enzymatic synthesis of the capsule polymer from N. meningitidis serogroup X provides a simple, biohazard-free, and thus potentially more cost-effective means for the production of a functional glycoconjugate vaccine (25,27,67). The identification of the TagF-like polymerase family vastly increases the number of enzymes and thus the number of capsule polymers available for the development of glycoconjugate vaccines against other human and even animal pathogens.
The poly(glycosylpolyol phosphate) structures generated by the TagF-like/GT-B folded polymerases have in the past been described as teichoic acid-like polymers due to their similarity to WTA type II (48,50), a common polymer of the cell envelope of Gram-positive bacteria. Unfortunately, little is known about WTA type II biosynthesis and further studies are needed to investigate if homologues of TagF-like polymerases play a role (39,40). Nonetheless, TagF (54) and TarM (58), the template structures for modeling the TagF-like domain and the GT-B domain, polymerize and modify wall teichoic acid type I, respectively. Despite the fact that TarM and TagF are expressed in Gram-positive bacteria and act as separate polypeptides, they share conserved amino acids and, in case of TagF, even all active-site motifs (54) with the domains of the TagF-like polymerases. It is thus tempting to speculate that they were acquired by Gram-negative group 2 capsule-expressing bacteria through horizontal gene transfer or vice versa. Further research is needed to elucidate the evolutionary relationship between WTA type I and group 2 capsule biosynthesis.
The NMR characterization of the Cps7D and Cps3D products clearly demonstrated that both enzymes consumed sn-glycerol-3-phosphate and sn-glycerol-1-phosphate from the racemic CDP-glycerol mixture, whereas A. pleuropneumoniae serotype 3 and 7 capsules exclusively contain sn-glycerol-3-phosphate (45,46). sn-Glycerol-1-phosphate is commonly found in lipoteichoic acid and is supplied from the donor substrate phosphatidylglycerol on the outer side of the Gram-positive cell membrane, whereas the sn-glycerol-3-phosphate found in WTA is usually provided by CDP-glycerol in the cytosol (40). In agreement with that, group 2 capsule biosynthesis takes place on the cytoplasmic side of the inner membrane and a glycerol-3-phosphate cytidylyltransferase has been predicted in region 2 of the A. pleuropneumoniae serotype 3 and 7 capsule gene cluster (10). It is tempting to speculate that the lack of selective pressure for sn-glycerol-3-phosphate resulting from the absence of sn-glycerol-1-phosphate in the cytosol is the reason for the observed substrate ambiguity of Cps7D and Cps3D.
In contrast to all other polymerases analyzed in this study, the candidates from B. trehalosi (B. trehalosi 188 [Bt188], Bt189, Bt192, and BtY31 [68,69]) (see Fig. S2 and Fig. S3 in the supplemental material) were identified in genomes of nonserotyped strains. The capsules of three of the four known B. trehalosi serotypes have been analyzed by NMR in the past, and all of those studies reported a poly(glycosylpolyol phosphate) structure (50)(51)(52). In agreement with these findings, the polymerases encoded in the Bt188, Bt189, and Bt192 genomes adopt the TagF-like/GT-B architecture. Interestingly, homology modeling for the polymerase encoded in the BtY31 genome predicts a GT-A/TagF-like fold, suggesting that BtY31 expresses a poly(oligosaccharide phosphate) capsule. Although a structural analysis of the BtY31 capsule polymer will be required for confirmation, this finding already highlights that the knowledge presented here about the TagF-like polymerase family can be exploited to predict the composition of unknown capsule structures, providing a starting point for structural analyses.
In summary, the identification of the TagF-like capsule polymerase family (i) reveals a new class of polymerases known to be involved in group 2 capsule biosynthesis, (ii) represents a new source for synthetic biomaterials, (iii) allows conclusions to be drawn with respect to the relationship between the protein repertoire and capsule structure, and (iv) potentially provides a link between cell wall biosynthesis in Gram-positive and that in Gram-negative bacteria.

MATERIALS AND METHODS
Bioinformatics. Homologues of CslB were identified by protein BLAST searches using the CslB amino acid sequence as the query and the BLASTP algorithm (32). Homology modeling was performed using the intensive mode of the structure prediction tool PHYRE2 (53), and sequence alignments were performed using Clustal Omega (56).
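The percent identity values quoted for pairs of polymerases depend on how identity is counted over an alignment. As a minimal sketch, and not the convention implemented by BLASTP or Clustal Omega, the following Python function counts identical residues over the aligned columns in which neither sequence has a gap. The function name, the gap character, and the toy sequences are assumptions made for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both aligned sequences have a residue.

    Assumption: the two strings are rows taken from the same alignment, so
    they have equal length and use `gap` for alignment gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns without a gap in either sequence
    matches = 0   # columns with identical residues
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy example (hypothetical sequences, not taken from any polymerase).
identity = percent_identity("MKT-LV", "MRTALV")
```

Other denominators (for example, the full alignment length or the length of the shorter sequence) are also in use and give different numbers, so values from different tools should be compared with care.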
Expression and purification of recombinant proteins. Expression and purification of recombinant constructs were performed as previously described (29). Briefly, E. coli M15[pREP4] cells were transformed with the plasmids shown in Table 3
Enzymatic reactions and analysis via HPLC and polyacrylamide gel electrophoresis (PAGE). Enzymatic reactions were carried out with 0.1 to 0.3 nmol of purified protein in a total volume of 75 µl of assay buffer (20 mM Tris [pH 8.0], 1 mM DTT, 10 mM MgCl2, 6 to 10 mM donor sugar). UDP-GlcNAc (Carbosynth), UDP-Gal (Carbosynth), and CDP-glycerol (racemic; from Sigma-Aldrich) were used as donor substrates as indicated in Fig. 2. Reaction mixtures containing wild-type proteins were incubated for 24 h at 37°C, while mutant constructs were incubated for 3 h at 37°C to minimize spontaneous hydrolysis of donor substrates. HPLC-based anion exchange chromatography (HPLC-AEC) was performed on a Prominence UFLC-XR system (Shimadzu) equipped with a CarboPac PA-100 column (Dionex) (2 by 250 mm). Samples were separated as described in reference 29, with the adjustment that 20 mM Tris (pH 8.0) and 20 mM Tris (pH 8.0) plus 1 M NaCl were used as mobile phases M1 and M2, respectively. Six microliters of the samples was loaded for the detection of nucleotides at 280 nm and 50 µl for the detection of capsule polymer at 214 nm. Nucleotides were separated using a linear elution gradient of 0% to 30% M2 over 11 min. Polymers were separated using an elution gradient consisting of a −2 curved gradient of 0% to 30% M2 over 4 min followed by a linear gradient of 30% to 84% M2 over 33 min. Chromatography was conducted at 0.6 ml/min with a column temperature of 50°C. Four-microliter volumes of Cps3D and Cps7D reaction samples mixed with 4 µl of a 2 M sucrose solution were used for separation on high-percentage (15%) PAGE and visualized by a combined alcian blue/silver staining procedure as described in reference 70.
SDS polyacrylamide gel electrophoresis (SDS-PAGE) was performed as described in reference 29.
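Because M2 differs from M1 only by 1 M NaCl, the percentage of M2 in the linear HPLC-AEC gradient segments translates directly into a NaCl concentration. The following Python sketch illustrates that conversion for the two linear segments described above. It assumes that each gradient starts at time zero without an isocratic hold, which the text does not state, and it does not model the −2 curved segment, whose shape is instrument specific. All function names are hypothetical.

```python
def linear_percent_m2(t_min: float, t_start: float, t_end: float,
                      pct_start: float, pct_end: float) -> float:
    """Percent M2 at time t_min within one linear gradient segment."""
    if not t_start <= t_min <= t_end:
        raise ValueError("time lies outside the gradient segment")
    fraction = (t_min - t_start) / (t_end - t_start)
    return pct_start + fraction * (pct_end - pct_start)


def nacl_molar(percent_m2: float, m2_nacl_molar: float = 1.0) -> float:
    """NaCl concentration of the mixed eluent (M1 has no NaCl, M2 has 1 M)."""
    return percent_m2 / 100.0 * m2_nacl_molar


# Nucleotide method: linear 0% to 30% M2 over 11 min
# (assumed to start at t = 0 min).
pct_nucleotide = linear_percent_m2(5.5, 0.0, 11.0, 0.0, 30.0)

# Polymer method: linear 30% to 84% M2 over 33 min, assumed to start
# directly after the 4-min curved segment (t = 4 min to t = 37 min).
pct_polymer = linear_percent_m2(20.5, 4.0, 37.0, 30.0, 84.0)

salt_nucleotide = nacl_molar(pct_nucleotide)
salt_polymer = nacl_molar(pct_polymer)
```

Under these assumptions, the nucleotide gradient ends at 0.3 M NaCl and the linear part of the polymer gradient spans 0.3 to 0.84 M NaCl.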
Upscaling of in vitro polymer synthesis and subsequent purification. For in vitro synthesis of 5 to 12 mg of polymer, protein (1 to 25 nmol) was incubated overnight at 37°C in reaction buffer (20 mM Tris [pH 8.0], 10 mM MgCl2, 1 mM DTT) with a 6 to 10 mM concentration of the activated substrates in a total volume of 5 to 10 ml. In vitro-synthesized polymer was purified by anion exchange chromatography (AEC) using a MonoQ HR10/100 GL column (GE Healthcare) and a linear gradient of 0 to 1 M NaCl (over 41 min) at a flow rate of 1 ml/min. Polymer-containing fractions were pooled, dialyzed against water (ZelluTrans; Roth) (1,000 molecular weight cutoff [MWCO]), and freeze-dried for further analysis.
an appropriate number of scans. The spectra were generally weighted with 0.2-Hz line broadening and were subjected to Fourier transformation. The transmitter was set at the water frequency which was used as the reference signal (4.79 ppm). All the 1H spectra were obtained in a quantitative manner using the total recycle time to ensure a full recovery of each signal (5 × longitudinal relaxation time T1). 13C NMR spectra were recorded at 100.6 MHz and 298 K, with 32,000 data points over a 200-ppm spectral width, accumulating an appropriate number of scans. The spectra were generally weighted with 1.0-Hz line broadening and were subjected to Fourier transformation. The transmitter was set at the acetone frequency, as an external calibration, which was used as the reference signal (30.89 ppm). Data from two-dimensional 1H-13C HSQC experiments were acquired with a standard pulse program. Totals of 4,096 and 512 data points were collected in the F2 and F1 dimensions, respectively. An appropriate number of scans were accumulated prior to Fourier transformation to yield digital resolutions of 0.2 Hz and 1.0 Hz per point in F2 and F1, respectively. NMR measurements shown in Fig. 3d and e, 4, and 5 were recorded on a Bruker Avance III HD 600 MHz spectrometer with a QXI room-temperature probe for 1H/13C/15N/31P (Bruker Biospin, Germany). Polymers were dissolved in 500 µl and measured in standard 5-mm TA tubes (Armar, Germany) at 298 K. The temperature was calibrated with methanol-d4 (Armar, Germany) (99.8% D), and spectra were calibrated in 1H using the Bruker standard sample of 2 mM sucrose-0.5 mM DSS (4,4-dimethyl-4-silapentane-1-sulfonic acid). 13C frequencies were calibrated indirectly using the recommended scaling factor Ξ of 0.25144953. Indirect referencing of 31P chemical shifts was performed using the chemical shift ratio 0.404808636 as advised by the Biological Magnetic Resonance Data Bank.
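Indirect referencing as described above derives the 0-ppm frequency of a heteronucleus by multiplying the 0-ppm 1H frequency of DSS by the corresponding ratio Ξ. The following Python sketch shows that arithmetic with the two ratios quoted in the text. The 1H frequency of 600.13 MHz is an assumed, nominal value for a 600 MHz spectrometer and is not taken from this study, and the function names are hypothetical.

```python
# Frequency ratios quoted in the text (relative to the 1H signal of DSS).
XI = {
    "13C": 0.25144953,
    "31P": 0.404808636,
}


def zero_ppm_frequency_mhz(h1_dss_mhz: float, nucleus: str) -> float:
    """Absolute frequency (MHz) that corresponds to 0 ppm for `nucleus`."""
    return XI[nucleus] * h1_dss_mhz


def chemical_shift_ppm(observed_mhz: float, zero_ppm_mhz: float) -> float:
    """Chemical shift of a signal relative to the 0-ppm frequency."""
    return (observed_mhz - zero_ppm_mhz) / zero_ppm_mhz * 1e6


# Assumed 1H frequency of the DSS methyl signal (illustrative value only).
H1_DSS_MHZ = 600.13

c13_zero = zero_ppm_frequency_mhz(H1_DSS_MHZ, "13C")
p31_zero = zero_ppm_frequency_mhz(H1_DSS_MHZ, "31P")

# A hypothetical 13C signal observed 100 ppm above the reference frequency.
example_shift = chemical_shift_ppm(c13_zero * (1 + 100e-6), c13_zero)
```

With this approach, a single 1H calibration on the DSS sample fixes the 13C and 31P scales without a separate reference compound for each nucleus.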
1H NMR spectra were recorded with 64,000 data points and a spectral width of 20 ppm, typically using 64 transients. 31P NMR spectra were recorded with 16,000 data points and a spectral width of 50 ppm, typically using 64 transients. 2D 1H-13C HSQC spectra were measured using the Bruker pulse sequence hsqcedetgpsisp2.2 with 2,048 data points and a 16-ppm spectral width for 1H and 230 data points and a 60-ppm spectral width using 32 scans and a recycle delay of 1.5 s, resulting in a measurement time of 3.5 h. 2D 1H-1H TOCSY spectra were collected with 2,048 data points and a 13.9-ppm spectral width for 1H and 512 data points and a 13.9-ppm spectral width using 4 scans, a mixing time of 80 ms, and a recycle delay of 2 s, resulting in a measurement time of 1.5 h. 2D 1H-1H COSY spectra were collected using cosygpppqf with 2,048 data points and a 10-ppm spectral width for 1H and 128 data points and a 10-ppm spectral width using 32 scans and a recycle delay of 2 s, resulting in a measurement time of 2 h 40 min. 2D 1H-31P HMBC spectra were recorded using the pulse sequence hmbclpndqf with 4,096 data points and a 10-ppm spectral width for 1H and 64 data points and a 30.5-ppm spectral width using 32 scans and a recycle delay of 1.5 s, resulting in a measurement time of 1 h 12 min. 2D 1H-1H NOESY spectra were collected with 2,048 data points and a 10-ppm spectral width for 1H and 700 data points and a 10-ppm spectral width using 32 scans, a mixing time of 120 ms, and a recycle delay of 1 s, resulting in a measurement time of 8 h 20 min. 2D 1H-13C HMBC spectra were recorded using the pulse sequence hmbclpndqf with 4,096 data points and a 20-ppm spectral width for 1H and 512 data points and a 222-ppm spectral width using 64 scans, optimized for a JCH long-range coupling of 8 Hz and a recycle delay of 2 s, resulting in a measurement time of 20.5 h.
Bruker TopSpin versions 3.5pl6 and 3.2 were used to process NMR data. TopSpin and Sparky (T. D. Goddard and D. G. Kneller, SPARKY 3; University of California, San Francisco) were used to analyze and assign NMR data.
Accession number(s). Accession numbers KY798410 and KY807157 (see Table 2) have been submitted to GenBank. All other accession numbers cited in the manuscript are already accessible.

ACKNOWLEDGMENTS
We thank Heike Claus and Falk Büttner for providing genomic DNA of H. influenzae serotype c and A. pleuropneumoniae serotype 7, respectively, and Jana Führing for critically reading the manuscript.
This study received financial support from MHH internal impact orientated funds (LOM) to the Institute of Clinical Biochemistry, and a HiLF (Hochschulinterne Leistungsförderung) starter grant awarded to T.F.