Sequence Variation of Rare Outer Membrane Protein β-Barrel Domains in Clinical Strains Provides Insights into the Evolution of Treponema pallidum subsp. pallidum, the Syphilis Spirochete

ABSTRACT In recent years, considerable progress has been made in topologically and functionally characterizing integral outer membrane proteins (OMPs) of Treponema pallidum subspecies pallidum, the syphilis spirochete, and identifying its surface-exposed β-barrel domains. Extracellular loops in OMPs of Gram-negative bacteria are known to be highly variable. We examined the sequence diversity of β-barrel-encoding regions of tprC, tprD, and bamA in 31 specimens from Cali, Colombia; San Francisco, California; and the Czech Republic and compared them to allelic variants in the 41 reference genomes in the NCBI database. To establish a phylogenetic framework, we used T. pallidum 0548 (tp0548) genotyping and tp0558 sequences to assign strains to the Nichols or SS14 clades. We found that (i) β-barrels in clinical strains could be grouped according to allelic variants in T. pallidum subsp. pallidum reference genomes; (ii) for all three OMP loci, clinical strains within the Nichols or SS14 clades often harbored β-barrel variants that differed from the Nichols and SS14 reference strains; and (iii) OMP variable regions often reside in predicted extracellular loops containing B-cell epitopes. On the basis of structural models, nonconservative amino acid substitutions in predicted transmembrane β-strands of T. pallidum repeat C (TprC) and TprD2 could give rise to functional differences in their porin channels. OMP profiles of some clinical strains were mosaics of different reference strains and did not correlate with results from enhanced molecular typing. Our observations suggest that human host selection pressures drive T. pallidum subsp. pallidum OMP diversity and that genetic exchange contributes to the evolutionary biology of T. pallidum subsp. pallidum. They also set the stage for topology-based analysis of antibody responses to OMPs and help frame strategies for syphilis vaccine development.

IMPORTANCE Despite recent progress characterizing outer membrane proteins (OMPs) of Treponema pallidum, little is known about how their surface-exposed, β-barrel-forming domains vary among strains circulating within high-risk populations. In this study, sequences for the β-barrel-encoding regions of three OMP loci, tprC, tprD, and bamA, in T. pallidum subsp. pallidum isolates from a large number of patient specimens from geographically disparate sites were examined. Structural models predict that sequence variation within β-barrel domains occurs predominantly within predicted extracellular loops. Amino acid substitutions in predicted transmembrane strands that could potentially affect porin channel function were also noted. Our findings suggest that selection pressures exerted within human populations drive T. pallidum subsp. pallidum OMP diversity and that recombination at OMP loci contributes to the evolutionary biology of syphilis spirochetes. These results also set the stage for topology-based analysis of antibody responses that promote clearance of T. pallidum subsp. pallidum and frame strategies for vaccine development based upon conserved OMP extracellular loops.
KEYWORDS Treponema pallidum, molecular subtyping, outer membrane proteins, spirochetes, syphilis

After years of steady decline during the 1990s, syphilis, a sexually transmitted infection caused by the uncultivatable spirochete Treponema pallidum subsp. pallidum, has undergone a dramatic resurgence in the United States, particularly among men who have sex with men (1). Syphilis also poses a major threat globally, with an estimated 5.6 million new cases annually and 350,000 adverse pregnancy outcomes due to mother-to-child transmission (2). The failure of epidemiological approaches to curtail the spread of syphilis underscores the need for a vaccine capable of inducing protective antibody responses to geographically widespread and genetically diverse T. pallidum subsp. pallidum strains (3,4). T. pallidum subsp. pallidum has been designated "the stealth pathogen" based on its ability to evade innate and adaptive immune responses for protracted periods, permitting repeated bouts of hematogenous dissemination and invasion of numerous organs, including the central nervous system and the fetal-placental barrier during pregnancy (5-7). While the appearance of opsonic antibodies is widely regarded as a turning point in the battle between host and pathogen (8,9), the targets of antibodies that promote bacterial clearance during syphilitic infection are largely unidentified. How immune pressures in high-risk populations influence the epidemiology of syphilis and the evolutionary biology of T. pallidum subsp. pallidum is also poorly understood.
Dual-membrane bacteria have evolved a unique class of integral outer membrane protein (OMP) in which antiparallel, amphipathic β-strands circularize to form a closed barrel structure, often creating a central aqueous channel that permits uptake of nutrients and efflux of waste products (10-12). Extracellular loops bridge adjacent transmembrane strands, extending from the OM into the external milieu (13). Protective B-cell determinants reside in the extracellular loops and undergo sequence/antigenic variation to circumvent herd immunity to previously circulating strains (14-18). Extracellular loops also play critical roles in disease pathogenesis by promoting interactions with host cells and tissue components and protecting the bacterium against innate clearance mechanisms, such as complement-mediated lysis and neutrophil engulfment (19-25).
Multiple factors have impeded efforts to identify the syphilis spirochete's integral OMPs. These include the recalcitrance of T. pallidum subsp. pallidum to in vitro cultivation (26,27), the fragility of its outer membrane (28,29), its relatively low abundance of outer membrane-spanning proteins (30,31), and the lack of strong sequence relatedness between T. pallidum subsp. pallidum OMPs and well-characterized proteins in Gram-negative OMs (32,33). To circumvent these obstacles, computational methods were employed to mine the T. pallidum subsp. pallidum Nichols strain genome for proteins predicted to form OM-associated β-barrels. This bioinformatics approach, combined with a battery of biophysical and cellular localization techniques, including opsonophagocytosis assays, yielded a panel of candidate OMPs (33-37). One of these, TP0326/BamA, is the central component of the molecular machine that chaperones newly exported precursor OMPs from the periplasm into the OM (37,38). A homology model based on the solved structure of the Neisseria gonorrhoeae ortholog (38) predicts that the β-barrel of T. pallidum subsp. pallidum BamA contains 16 transmembrane β-strands and eight extracellular loops (37) (Fig. 1). One extracellular loop, L4, was previously shown by the use of human syphilitic serum samples to contain an immunodominant epitope; antibodies against this loop also promoted opsonization of T. pallidum subsp. pallidum by rabbit peritoneal macrophages (37). A second group of candidate OMPs, T. pallidum repeats C and D (TprC/D) (TP0117/TP0131) and TprI (TP0620), consists of members of the paralogous Tpr family (39,40). Analysis of recombinant TprC/D and TprI suggests that their β-barrel domains (Fig. 1) form aqueous channels in liposomes (35,36), which is consistent with their potential functions as porins. As with porins from Gram-negative bacteria (11), TprC/D and TprI form trimers, with the β-barrel domains being essential for trimerization (35,36).
A similar bipartite topology with a periplasmic N-terminal major outer sheath protein (MOSP^N) and OM-embedded C-terminal MOSP (MOSP^C) trimeric β-barrel (Fig. 1) also has been demonstrated for the MOSP of the oral commensal T. denticola, the parental ortholog for the Tpr family (41,42). Surface epitope mapping of subfamily I members based on accurate structural models has yet to be performed.
Previously, Centurion-Lara and coworkers (40) performed a detailed analysis of tpr genes from four subspecies of pathogenic treponemes, including a small number of T. pallidum subsp. pallidum isolates. Their predictions regarding sequence variability and immune pressure, however, were based on structural models for TprC/D and TprI that used the full-length polypeptides rather than just the OM-embedded β-barrel-forming MOSP^C domains. Thus, no study to date has looked at sequence variation within the regions of T. pallidum subsp. pallidum OMPs known to reside at the host-pathogen interface (i.e., surface-exposed regions) in multiple T. pallidum subsp. pallidum strains circulating within at-risk populations. Here, we examined the β-barrel-encoding domains of the tprC, tprD, and bamA genes in DNA extracted from T. pallidum subsp. pallidum within 31 clinical specimens obtained from early syphilis patients in Cali, Colombia (43,44); San Francisco, CA (SF) (45); and the Czech Republic (CZ) (46,47). The resulting sequences were compared to the corresponding loci within the 41 T. pallidum subsp. pallidum reference genomes available from NCBI databases. On the basis of structural models for TprC/D and BamA, much of the sequence variability within all three OMPs is predicted to lie within extracellular loops containing B-cell epitopes. The models also identified amino acid substitutions in predicted transmembrane β-strands for TprC/D and TprD2, which could affect the selectivity of their porin channels. Lastly, OMP profiles of clinical strains at all three loci appear to be mosaics of alleles represented in the T. pallidum subsp. pallidum reference genomes and did not correlate with results from enhanced molecular typing. The findings presented here are consistent with the notion that selection pressures within human populations drive T. pallidum subsp. pallidum OMP diversity and that genetic exchange within and between the Nichols and SS14 clades contributes to the evolutionary biology of syphilis spirochetes. These results also set the stage for topology-based analyses of antibody responses that promote clearance of individual T. pallidum subsp. pallidum strains and, importantly, could be used to develop a broadly protective vaccine based on conserved extracellular loops.

FIG 1 [...] (35,36). MOSP^N and MOSP^C correspond to conserved domains shared with the N and C termini of the major outer sheath protein (MOSP) of T. denticola, the parental Tpr ortholog, identified by the NCBI conserved domain database (CDD) server. Arrows indicate the regions that were subjected to PCR amplification for sequencing (see Table S3 for primers). The designation CVR (central variable region) denotes a sequence-variable stretch present in all Tpr orthologs (40,77). BamA consists of a C-terminal β-barrel and five periplasmic polypeptide transport-associated (POTRA) domains (34,37). Numbers refer to amino acid positions within the full-length proteins (signal peptides; denoted by "S," included) from T. pallidum subsp. pallidum Nichols. (B) Membrane topologies. In Tpr orthologs and MOSP, the MOSP^C domain forms the surface-exposed β-barrel (35,36,42). Immunofluorescence experiments in T. pallidum have confirmed the periplasmic location of MOSP^N and the CVR of TprC/D (Nichols) and TprI (35,36). Moreover, the periplasmic portions of TprC/D and TprI form extended structures, as determined by small-angle X-ray scattering analysis, that anchor the β-barrels to the peptidoglycan sacculus (35,36). A homology model based on the solved structure of the Neisseria gonorrhoeae ortholog (38) predicts that the β-barrel of T. pallidum subsp. pallidum BamA contains 16 transmembrane β-strands and 8 extracellular loops (37). OM, outer membrane; PG, peptidoglycan; CM, cytoplasmic membrane; L4, BamA immunodominant extracellular loop 4.
(Portions of this work were presented at the STI & HIV World Congress, Rio de Janeiro, Brazil, 9 to 12 July 2017.)

RESULTS
Patients and clinical samples. Skin biopsy specimens from secondary syphilis rashes were obtained from patients seen in Cali, Colombia; swabs from exudative lesions were obtained from patients with early syphilis in San Francisco, CA (SF), and in Brno and Prague, Czech Republic (CZ). Specimens from Cali and SF were chosen for amplification and sequencing of β-barrel regions based on treponemal burdens determined by polA quantitative PCR (qPCR). The CZ specimens selected were PCR positive for all typing loci tested (47), reflecting high T. pallidum subsp. pallidum burdens. Table S1 in the supplemental material contains a summary of the available demographic information for the samples used in this study.
The T. pallidum subsp. pallidum strains in the clinical samples belonged to the Nichols and SS14 clades. Prior multilocus and genomic sequence analyses demonstrated that syphilis spirochetes cluster into two taxonomic groups, or clades, arbitrarily named after the Nichols and SS14 reference strains (48-50). Those studies also established that reliable clade designations for T. pallidum subsp. pallidum could be assigned on the basis of an 83-bp region of T. pallidum 0548 (tp0548) (48-50), a locus used in epidemiological studies as part of the enhanced T. pallidum subsp. pallidum typing scheme (51,52) (see Fig. S1A in the supplemental material). At the outset, T. pallidum subsp. pallidum sequences in clinical samples from all three study sites were amplified using tp0548-specific primers; partial sequences were obtained from 29 samples (Fig. S1B). Among the 14 samples from Cali, 7 contained tp0548 type f, placing these strains in the SS14 clade, while the remaining 7 contained an assortment of types belonging to the Nichols clade. The six SF strains were a mixture of types within the SS14 clade. Types for seven CZ strains also fell within the SS14 clade, while two contained T. pallidum subsp. pallidum that belonged to the Nichols clade.
Nucleotide polymorphisms within tp0558 (encoding a NiCoT family nickel-cobalt inner membrane permease [53]) also can be used for clade discrimination (48,54). On the basis of the tp0558 sequences, clade assignments for T. pallidum subsp. pallidum within 30 clinical samples were determined (Fig. 2). While the majority were exact matches for either clade at all five tp0558 "discriminator" nucleotide positions, Cali_84, Cali_123, and Cali_133 contained substitutions not found in any of the T. pallidum subsp. pallidum reference strains. Of the 27 strains for which we also obtained partial tp0548 sequences, all but one were concordant at both loci; Cali_133 was Nichols by tp0548, and SS14 by tp0558.
In summary, based upon tp0548 and/or tp0558, of the 31 clinical samples, T. pallidum subsp. pallidum strains within 8 clinical samples belonged to the Nichols clade, 21 belonged to the SS14 clade, and 2 were unassignable because of sequence discordance (Cali_133) or incomplete sequence data (Cali_77) (Table 1).
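The way per-locus calls are combined into a final clade designation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `assign_clade` and the use of `None` for a locus without usable sequence are hypothetical, and the rule shown (use whichever locus is available, flag disagreement) is inferred from the summary above.

```python
from typing import Optional

def assign_clade(tp0548_call: Optional[str], tp0558_call: Optional[str]) -> str:
    """Combine per-locus clade calls ('Nichols', 'SS14', or None if no usable sequence).

    Hypothetical helper: returns the shared call when the loci agree or when
    only one locus is informative, and 'unassignable' otherwise.
    """
    calls = {call for call in (tp0548_call, tp0558_call) if call is not None}
    if len(calls) == 1:
        # Either both loci agree, or only one locus yielded a call.
        return next(iter(calls))
    # No usable sequence at either locus, or the two loci disagree.
    return "unassignable"

# Mirrors the discordant case reported for Cali_133 (Nichols by tp0548, SS14 by tp0558).
print(assign_clade("Nichols", "SS14"))
# A strain typed at tp0558 only.
print(assign_clade(None, "SS14"))
```

Treating discordance as "unassignable" rather than favoring one locus keeps the two markers on an equal footing, which matches how the discordant strain is reported here.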
Table footnotes: [...] Table S2 for additional details. The tp0548 genotype is indicated by the last letter of the ECDCT designation.
b The β-barrel-encoding domains of tprC from Mexico A and PT_SIF are identical. Nucleotide alignments for tprC, tprD/D2, and bamA β-barrels are presented in Fig. S2, S3, and S4, respectively. See Table S4 for NCBI accession numbers.
c Enhanced CDC typing was performed as described in Materials and Methods. Strain type designations are as follows: the first numeral represents the number of repeats in the arp gene; the first letter represents the MseI restriction site profile in the tprE/G/J genes; and the second letter is based on sequence analysis of an 83-bp region of tp0548. The letter X is used to indicate missing data for a given position.
d Clade designation based on tp0558 alone.
e The indicated strain carries a unique Nichols-like tp0548 (genotype y) gene and an SS14 tp0558 gene. See Fig. S1A for additional details.
f ND, not determined due to limited sample material.

Classification of TprC β-barrel alleles of T. pallidum subsp. pallidum reference strains. Nucleotide sequence comparisons of full-length tprC (tp0117) genes harbored by the 41 T. pallidum subsp. pallidum reference genomes available, examined using fastx_collapser, identified five alleles (Table S2). Four of these are represented by [...] Nichols and Sea81-4. Thirteen of the 22 polymorphisms dispersed across the β-barrel domain result in amino acid substitutions; 6 are nonconservative (Fig. 3B). The "branch-site" model in the phylogenetic analysis by maximum likelihood (PAML) package (57) identified six amino acid residues in full-length tprC as being positively selected (P > 95%); all were in the β-barrel domain (Fig. 3A).
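The allele classification step, in which identical full-length sequences are collapsed into unique alleles, can be illustrated with a short sketch. This is not fastx_collapser itself but a plain-Python stand-in for the same idea; the function name `collapse_identical`, the strain labels, and the toy sequences are invented for the example.

```python
from collections import defaultdict

def collapse_identical(seqs):
    """Group sequence names by identical (case-insensitive) sequence.

    seqs: dict mapping a strain name to its nucleotide sequence.
    Returns a list of (sequence, [names]) pairs, largest group first.
    """
    groups = defaultdict(list)
    for name, seq in seqs.items():
        groups[seq.upper()].append(name)
    # Largest groups first; ties broken by sequence for a stable order.
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))

# Toy input (hypothetical strains and sequences, not real tprC data).
toy = {
    "strain_1": "ATGGCA",
    "strain_2": "ATGGCA",
    "strain_3": "ATGGTA",
    "strain_4": "atggca",
}
alleles = collapse_identical(toy)
for index, (seq, names) in enumerate(alleles, start=1):
    print(f"allele {index}: {seq} carried by {names}")
```

Each resulting group corresponds to one allele in the sense used here: a distinct sequence shared by one or more genomes.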
FIG 3 [...] Table 1). Hash marks (#) designate Mexico A allele TprC β-barrel variants (Cali_77 and Cali_164). (C) The TMBpro webserver (58) was used to generate a three-dimensional structural model for the β-barrel of TprC (Nichols). B-cell epitopes E1 (residues 484 to 496), E2 (residues 529 to 538), and E3 (residues 575 to 583) predicted by DiscoTope 2.0 (59) are shown in green, yellow, and dark blue.

Structural modeling of the TprC β-barrel and topological mapping of the amino acid substitutions that differentiate the four β-barrel variants. To date, there have been no solved structures for any of the Tpr proteins. As a first step toward understanding the topological and/or functional implications of the sequence data described above, a structural model for the Nichols TprC β-barrel was generated using TMBpro (58). As shown in Fig. 3C, the predicted β-barrel consists of 10 antiparallel β-strands with five connecting extracellular loops of various sizes. Overall, this model is consistent with the general principles of amphipathic β-barrel structure (10,13) in that the external surface facing the lipid bilayer is highly hydrophobic, while charged residues line the channel (Fig. 4). Note that the strong positive charge within the channel could explain its high conductivity for the fluorophore Tb(DPA)₃³⁻ used in previous studies of MOSP^C domain porin activity (35,36). Also consistent with general OMP structures are the relatively large extracellular loops and short periplasmic turns. Depending on location, substitutions in the barrel could affect either the properties of the aqueous channel or surface interactions between the treponeme and its obligate human host. Nine of the 13 amino acid polymorphisms in the four TprC β-barrel domain variants are in extracellular loops (3 in L3, 1 in L4, and 5 in L5), while 4 are in predicted transmembrane strands (including two positively selected residues in β5) (Fig. 3A).
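The topological tally above (substitutions counted per loop or strand) amounts to looking up each variable residue in a table of predicted segment boundaries. A minimal sketch follows; the segment labels, boundaries, and substitution positions are placeholders invented for illustration and do not correspond to the TMBpro model for TprC.

```python
from collections import Counter

# Hypothetical topology: (label, first residue, last residue), inclusive.
# "b" = transmembrane strand, "L" = extracellular loop, "T" = periplasmic turn.
TOY_SEGMENTS = [
    ("b1", 1, 10), ("L1", 11, 20), ("b2", 21, 30),
    ("T1", 31, 34), ("b3", 35, 44), ("L2", 45, 60),
]

def locate(position, segments):
    """Return the label of the segment containing the residue, or 'unmapped'."""
    for label, start, end in segments:
        if start <= position <= end:
            return label
    return "unmapped"

def tally(positions, segments):
    """Count substituted positions per topological segment."""
    return Counter(locate(p, segments) for p in positions)

toy_substitutions = [12, 15, 25, 47, 50, 58]  # invented residue numbers
counts = tally(toy_substitutions, TOY_SEGMENTS)
loop_total = sum(n for label, n in counts.items() if label.startswith("L"))
print(dict(counts), "in loops:", loop_total)
```

Because strand boundaries are themselves predictions, a residue near a boundary can plausibly fall on either side; a tally of this kind should be read with that imprecision in mind.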
The majority of B-cell epitopes predicted for TprC localize to extracellular loops. Immunodominant B-cell epitopes of Gram-negative OMPs typically are located in extracellular loops and often are sequence variable (14-18). Analysis of the TprC β-barrel structural model using DiscoTope 2.0 (59) revealed that the predicted B-cell epitopes align well with extracellular loops L3, L4, and L5 (Fig. 3C) and closely correspond to variable regions (VRs) I, II, and III/IV, respectively (Fig. 3A). Importantly, each predicted B-cell epitope contains at least one nonconservative substitution (Fig. 3A). Of note, the single amino acid residue (Pro534) that differentiates the Mexico A/PT_SIF and SS14 alleles (region II) occurs in predicted epitope E2 (Fig. 3A).
T. pallidum subsp. pallidum reference strains encode only two TprD β-barrel variants. Centurion-Lara and coworkers (40,60) were the first to report that the tp0131 locus can harbor either a tprD allele (which is identical to tprC in the Nichols reference strain) or a tprD2 allele. As noted earlier (40), the tprD allele occurs only in reference genomes with a Nichols tprC allele (Table S2). Alignment of full-length tprD and tprD2 from the reference strains (Fig. S3) reveals much greater sequence divergence than has been seen with tprC (Fig. S2); tprD and tprD2 encode identical MOSP^N domains but divergent central variable regions (CVRs) and MOSP^C domains (Fig. S3). Comparison of the TprD and TprD2 β-barrel-encoding (MOSP^C) domains revealed four regions of variability (Fig. 5A; see also Fig. S3). Region I consists of a single nucleotide change that results in substitution of arginine for lysine. Regions II to IV contain numerous nucleotide polymorphisms, many of which result in nonconservative amino acid substitutions. Region III also contains two single nucleotide indels.
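The distinction between conservative and nonconservative substitutions drawn above can be made concrete with a small classifier. The criteria used in this study are not stated in the text, so the sketch below assumes one common textbook grouping of amino acids by side-chain chemistry; the names `GROUPS`, `group_of`, and `is_conservative` are hypothetical.

```python
# Assumed physicochemical grouping (one common textbook scheme; other schemes exist).
GROUPS = {
    "nonpolar": set("GAVLIMP"),
    "aromatic": set("FWY"),
    "polar": set("STCNQ"),
    "basic": set("KRH"),
    "acidic": set("DE"),
}

def group_of(residue):
    """Return the group name for a one-letter amino acid code."""
    for name, members in GROUPS.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown amino acid code: {residue!r}")

def is_conservative(ref, alt):
    """A substitution is called conservative here if both residues share a group."""
    return group_of(ref) == group_of(alt)

# Lysine/arginine exchange, as in region I: both residues are basic.
print(is_conservative("K", "R"))
# Leucine to glutamine: nonpolar to polar under this grouping.
print(is_conservative("L", "Q"))
```

Under this grouping the lysine/arginine exchange in region I counts as conservative. Substitution-matrix scores are a common alternative criterion and can classify borderline pairs differently.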
Topological mapping of predicted B-cell epitopes and sequence differences used to distinguish TprD and D2 allele β-barrels. On the basis of the predicted structural model (Fig. 5B), region I lies in the extreme N terminus of the β-barrel. Region II encompasses β-strands β2 and β3 and the intervening PL1 periplasmic loop, while regions III and IV center on predicted extracellular loops L3 and L4, respectively. Of the five B-cell epitopes predicted by DiscoTope 2.0, four reside in extracellular loops (Fig. 5A and B). Of note, L2, which coincides with epitope E2, is conserved between TprC, TprD, and TprD2 (Fig. 3A and 5A). Whereas L5 in TprC is variable (Fig. 3A), this loop and the corresponding epitopes (E3 and E5, respectively) are identical in TprD and D2 (Fig. 5A). In addition to amino acid differences in the extracellular loops, the models for TprC/D and TprD2 identified amino acid substitutions at the entrance and exit of the channel (Fig. 5C, highlighted in red) with the potential to affect porin functionality.
TprD2 allele β-barrels predominate in the clinical strains. T. pallidum subsp. pallidum strains encoding TprD2 allele β-barrels predominated in the 30 clinical samples from which tprD sequences were obtained (Table 1). Of the eight strains assigned to the Nichols clade, only Cali_130, Cali_145, and Cali_151 contained the Nichols allele. The remaining five strains belonging to the Nichols clade, all 21 strains belonging to the SS14 clade, and 1 indeterminate strain (Cali_133) contained a TprD2 β-barrel variant. The β-barrel-encoding sequences of SF_6 and SF_58 harbored single nonsynonymous nucleotide substitutions in their predicted β7 strands (Fig. 5A); given the imprecision of β-strand prediction, it is possible that the nonconservative change in SF_58 occurs in extracellular loop L4 (epitope E4).
FIG 5 [...] (58) was used to generate a three-dimensional structural model for the TprD2 allele β-barrel. In panels A and B, B-cell epitopes E1 (residues 398 to 402), E2 (residues 450 to 453), E3 (residues 480 to 490), E4 (residues 529 to 533), and E5 (residues 573 to 579) predicted by DiscoTope 2.0 are shown in orange, red, green, yellow, and dark blue, respectively. (C) Ribbon diagram comparing the TprD and TprD2 allele β-barrels. Residues that differ between TprD and TprD2 are shown as sticks. Variable residues predicted to affect the pore opening and exit are highlighted in red and labeled to indicate the corresponding variable region. Conserved residues are shown in gray. Residue numbers correspond to full-length Nichols TprD.

Classification of BamA alleles of T. pallidum subsp. pallidum reference strains. Analysis of full-length bamA genes from all available T. pallidum subsp. pallidum reference genomes using fastx_collapser identified four alleles, represented by Nichols, SS14, Mexico A, and Sea81-4 (Table S2; see also Fig. S4). The sequence differences that distinguish these alleles are restricted to six variable regions within the β-barrel (Fig. S4) and result in 13 amino acid changes (7 nonconservative) and a 5-amino-acid deletion in the Mexico A allele (Fig. 6A and B). Region I contains a 49-nucleotide stretch unique to the SS14 allele. Region II consists of a single nucleotide change found only in the Mexico A allele. Region III also consists of a single nucleotide with a "G" in the Mexico A and SS14 alleles and a "T" in Nichols and Sea81-4 alleles. Region IV, the most polymorphic, consists of either a single nucleotide substitution along with a 15-nucleotide deletion unique to the Mexico A allele or nonidentical stretches of 15 nucleotides in the other three alleles; of note, substitutions in this region modify the polyserine tract (Fig. 6A and B; see also Fig. S4), a unique feature of BamA orthologs in pathogenic treponemes (34,61,62).
Regions V and VI consist of single nucleotide changes specific to the Mexico A and SS14 alleles, respectively. The "branch-site" model (57) did not identify any positively selected amino acids in the β-barrel region; however, the "site" model identified six positively selected amino acids (P > 95%), distributed over regions I to IV (Fig. 6A and B).
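Delineating variable regions such as those described above starts from an alignment: columns where the alleles differ are located and neighboring variable columns are merged into regions. The sketch below shows one simple way to do this; the merge distance `max_gap`, the function name `variable_regions`, and the toy alignment are assumptions made for illustration, and the region boundaries in this study were not necessarily derived this way.

```python
def variable_regions(alignment, max_gap=5):
    """Return (start, end) column index pairs (0-based, inclusive) of variable regions.

    alignment: list of equal-length aligned sequences ('-' marks a gap, so indels
    count as differences). Variable columns separated by at most max_gap columns
    are merged into a single region.
    """
    variable_columns = [
        index for index, column in enumerate(zip(*alignment))
        if len(set(column)) > 1
    ]
    regions = []
    for index in variable_columns:
        if regions and index - regions[-1][1] <= max_gap:
            regions[-1][1] = index  # extend the current region
        else:
            regions.append([index, index])  # open a new region
    return [tuple(region) for region in regions]

# Toy alignment of three invented alleles.
toy_alignment = [
    "AAAAAAAAAAAA",
    "ATAAAAAAAGCA",
    "AAAAAAAAAGAA",
]
print(variable_regions(toy_alignment, max_gap=2))
```

The choice of `max_gap` controls how aggressively nearby polymorphisms are lumped together, so the number of regions reported depends on it.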
Topological mapping of the amino acid substitutions used to distinguish between the β-barrels encoded by BamA alleles. We previously described and partially validated by immunofluorescence analysis a structural homology model for T. pallidum subsp. pallidum BamA consisting of 16 transmembrane β-strands and eight extracellular loops (37). Variable regions I through V are located entirely or largely in extracellular loops (Fig. 6A). The DiscoTope 2.0 server predicts that extracellular loops L4, L6, and L7 contain major epitopes (E2, E3, and E4, respectively; Fig. 6A and C). Note that variable region IV in L7 lies outside E4, raising the possibility that immune pressure may not be driving variability in this loop.
The clinical strains encode all four BamA β-barrel variants. β-Barrel sequences matching all four of the BamA reference alleles were identified in 27 clinical samples (Table 1; see also Fig. 6B). However, several varied from their corresponding references (Fig. 6B). Most notably, the polyserine tract of CZ_177zB and CZ_178zB, both of which encoded Sea81-4-like BamA allele β-barrels, contained a 12-nucleotide insertion that adds four additional serine residues (Fig. 6B). bamA β-barrel sequences were obtained from 25 T. pallidum subsp. pallidum clinical strains with clade designations (Table 1).
A single amino acid substitution in BamA extracellular loop 4 alters immunoreactivity with patient serum samples. As noted above, the nucleotide change in variable region II results in an amino acid substitution (glutamine for leucine at residue 589) unique to the Mexico A allele β-barrel. We recently reported that this polymorphism markedly alters the reactivity of patient serum samples with a heterologous L4 peptide (37). Immunoblotting performed with serum samples from Cali patients infected with T. pallidum subsp. pallidum strains containing Nichols (Cali_84 and Cali_133) or Mexico A BamA β-barrel variants (Cali_123) confirmed this finding (Fig. 6D).
FIG 6 [...] and green circles indicate nonsynonymous substitutions in the Cali_84 (Nichols) and CZ_351 (SS14) BamA β-barrels, respectively. B-cell epitopes E1 (residues 529 to 533), E2 (residues 565 to 602), E3 (residues 713 to 721), and E4 (residues 758 to 773), predicted by DiscoTope 2.0, are shown in red, green, purple, and cyan boxes, respectively. (B) Nucleotide and amino acid differences, shown in red and blue, respectively, in the BamA allele β-barrels and variants found in T. pallidum subsp. pallidum reference genomes and 27 clinical strains. Nucleotide and amino acid numbers for variable positions are based on full-length Nichols BamA (Fig. S4). The column designated "Del-Ser" indicates a 15-bp deletion within the Mexico A BamA allele β-barrel. The column designated "I-Ser" indicates a 12-nucleotide insertion in CZ_177zB and CZ_178zB that extends the polyserine tract by 4 residues. Consensus positions are shaded gray. Positively selected amino acids identified by branch-site model are shaded green. "ND" indicates nucleotide and amino acid positions that could not be determined by sequencing of PCR amplicon. A caret symbol (^) indicates clinical strains assigned to the Nichols clade (see Table 1) [...]

β-Barrel profiles in clinical strains can be mosaics of reference genome profiles. Clade distributions and "across-the-board" β-barrel profiles of clinical strains were compared with those of the T. pallidum subsp. pallidum reference genomes (Table S2). Four of the six reference strains belonging to the Nichols clade (CDC A, Nichols, Chicago, and DAL-1) had Nichols tprC, tprD, and bamA alleles, while Sea81-4 and UW189B harbored tprD2 in addition to Sea81-4 tprC and bamA. Although all members of the SS14 clade harbored tprD2 alleles, variability at the other two loci, particularly tprC, was noted (Table S2). Parsimony analysis (63) of the T. pallidum subsp. pallidum reference genomes confirmed that the three OMP loci were poor predictors of clade assignment. The tprC, tprD, and bamA loci contained only 25.9%, 0%, and 5.6% parsimony informative sites, respectively, compared to 80.6% for tp0548 and 100% for tp0558.
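For readers unfamiliar with the term, a site is parsimony informative when at least two different character states each occur in at least two sequences. The sketch below counts such sites in a toy alignment. It is not the software used for the analysis above, and because the denominator behind the reported percentages is not stated in this section, the function simply returns both the number of variable sites and the number of informative sites; the name `site_counts` and the toy data are invented.

```python
from collections import Counter

def site_counts(alignment):
    """Return (variable_sites, parsimony_informative_sites) for an alignment.

    alignment: list of equal-length aligned nucleotide sequences.
    Only A, C, G, and T are counted; gaps and ambiguity codes are ignored.
    """
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) >= 2:
            variable += 1
            # Informative: at least two states, each seen in two or more sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative

toy_alignment = ["ACGT", "ACGA", "ATGA", "ATCT"]  # invented sequences
variable, informative = site_counts(toy_alignment)
print(variable, informative)
```

In the toy alignment the third column varies but is not informative, because the alternative base occurs in a single sequence; differences confined to one strain carry no grouping signal under parsimony.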
Sequences were obtained from all three OMP loci in five clinical strains assigned to the Nichols clade (Tables 1 and 2). Of these, only Cali_145 possessed an OMP profile resembling that of the Nichols reference strain. CZ_177zB and CZ_178zB had Sea81-4-like profiles, while Cali_101 and Cali_127 had Mexico A-like profiles. Thus, two of five Nichols clade strains had β-barrel variant profiles matching those of strains of T. pallidum subsp. pallidum belonging to the SS14 clade. Complete profiles were obtained for 20 strains assigned to the SS14 clade (Tables 1 and 2). Strikingly, although all had TprD2 β-barrel variants, none had profiles matching the SS14 reference strain profile. All had the Mexico A/PT_SIF TprC β-barrel variant, while only eight had the SS14 strain BamA allele β-barrel variant. The other BamA β-barrel variants were either Mexico A (n = 10) or Nichols (n = 2). Stratified by geographic location (Table 1), the uniformity of the SF strains contrasted with the diversity of strains from Cali.
Molecular typing did not correlate with the β-barrel profiles. The enhanced CDC typing (ECDCT) system has been widely used to study the diversity and epidemiology of T. pallidum subsp. pallidum strains in numerous global locales (64,65). To determine whether molecular typing is predictive of OMP profiles, ECDCT was completed on T. pallidum subsp. pallidum strains in 21 clinical samples (Tables 1 and 2). Altogether, 10 different ECDCTs were detected. 14d/g (n = 7) and 14d/f (n = 5) were the most prevalent and, along with 14d/d (n = 2), the only genotypes found at more than one site. Five strains had an ECDCT (14d/f) matching the SS14 reference strain but Mexico A-like OMP profiles; of the 12 strains with Mexico A allele β-barrels in all three OMP loci, none matched the Mexico A reference strain genotype (16d/e). Two ECDCTs (14d/d and 14d/g) were associated with more than one OMP profile. Conversely, several OMP profiles were associated with more than one ECDCT. Collectively, these data indicate that OMP profiles of T. pallidum subsp. pallidum strains cannot be predicted based on ECDCT data.
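The lack of a one-to-one relationship between typing results and OMP profiles can be checked mechanically by cross-mapping the two labels for each strain. The following is a minimal sketch with invented labels (`type_1`, `profile_A`, and so on); it does not reproduce the data in Tables 1 and 2, and the function name `cross_map` is hypothetical.

```python
from collections import defaultdict

def cross_map(pairs):
    """Build both directions of the mapping between typing result and OMP profile.

    pairs: iterable of (typing_result, omp_profile) tuples, one per strain.
    """
    profiles_by_type = defaultdict(set)
    types_by_profile = defaultdict(set)
    for typing_result, profile in pairs:
        profiles_by_type[typing_result].add(profile)
        types_by_profile[profile].add(typing_result)
    return profiles_by_type, types_by_profile

# Invented strain records: (typing result, OMP profile).
toy = [
    ("type_1", "profile_A"),
    ("type_1", "profile_B"),
    ("type_2", "profile_A"),
    ("type_3", "profile_C"),
]
profiles_by_type, types_by_profile = cross_map(toy)

# Typing results linked to more than one OMP profile, and vice versa.
ambiguous_types = sorted(t for t, p in profiles_by_type.items() if len(p) > 1)
shared_profiles = sorted(p for p, t in types_by_profile.items() if len(t) > 1)
print(ambiguous_types, shared_profiles)
```

Any entry in either list is enough to show that one label cannot be predicted from the other, which is the pattern reported above.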

DISCUSSION
For decades, syphilologists have sought means to distinguish strains of T. pallidum subsp. pallidum for epidemiological, pathogenesis-related, and vaccine-related investigations. Serological analyses of live "street strains" by Turner and Hollander in the 1950s (55) yielded evidence for antigenic differences, presumably attributable to surface-exposed epitopes. The molecular typing method introduced by Pillay et al. in 1998 (52), based on numbers of repeats in the arp (tp0433) gene and sequence polymorphisms in Tpr subfamily II genes tprE (tp0313), tprG (tp0317), and tprJ (tp0621), was a major advance, although subsequent studies revealed that this system insufficiently distinguishes common T. pallidum subsp. pallidum strains circulating globally (51,64). The addition of subtyping based on sequences from tp0548 markedly improved the discriminatory power of the CDC typing scheme (51,65). In parallel, Šmajs and coworkers (48,49) found that tp0548 is one of several loci that phylogenetically separate T. pallidum subsp. pallidum strains into two clusters, named for the Nichols and SS14 reference strains, and that tp0548 subtypes segregate into these two clades. Most recently, phylogenetic analysis of whole-genome sequences of geographically diverse T. pallidum subsp. pallidum strains has lent strong support to the concept of two clades, with strains belonging to the SS14 clade supplanting those belonging to the Nichols clade (50). Consistent with this notion, the majority of T. pallidum subsp. pallidum in clinical samples examined in this study grouped with the SS14 clade. Our data also revealed, however, that strains of T. pallidum subsp. pallidum within the Nichols clade are still actively circulating in the Eastern and Western hemispheres. Arora et al. (50) reported that the SS14 clade contains a central dominant haplotype, designated SS14-Ω, which does not include the Mexico A strain (also classified within the SS14 clade).
The observation that SS14 clade members in both Cali and San Francisco have a preponderance of Mexico A allele β-barrels at all three OMP loci suggests that Mexico A-like strains are circulating more widely than the analysis of Arora et al. (50) suggests. An alternative possibility, which sequence data do not rule out, is that the SS14 clade strains in the study cohort fall within SS14-Ω but some contain Mexico A-like OMP allelic remnants. Exclusion of bamA sequences, along with the absence of tpr genes in draft genomes used by Arora et al. (50), precludes a direct comparison of their data with those presented here.
Phylogenetic reconstructions, including those used to distinguish T. pallidum subsp. pallidum clades (48,50), rely upon genes whose sequence variation recapitulates the vertical evolution of the bacterium (66). OMP-encoding genes typically are excluded from such analyses because they undergo mutations and rearrangements that give rise to phylogenetic trees in conflict with those derived from "reference" genes (66). However, sequence variation of OMPs to enhance environmental fitness, virulence, and immune evasion plays a central role in the evolution and epidemiology of pathogenic bacteria (15-17, 67, 68). Parsimony analysis revealed that the evolutionary histories of tprC, tprD, and bamA diverge greatly from those of genes used for clade differentiation (i.e., tp0548 and tp0558). Inspection of the three OMP genes/proteins in reference and clinical strains explains this dichotomy and provides evidence that OMPs in T. pallidum subsp. pallidum are subject to host-driven, adaptive mechanisms. Except for a few nucleotides in the N-terminal periplasmic portion of Mexico A tprC, the variable regions for tprC and bamA reside within the surface-exposed, OM-embedded β-barrel, just as one would expect if selection pressures exerted by the host were driving variation.
Sequence variation within the β-barrel-encoding regions of tprC, tprD, and bamA ranged from point mutations with single amino acid substitutions to small stretches of DNA encoding multiple amino acid differences. Gray et al. (69) contended that the variable regions in tprC containing multiple amino acid changes are more consistent with small "site-specific" gene conversion events than with accumulated point mutations, although they were uncertain whether the source(s) of the acquired sequences is intra- or intergenomic. Regardless, there is no reason that a recombinatorial mechanism(s) would be limited to tprC. Indeed, the mosaic OMP profiles in some clinical strains can best be explained by recombination/conversion of larger DNA fragments within and between clades. It is worth noting that genomic sequencing identified a number of OMP loci in pathogenic treponemes, including BamA in T. pallidum subsp. pallidum, where recombination appears to have occurred (50, 70-72). The most plausible scenario for intergenomic exchange during human syphilis would be the presence of anogenital ulcers coinfected with T. pallidum subsp. pallidum strains containing genetically divergent OMP loci.
According to structural models, the variable regions in T. pallidum subsp. pallidum OMP β-barrels coincide with extracellular loops predicted to contain B-cell epitopes. Immunoblot results for L4 of BamA underscore the antigenic impact of even single nonsynonymous amino acid substitutions in a surface-exposed region. Substitutions and antigenic diversity in the predicted extracellular loops for Tpr β-barrels likewise could serve a similar role in immune evasion. The cumulative effect of the OMP loop variants noted in this study, along with those in TprK (56,65,73) and as-yet-uncharacterized T. pallidum subsp. pallidum OMPs (33,40), would be to foster strain diversity among T. pallidum subsp. pallidum strains and thus their ability to persist at the population level. There are numerous examples of virulence-related OMPs in bacterial pathogens playing a role in maintenance of cellular homeostasis and/or outer membrane integrity (19-21, 23, 25, 74-76). One also must consider the possibility, therefore, that extracellular loop variants impact interactions at the host-pathogen interface during syphilitic infection. The hot spot for variation in extracellular loop L7 of BamA, which lies outside the adjacent predicted major epitope (E4), might be one example.
TprD and TprD2 contain identical MOSP N domains but highly divergent CVRs and β-barrels (40,69,77). Thus, although involving only two alleles, generation of diversity at the tprD locus was a more complex process than was seen with tprC and cannot be explained solely by events shaping the β-barrels. Moreover, since the CVR is periplasmic (36), nonimmunological selection pressures must also have been at work. Comparison of the tprD and tprD2 genes suggests that variation in the β-barrel likely arose from gene conversion events involving relatively small segments of DNA similar to those proposed for tprC (69). However, comparison of the variable regions in the tprC and tprD/D2 loci (see Fig. S5 in the supplemental material) brings to light important differences. While separate conversions involving L3 appear to have occurred in the tprC and tprD genes, two exchanges occurred only in the tprD locus: region II, spanning transmembrane strands β2 and β3, and region IV, containing L4 plus flanking DNA from strands β7 and β8. These differences could affect not only the corresponding predicted extracellular loops of TprD2 but also its porin channel function. Differences in substrate preferences that broaden or enhance the capacity to import water-soluble nutrients across the outer membrane theoretically would be of great benefit to an extreme auxotroph and obligate pathogen such as T. pallidum subsp. pallidum (27,77). Additional point mutations, such as those in the TprD2 β-barrels of SF_6 and SF_58, might further fine-tune channel functionality. The current worldwide predominance of strains containing tprD2 could reflect, at least in part, the greater fitness conferred by the presence of functionally and antigenically distinct TprC and TprD OMPs.
Compared to OMPs of many dual-membrane pathogens (11, 18, 78-81), including the sexually transmitted organisms Neisseria gonorrhoeae (82,83) and Chlamydia trachomatis (84,85), a surprisingly small number of variants for each T. pallidum subsp. pallidum OMP were found. The high degree of similarity between the β-barrel sequences of T. pallidum subsp. pallidum in clinical samples and their reference allele counterparts argues that this limited diversity cannot be explained solely by the comparatively small number of T. pallidum subsp. pallidum genomes sequenced to date. Two possible explanations, which are not mutually exclusive, can be envisioned. One is that T. pallidum subsp. pallidum OMPs are subject to "uneven" immune pressure due to various microbiologic factors, such as low copy numbers and/or heterogeneous expression within spirochete populations (33,77), and to variability of antibody responses in persons with different genetic backgrounds. A second possibility is that structural constraints counterbalance immunologic forces promoting loop diversity. The notion of the presence of favored loop sequences that protect the bacterium has been invoked to explain the preservation of sequence types of PorB, the dominant porin of Neisseria meningitidis, commonly associated with invasive disease in surveys of endemic and epidemic meningococcal strains (18).
The worldwide resurgence of syphilis has kindled a sense of urgency for vaccine development (2,4,86). To be efficacious globally, a syphilis vaccine must target surface-exposed (i.e., antibody-accessible) determinants expressed by geographically disparate T. pallidum subsp. pallidum strains. Data presented here make clear that vaccine development based on integral OMPs needs to proceed along two broad fronts. One is to expand the list of candidate vaccinogens through topological and structural characterization of proteins known or predicted to form outer membrane-embedded β-barrels (33), coupled with assessment of opsonophagocytosis activity ex vivo and protection in the experimental rabbit model (86). The second is to refine methods for genomic sequencing of T. pallidum subsp. pallidum strains in clinical samples (56) to catalog sequence diversity among individual OMPs on a global level. From a vaccine standpoint, the objective would be to develop a mono- or multivalent vaccine based on OMP alleles from T. pallidum subsp. pallidum strains that are circulating in at-risk populations worldwide. Similar vaccine strategies have been proposed for Borrelia burgdorferi, the Lyme disease spirochete (87-89). The ability to use conserved extracellular loops (e.g., L2 in TprC/D/D2, L5 in TprD/D2, and L3 in BamA) would circumvent the difficulties associated with expression and purification of full-length OMPs on a mass scale (70).

MATERIALS AND METHODS
Quantitation of treponemal burdens. DNAs were extracted using a DNeasy blood and tissue kit (Qiagen, Valencia, CA) and were eluted in 100 to 200 µl AE buffer (10 mM Tris-Cl, 0.5 mM EDTA; pH 9.0). DNA concentrations were determined by the use of a NanoDrop instrument (Thermo Fisher, Pittsburgh, PA) or by analysis of absorbance at 260/280 nm. qPCR of polA (tp0105) was performed as described previously (91).
Molecular typing. Subtyping based on arp repeats and MseI polymorphisms in tprE, tprG, and tprJ was performed as described previously (52). Strain typing was based upon sequence variability in tp0548 partial sequences as described previously by Marra et al. (51) using primers listed in Table S4 in the supplemental material.
Nested PCR and sequencing of the β-barrel-encoding regions of tp0558, tprC (tp0117), tprD (tp0131), and bamA (tp0326). Table S3 lists unpublished primers used in this study. Nested PCR of tp0558 was performed as described previously (48). First-round amplifications of the tprC and tprD β-barrel-encoding regions were performed using GoTaq Flexi DNA polymerase (Promega, Madison, WI) according to the manufacturer's instructions. The resulting amplicons were subjected to gel purification using a QIAquick gel extraction kit (Qiagen) and then reamplified using internal (second-round) primers and Ex Taq (Clontech, Mountain View, CA). Nested PCR was carried out for bamA using GoTaq Flexi according to the manufacturer's instructions. Second-round amplicons for tprC, tprD, and bamA were subjected to gel purification and sequenced in both forward and reverse orientations. For patients Cali_84 and Cali_133, bamA second-round amplicons were also cloned into pCR2.1-TOPO vector (Invitrogen) according to the manufacturer's instructions and 10 individual clones were sequenced.
Cloning, expression, and purification of BamA L4 loops. Cloning, expression, and purification of the L4 loops of T. pallidum subsp. pallidum Nichols and Mexico A were described previously (37).
SDS-PAGE and immunoblotting. Recombinant His-tagged proteins were resolved by the use of AnykD Mini-Protean TGX gels (Bio-Rad) and transferred to nylon-supported nitrocellulose. Membranes were blocked and then probed overnight at 4°C with normal or immune rabbit serum at a dilution of 1:500 or with normal human or human syphilitic (Cali_84, Cali_123, and Cali_133) serum samples at a dilution of 1:250. Bound antibody was detected with horseradish peroxidase (HRP)-conjugated goat anti-rabbit antibody (Southern Biotech, Birmingham, AL) or HRP-conjugated goat anti-human IgG antibody (Pierce, Rockford, IL) at a dilution of 1:30,000. Immunoblots were developed using SuperSignal West Pico chemiluminescent substrate (Thermo Fisher Scientific, Waltham, MA).
Phylogenetic analysis. Sequence alignments were performed using MacVector v. 16.0.8 (Apex, NC). Multiple-sequence alignments were performed using either MacVector or fastx_collapser from the Fastx-toolkit (version 0.0.14) (http://hannonlab.cshl.edu/fastx_toolkit/). Genes evolving under conditions of positive selection were identified with the maximum likelihood method (92) implemented in PAML version 4 (93) and its user interface PAMLX (94). Site models of PAML allow the ratio of nonsynonymous/synonymous mutations (ω) to vary in each codon (site) in the gene. Branch-site models search for positive selection in lineages where different rates of ω may occur (95). Two site models and one branch-site model of PAML were used.
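Grouping identical sequences into unique variants, the task for which fastx_collapser is cited above, can be illustrated with a short Python sketch. This is a simplified stand-in written for illustration only; it does not reproduce the tool's options or output format, and the function names and example sequences are hypothetical.

```python
from collections import Counter

def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Upper-case so that identical sequences match regardless of case.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def collapse(records):
    """Return unique sequences with their counts, most abundant first."""
    counts = Counter(seq for _, seq in records)
    return counts.most_common()

# Hypothetical example input; the sequences are placeholders, not study data.
example = ">a\nATGC\n>b\natgc\n>c\nATGA\n"
variants = collapse(parse_fasta(example))
for rank, (seq, n) in enumerate(variants, start=1):
    print(rank, n, seq)
```

Each unique sequence is reported once with the number of input records that share it, which is a convenient first step before assigning clinical sequences to reference alleles.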
Structural modeling and epitope prediction. Algorithms from the TMBpro web server (58) were used to generate the three-dimensional structures for the β-barrels of TprC (Nichols) and TprD2. The structural model of BamA (Nichols) was generated as described by Luthra et al. (37) using the solved structure of full-length BamA (PDB ID: 3KGP) from Neisseria gonorrhoeae as the template and the ModWeb server (https://modbase.compbio.ucsf.edu/modweb/). Discontinuous epitopes were predicted using the DiscoTope 2.0 server (59), and the calculated epitopes were projected onto their respective structural models using Chimera (96). The electrostatic potential display was generated using ICM MolBrowserPro (97).

ACKNOWLEDGMENTS
We express our gratitude to Susan Philip, San Francisco City Clinic, Population Health Division, San Francisco Department of Public Health, and Allan Pillay, Centers for Disease Control and Prevention, for providing extracted DNAs from SF patients. We are indebted to the unwavering enthusiasm and support of Nancy Saravia; CIDEIM staff members Maria Alejandra Castrillón, Juan Pablo Garcés, Luisa Rubiano, and Laura Potes; the city of Cali Secretariat of Public Health and the health care providers from Cali's public health hospital network who participated in patient recruitment.
This work was partially supported by NIH grants R01 AI26756 (J.D.R.) and R03 TW009172 (A.R.C. and J.C. We report no commercial or other associations that might pose a conflict of interest.