Penicillin-Binding Protein Transpeptidase Signatures for Tracking and Predicting β-Lactam Resistance Levels in Streptococcus pneumoniae

ABSTRACT β-Lactam antibiotics are the drugs of choice to treat pneumococcal infections. The spread of β-lactam-resistant pneumococci is a major concern in choosing an effective therapy for patients. Systematically tracking β-lactam resistance could benefit disease surveillance. Here we developed a classification system in which a pneumococcal isolate is assigned to a “PBP type” based on sequence signatures in the transpeptidase domains (TPDs) of the three critical penicillin-binding proteins (PBPs), PBP1a, PBP2b, and PBP2x. We identified 307 unique PBP types from 2,528 invasive pneumococcal isolates, which had known MICs to six β-lactams based on broth microdilution. We found that increased β-lactam MICs strongly correlated with PBP types containing divergent TPD sequences. The PBP type explained 94 to 99% of variation in MICs both before and after accounting for genomic backgrounds defined by multilocus sequence typing, indicating that genomic backgrounds made little independent contribution to β-lactam MICs at the population level. We further developed and evaluated predictive models of MICs based on PBP type. Compared to microdilution MICs, MICs predicted by PBP type showed essential agreement (MICs agree within 1 dilution) of >98%, category agreement (interpretive results agree) of >94%, a major discrepancy (sensitive isolate predicted as resistant) rate of <3%, and a very major discrepancy (resistant isolate predicted as sensitive) rate of <2% for all six β-lactams. Thus, the PBP transpeptidase signatures are robust indicators of MICs to different β-lactam antibiotics in clinical pneumococcal isolates and serve as an accurate alternative to phenotypic susceptibility testing.

alterations in the transpeptidase domains (TPDs) of penicillinbinding proteins (PBPs) that reduce affinity for the antibiotics to attach to these sites. Growth inhibition of wild-type pneumococcal strains by most ␤-lactam antibiotics is primarily due to inhibition of PBP2x (11). The primary role of PBP1a, PBP2b, and PBP2x for determining ␤-lactam MICs was indicated when a mixture of three alleles from a single highly penicillin-resistant pneumococcal strain was used to transform a susceptible strain to achieve the same level of resistance (12). Published evidence supports that changes within PBP2b and PBP2x are essential for lower-level ␤-lactam resistance (13)(14)(15), with these and additional changes within PBP1a essential for high-level resistance (16,17). Other changes both within and outside the PBPs presumably compensate for fitness costs (18)(19)(20). Many resistant pbp alleles enter into the pneumococcal population through interspecies recombination with other mitis group members of the Streptococcus genus (21)(22)(23). Subsequent intraspecies recombination and mutation further diversify pbp loci, resulting in a large number of pbp alleles associated with a broad range of ␤-lactam MICs.
To effectively track genotypes and their associated MICs of ␤-lactam-nonsusceptible pneumococci, we propose a classification system in which a pneumococcal isolate is assigned a "PBP type" based on amino acid residues in the TPDs of the three major PBPs, PBP1a, PBP2b, and PBP2x. Here we report 307 PBP types identified from 2,528 invasive S. pneumoniae isolates and use the PBP type to predict ␤-lactam MICs as measured by microdilution.
Our results indicate that the PBP typing system will provide an alternative to conventional susceptibility testing and might improve our ability to extract critical resistance information from nonculturable clinical specimens.
We classified a TPD variant as "divergent" if it showed less than 90% amino acid sequence identity with the corresponding TPD in PBP type 2-0-2. Thus, an isolate could harbor 0 to 3 divergent TPDs in PBP1a, PBP2b, and PBP2x. We observed a stepwise increase in PEN MIC as the number of divergent TPDs in an isolate increased (Fig. 1B). While almost all PEN-susceptible isolates contained no divergent TPDs, isolates containing 1, 2, and 3 divergent TPDs showed median PEN MICs of 0.25 g/ml, 2 g/ml, and 4 g/ml, respectively. The differences in median PEN MICs were highly significant (P Ͻ 1 ϫ 10 Ϫ15 for all pair-wise comparisons by Mann-Whitney U test). Similar stepwise increases in MICs with increasing numbers of divergent TPDs were also observed for the other five ␤-lactams (see Fig. S2 in the supplemental material).
Amino acid changes common to all highly resistant isolates (PEN MIC of Ն4 g/ml) appeared to be distributed unevenly across the TPDs. There were 27 TPD positions in which an amino acid change relative to PBP type 2-0-2 was found in 100% (n ϭ 253) of highly resistant isolates ( Fig. 1A; see Table S1 in the supplemental material). We observed a modestly yet significantly higher density of such positions within 20 amino acids (AA) from a TPD active site (Fig. 1C, 0.047 versus 0.021, P ϭ 0.038, Fisher's exact test). While some of the 27 positions have been implicated in altering affinity to ␤-lactams, such as T371 and TSQF574 -577 in PBP1a (24), T446 and E476 in PBP2b (25), and T338 in PBP2x (26), many others were of unclear function.
The PBP type correlated with ␤-lactam MICs across diverse MLSTs. Our study sample was comprised of isolates with diverse genomic backgrounds, including 403 different multilocus sequence types (MLSTs). The PBP type and MLST were correlated but not completely overlapping (see Fig. S3 in the supplemental material), which allowed us to examine the relative contribution of PBP and genomic background to ␤-lactam MIC. First, we examined the distribution of PEN MICs across MLSTs within 9 representative PBP types (Fig. 2), of which 3 showed median PEN of 0.03 g/ml (top panel), 3 showed median PEN of 0.12 to 0.25 g/ml (middle panel), and another 3 showed median PEN of 2 to 4 g/ml (bottom panel). The number of MLSTs observed within each PBP type ranged from 2 to 18. We found essentially identical MICs across MLSTs in the susceptible PBP types. For type, and a black bar indicates that the aligned amino acid differed from the amino acid in the reference PBP type 2-0-2. The width of the row is proportional to the number of isolates. Rows were sorted from top to bottom by decreasing order of median penicillin (PEN) MICs (in micrograms per milliliter) and then by increasing order of number of isolates. The PBP type 2-0-2 row also shows the start and end amino acid position of each TPD. There were 27 TPD sites in which an amino acid differed from that in PBP type 2-0-2 in all isolates with PEN MICs of Ն4 g/ml (open triangles at the top of the figure); we define these 27 TPD sites as resistance-associated TPD sites. (B) Boxplot of PEN MICs among isolates containing 0, 1, 2, or 3 divergent TPDs. A divergent TPD showed less than 90% amino acid sequence identity with the corresponding TPD in PBP type 2-0-2. Whiskers indicate the farthest value that is within 1.5 interquartile range (IQR) of the hinges. (C) Resistance-associated TPD sites were more frequently found among amino acid sites that are close to an active site (Ͻ20 amino acids [AAs]). The three TPD domains contain a total of 914 amino acid sites, which were classified into two groups according to whether the distance to an active site is less than 20 AAs (n ϭ 319) or more than 20 AAs (n ϭ 595). The frequency of resistance-associated TPD sites in each group is shown. Error bars are 95% CIs. example, 244 of the 248 PBP type 2-0-2 isolates were of the same PEN MIC even though they represented 18 different MLSTs. The resistant PBP types showed slightly more MIC variation (Fig. 2, middle and bottom panels), but no association between MLST and PEN MIC level (treated as a categorical variable) within a PBP type was found (P Ͼ 0.05 by Fisher's exact test). Similar within-PBP type MIC distributions were also observed for the other 5 antibiotics (see Fig. S4 in the supplemental material). The results suggested that PBP type consistently correlated with ␤-lactam MIC across diverse MLST-defined genomic backgrounds.
The above notion was further supported by an analysis of variance using mixed-effect models ( Table 2). While the MLST-only model explained 73% to 91% of variations in the log 2 MIC for the 6 ␤-lactams (Table 2, model 1), the PBP type-only model explained a higher proportion (94% to 99%) of the variations. In the model incorporating both the PBP type and the MLST (Table 2, model 3), the amount of variation attributed to the PBP type remained around 94% to 98%, but the amount of variation attributed to the MLST dropped to 0.00% to 0.95%. Compared to the PBP type-only model, adding the MLST improved goodness of fit minimally and only for 3 of the 6 ␤-lactams (Table 2, model 3 versus model 2). The results indicated that after accounting for the PBP type, genomic background contributed little to ␤-lactam MICs at the population level.
PBP type as ␤-lactam MIC predictor. The predominant role of PBP type in explaining MIC variation motivated us to test whether PBP type, or its corresponding TPD amino acid sequences, could be used as a practical predictor of ␤-lactam MICs. We constructed three predictive models as described in Materials and Methods. The predicted MICs of the six ␤-lactams generated by the leave-one-out approach were compared against the microdilution MICs (Fig. 3). In isolates of trained PBP types (trained PBP type explained in "Predictive models of ␤-lactam MIC" in Materials and Methods), MICs predicted by the mode MIC (MM) model showed essential agreement (EA) of Ͼ98%, category agreement (CA) of Ͼ94%, major discrepancy (maj) rate of Ͻ3%, lower which showed less than 90% amino acid sequence identity with the corresponding TPD in PBP type 2-0-2, are indicated with an asterisk (e.g., 0-1*-1). 95% confidence interval (95% CI) of the very major discrepancy (vmj) rate of Ͻ1.5%, and upper 95% CI of the vmj rate of Ͻ7.5% for all six ␤-lactams (Fig. 3). Similar results were observed for the random forest (RF) and elastic net (EN) models (Fig. 3), except that the EN model showed a higher vmj rate for PEN than the other two models did (Fig. 3D). These results supported the predicted MIC by the MM and RF models as an acceptable equivalent to the microdilution MIC among the trained PBP types. Predicted PEN MICs by the MM model based on all isolates are shown in Table S2 in the supplemental material. Figure S5 shows a diagram indicating how to predict MICs for a specific isolate using these tables. For isolates of nontrained PBP types (nontrained PBP type explained in "Predictive models of ␤-lactam MIC" in Materials and Methods), a generally lower percentage of EA and CA was observed for predictions made by the three models ( Fig. 4A and  B). The RF and EN models appeared to perform slightly better in EA than the MM model (Fig. 4A). The maj and vmj errors were also higher among nontrained PBP types (see Fig. S6 in the supplemental material), but the evaluation was limited by the small sample size, which resulted in wide confidence intervals. For PEN, a nontrained PBP type differed from its most closely related, trained PBP type by 1 to 50 AA with a median of 2 AAs (Fig. 4C). Among the 83 isolates whose nontrained PBP type showed only 1 AA difference from its most closely related, trained PBP type, the proportion EA for PEN predicted by the MM model (81/83) was not significantly different from what was observed among isolates of trained PBP type (2,314/2,355, P ϭ 0.66 by Fisher's exact test). For every 1 additional AA difference, there was, on average, a 1.09fold (95% CI, 1.14 to 1.04) decrease in the odds of being EA [logistic regression, t(170) ϭ Ϫ3.69 and P ϭ 0.0002]. Thus, the predictive performance of the MM model decreases significantly for the nontrained PBP types containing more than 1 AA difference.
Because the pneumococcal genomic background evolves over time, we examined whether the MM model prediction performance could be affected by the length of time between the training data set and the testing data set. Isolates in one of the five surveillance years (1998, 1999, 2009, 2012, and 2013) were used as the training data set for the MM model, and the fitted model was used to predict PEN MIC for isolates in subsequent years with trained PBP types. EA between the predicted PEN MIC and microdilution MIC was calculated for each year tested (Fig. 5). The year-specific EA ranged from 91.4% to 100% with a median of 98.9% (Fig. 5). We found no significant association between separation time and year-specific percentage EA [Fig. 5, linear regression, t(13) ϭ Ϫ1.59 and P ϭ 0.14].

DISCUSSION
In this study, we proposed a PBP typing system that links amino acid sequence variation in the TPDs of PBP1a, PBP2b, and PBP2x to ␤-lactam MIC levels among invasive pneumococcal isolates. We found that PBP types containing divergent TPD variantsthose exhibiting more than 10% amino acid sequence difference from the typical susceptible PBP type 2-0-2-were associated with increased ␤-lactam MICs. Isolates with high-level PEN resistance commonly harbored a divergent TPD within all three PBPs. One possible reason is that epistasis between pbp loci is essential for these strains to maintain both high-level ␤-lactam resistance and fitness. The TPD positions in which amino acids frequently differed between susceptible and resistant isolates appeared to be distributed nonrandomly. This may reflect structural constraints dictating that only changes in specific TPD positions would allow a viable, resistant TPD variant. Alternatively, this could indicate that the observed resistant TPD sequences have not yet diverged sufficiently from recent founder sequences. Additional studies are needed to understand the relative contributions of these mechanisms.
In genomically tracking ␤-lactam resistance, a typing system that correlates with the phenotype closely and robustly is desirable. Since sequence variations both within and outside the PBPs have been reported to influence ␤-lactam MICs (19,20,27), we examined how much the phenotype could vary among isolates of the same PBP type but of diverse genomic backgrounds as defined by MLST. The MICs within susceptible PBP types were very stable regardless of genomic background. The results were consistent with the notion that sequence variations outside the three TPDs may alter the level of resistance in isolates carrying resistant PBPs, but these variations rarely confer resistance themselves (28). While the resistant PBP types tended to show larger MIC varia-tions, there was no evidence that the MIC differed significantly by MLST within the same PBP type. Several different factors could explain the lack of association between MLST and MIC within a PBP type. One possibility is that MLST failed to represent the non-TPD variations actually modulating MICs. Another possibility is that non-TPD variations influence MIC only minimally so that there was not enough power to detect such effects. Nonetheless, analysis of variance showed that the PBP type, when incorporated into a model for random effect, explained the vast majority of variation in ␤-lactam MICs both before and after accounting for MLST, suggesting that the PBP type was a dominant and robust determinant of ␤-lactam MIC at the population level.
One practical application of our proposed PBP typing system is to predict the ␤-lactam resistance level based on sequence information. This application could be important in developing susceptibility testing for clinical specimens from which an isolate cannot be recovered. It may also facilitate estimating the prevalence and spread of ␤-lactam resistance in the current era when bacterial whole-genome sequencing data rapidly accumulate, yet accompanying MIC information is not always available or reliable. Here we developed three basic models to predict ␤-lactam MICs based on PBP type. The simplest MM model just assigned the most frequently seen MIC of a trained PBP type to a test isolate of the same PBP type and approximated any nontrained PBP type by the most closely related trained PBP type. The more-complex RF and EN models were designed to quantify the contribution of each individual TPD position from the training data set and combine these contributions to make a prediction. Predicted MICs by MM and RF models showed essentially equivalence with the microdilution MIC for trained PBP types but not for the nontrained ones. The results supported the use of these predicted MICs as valid susceptibility testing results for isolates of trained PBP types. It is also important to obtain microdilution MICs for all newly found PBP types, thus expanding the number of trained PBP types. Better modeling design and more training data could help identify conserved amino acid sites that are predictive of MICs in other data sets. Characterizing the specific contributions of substitutions at these conserved, predictive sites is important because the pneumococcal genome is highly plastic, precluding the documentation of all possible PBP types. Further, models that quantify the contributions of individual TPD positions and/or their interactions to ␤-lactam MICs may provide insights into resistance mechanisms and should be a priority of future investigation.

MATERIALS AND METHODS
Isolates and characterization. The study sample was 2,528 broadly representative pneumococcal isolates selected from the Active Bacterial Core surveillance (ABCs) over the years 1998 to 2015. ABCs is an active, population-based and laboratory-based surveillance system that is part of the Centers for Disease Control and Prevention's (CDC) Emerging Infections Program. Cases of invasive pneumococcal diseases (IPD) were defined by the isolation of pneumococci from a normally sterile site in residents of the surveillance areas in 10 different states (29-31) (see ABCs surveillance reports for population sizes, IPD incidence, antimicrobial susceptibility data, and other information at http://www.cdc.gov/abcs/ reports-findings/surv-reports.html).
Isolates were characterized by a combination of conventional testing and short-read whole-genome sequence (WGS) analysis. MICs for six ␤-lactam antibiotics, penicillin (PEN), amoxicillin (AMO), meropenem (MER), cefotaxime (TAX), ceftriaxone (CFT), and cefuroxime (CFX), were determined by the broth microdilution method as previously described (32). MIC interpretive standards are shown in Table S3 in the supplemental material and are consistent with CLSI document M100-23 (33). Unless otherwise specified, PEN susceptibility and resistance refer to PEN MIC of Յ0.06 and Ն0.12 g/ml, respectively. When MIC was analyzed as a numeric variable, an MIC of "ϭ X" was treated as value X; an MIC of "ՅX" was approximated as value X; and an MIC of "ϾX" was approximated as value 2X.
Multilocus sequence types (MLSTs) and the TPD amino acid sequences of PBP1a, PBP2b, and PBP2x were extracted using a validated pneumococcal typing pipeline as described previously (34). Databases of 68, 78, and 118 unique TPD amino acid sequences were compiled for PBP1a, PBP2b, and PBP2x, respectively (see Table S4 in the supplemental material). Each unique TPD amino acid sequence was assigned an identifier, and the three-number combination from each isolate was assigned as its "PBP type." For example, PBP type 2-0-2 describes an isolate containing a composite TPD amino acid sequence pattern of PBP1a-2, PBP2b-0, and PBP2x-2. All isolates through 2013 were serotyped with latex agglutination and the quellung reaction employing CDC antisera. Serotypes of isolates recovered after 2013 were determined by WGS and the typing pipeline (34).
PBP active site motifs (amino acid sequences) were defined as the following according to Hakenbeck et al. (20): S 370 TMK, S 466 SN, and K 557 TG in PBP1a, S 386 TMK, S 443 SN, and K 614 TG in PBP2b, and S 337 TMK, S 395 SN, and K 547 SG in PBP2x. The position of a motif's first residue was used in calculating distance from an active site.
Analysis of variance for ␤-lactam MIC. Mixed-effect models incorporating random effects for PBP type and MLST were used for the analysis of variance. We chose mixed-effect models because the observed PBP types and MLSTs represented a sample (subset) of all possible types in the pneumococcal population. Additionally, mixed-effect models could facilitate the analysis of data that were unbalanced and contained cells of 0 count in a contingency table, which was the case for our study sample.
For each ␤-lactam, four models were constructed with the log 2transformed MIC as the dependent variable and using another or other factors or variables as the covariate(s). In model 1, only MLST was used as a covariate. In model 2, only PBP type was used as a covariate. In model 3, both PBP type and MLST were used as covariates. In model 4, PBP type, MLST, and their interaction term were used as covariates. These models incorporated random effect(s) for all covariate(s). The only fixed-effect term was the intercept. Nested models were compared by a likelihood ratio test; a P value of Ͻ0.05 was considered evidence of significantly better fit for the more-complex model. The R package "lme4" was used for model fitting and comparison. Model 4 showed no significant increase in goodness of fit compared to model 3 for any ␤-lactams and therefore was excluded from further analysis.
Predictive models of ␤-lactam MIC. The study sample (2,528 pneumococcal isolates) was divided into a training data set and a test data set. MICs in the test data set were predicted using models parameterized by the training data set. We denoted a PBP type as "trained" if it was present in the training data set with MIC data; otherwise, the PBP type was denoted as "nontrained." In the "leave-one-out" cross-validation, which represents a special case of dividing the study sample into training and test data sets, each isolate was used in turn as the testing data set, while all other isolates were used as the training data set.
(i) MM model. In the mode MIC model, the highest MIC among the most frequently observed MIC(s) for a PBP type in the training data set was assigned as the predicted MIC of the same PBP type in the test data set. Any PBP type not seen in the training data set was approximated by a training PBP type that showed the highest amino acid identity.
(ii) RF model. In the random forest model, we used the amino acid at each position of the three TPDs as predictors to train an RF model for the continuous outcome log 2 (MIC). The trained model then predicted the MIC of a test isolate based on its TPD amino acid sequence. For a given position in the TPDs, any amino acid not seen in the training data set was approximated by the training amino acid with the least BLOSUM62 distance. The R package "randomForest" was used for RF model training and prediction.
(iii) EN model. Similar to the RF model, the amino acid at each position of the three TPDs was used as predictors to train an elastic net model for the continuous outcome log 2 (MIC). The R package "glmnet" was used for EN model training and prediction.
To evaluate prediction performance, we calculated the essential agreement (EA), category agreement (CA), very major discrepancy (vmj), and major discrepancy (maj) between the predicted MIC (new method) and broth microdilution MIC (reference method) according to the FDA guidance document for antimicrobial susceptibility test systems (35). Briefly, the FDA guidance document's definitions follow: CA, agreement of interpretive results (susceptible [S], intermediate [I], or resistant [R]); EA, agreement within (plus or minus) one twofold dilution of the reference MIC; maj, the reference category result is S and the new method result is R; vmj, the reference category result is R and the new method result is S. Criteria for acceptable performance in the FDA guidance document include (i) essential and category agreement of Ͼ89.9%; (ii) a maj rate of Յ3%; and (iii) an upper 95% confidence limit for the true vmj rate of Յ7.5% and the lower 95% confidence limit for the true vmj rate of Յ1.5%.
Statistics. Correlation between two numerical variables was quantified by Spearman's rank correlation coefficient (). Association between two categorical variables was evaluated by Fisher's exact test. Equal median between two groups was examined by Mann-Whitney U test. Confidence interval for proportion was constructed using the exact binomial method. All statistical analyses were performed in R version 3.2.2 (36); graphics were also created in R version 3.2.2.