ABSTRACT
Group A streptococci (GAS) are genetically diverse. Determination of strain features can reveal associations with disease and resistance and assist in vaccine formulation. We employed whole-genome sequence (WGS)-based characterization of 1,454 invasive GAS isolates recovered in 2015 by Active Bacterial Core Surveillance and performed conventional antimicrobial susceptibility testing. Predictions were made for genotype, GAS carbohydrate, antimicrobial resistance, surface proteins (M family, fibronectin binding, T, R28), secreted virulence proteins (Sda1, Sic, exotoxins), hyaluronate capsule, and an upregulated nga operon (encodes NADase and streptolysin O) promoter (Pnga3). Sixty-four M protein gene (emm) types were identified among 69 clonal complexes (CCs), including one CC of Streptococcus dysgalactiae subsp. equisimilis. emm types predicted the presence or absence of active sof determinants and were segregated into sof-positive or sof-negative genetic complexes. Only one “emm type switch” between strains was apparent. sof-negative strains showed a propensity to cause infections in the first quarter of the year, while sof+ strain infections were more likely in summer. Of 1,454 isolates, 808 (55.6%) were Pnga3 positive and 637 (78.9%) were accounted for by types emm1, emm89, and emm12. Theoretical coverage of a 30-valent M vaccine combined with an M-related protein (Mrp) vaccine encompassed 98% of the isolates. WGS data predicted that 15.3, 13.8, 12.7, and 0.6% of the isolates were nonsusceptible to tetracycline, erythromycin plus clindamycin, erythromycin, and fluoroquinolones, respectively, with only 19 discordant phenotypic results. Close phylogenetic clustering of emm59 isolates was consistent with recent regional emergence. This study revealed strain traits informative for GAS disease incidence tracking, outbreak detection, vaccine strategy, and antimicrobial therapy.
IMPORTANCE The current population-based WGS data from GAS strains causing invasive disease in the United States provide insights important for prevention and control strategies. Strain distribution data support recently proposed multivalent M type-specific and conserved M-like protein vaccine formulations that could potentially protect against nearly all invasive U.S. strains. The three most prevalent clonal complexes share key polymorphisms in the nga operon encoding two secreted virulence factors (NADase and streptolysin O) that have been previously associated with high strain virulence and transmissibility. We find that Streptococcus pyogenes is phylogenetically subdivided into loosely defined multilocus sequence type-based clusters consisting of solely sof-negative or sof-positive strains, with sof-negative strains demonstrating differential seasonal preference for infection, consistent with the recently demonstrated differential seasonal preference based on phylogenetic clustering of full-length M proteins. This might relate to the differences in GAS strain compositions found in different geographic settings and could further inform prevention strategies.
INTRODUCTION
Approximately 1.8 million new severe disease infections attributed to group A streptococci (GAS) (acute rheumatic fever, rheumatic heart disease, poststreptococcal glomerulonephritis, and invasive disease) occur each year worldwide (1). In the United States, 10,600 to 13,400 invasive GAS infections occur annually, of which approximately 12% lead to death (2).
GAS are genetically very diverse, with various complements of virulence factors and adhesins (3). Whole-genome sequence (WGS) analyses have provided insights into the causal relationships of various surface proteins, secreted toxins, hyaluronic acid capsule, and transcriptional features with the success of individual GAS lineages. For example, the emergence of the pandemic M1 strain in the early 1980s and the later emergence of an emm89 strain in the 2000s have both coincided with variation in the nga operon that increases the expression of two important toxins to enhance transmission and virulence (4, 5). Systematic large-scale population-based invasive GAS (iGAS) surveillance employing WGS can provide continued insights into transmission and disease manifestations.
The most intensely studied GAS vaccine candidates in the past 20 years have been multivalent M serotype-specific formulations, although a vaccine based on the conserved C repeat M region has been subjected to a phase 1 clinical trial (6). Geographic differences in M type distributions pose a significant problem for vaccine development; however, cross-opsonization between M types in a 30-valent type-specific vaccine has been demonstrated recently through the use of rabbit antisera in indirect opsonophagocytic killing assays (7, 8). A vaccine combining a multivalent M type-specific protein and a more conserved M-like protein could potentially target the majority of Streptococcus pyogenes strains (9).
Previous work associated emm locus patterns, based on sequences of peptidoglycan-spanning sequences encoded by emm and neighboring emm-like genes, with preferences for throat and/or skin reservoirs in human hosts (10). In the United States, common emm types such as 1, 3, and 12 were pattern A-C strains that demonstrated a preference for the throat reservoir. M types such as 4, 22, and 77 were pattern E strains that demonstrated no preference between the throat and skin reservoirs. Pattern E strain emm types were associated with the presence of the hypervariable sof gene or the sof-conferred ability to opacify serum, while pattern A-C strains were negative for these features (10). Pattern D strains are also usually sof negative and were associated with a preference for the skin disease reservoir (10). Short emm type-defining regions segregate into distinct sof-negative and sof-positive phylogenetic clusters (11). The recently described emm clustering scheme is based on phylogenetic comparisons of full-length M protein sequences, predicting structural and binding features of full-length M proteins (12). emm cluster types, emm locus patterns, and the presence or absence of sof are similarly predicted through knowledge of the emm type or classical M serotype alone (10, 12–14). Recently, it was shown that emm clusters can predict seasonality in that pattern A-C and E clusters display different propensities to cause invasive disease in summer months (15).
We describe here characteristics of iGAS strains obtained from 1 year of population-based surveillance in the United States, including emm types, multilocus sequence typing (MLST)-based genotypes, predicted resistosomes, and some key features associated with virulence and adhesion. Regional strain emergence, incidence of emm and emm-like vaccine candidates, sof genetic and seasonal associations, other virulence features, and even the identification of a successful iGAS lineage of Streptococcus dysgalactiae subsp. equisimilis are presented.
RESULTS
Potential coverage of a combined 30-valent type-specific M and Mrp protein vaccine. Of the 1,454 iGAS isolates in this study, 1,290 (88.7%) had 1 of 27 emm types targeted by an experimental 30-valent type-specific vaccine shown to elicit >50% killing of strains expressing individual vaccine M types (7, 8) (Fig. 1A; Table 1). In addition, a single query identifying the three classes of the M-like Mrp vaccine candidate (9) was positive for 140 (85.4%) of the 164 iGAS isolates of emm types not covered by the 30-valent vaccine. A majority of the strains (778/1,454 [53.5%]) would theoretically be targeted by both components of a potential combination vaccine (Fig. 1B; Table 1), while 98.3% (1,430/1,454) of the strains may be covered by at least one of the two vaccine components.
(A) emm type distribution data for the 1,454 iGAS isolates recovered through 2015 ABCs and their potential coverage by two experimental vaccines. Nine emm types are shown in green as part of the 30 included in an experimental 30-valent M type-specific vaccine (7). Thirty-two types (orange) are non-30-valent vaccine emm types but are associated with one of three classes of the emm-like mrp gene included in an experimental trivalent vaccine (9). Eighteen emm types (blue) are included in the 30-valent vaccine and also associated with one of the mrp gene classes. Other*, 14 isolates of 13 emm types (not shown) were mrp positive. Other**, six isolates of four emm types (60, 227, 234, and 238; red) are not included in the 30-valent vaccine and also lack an mrp gene. Three additional emm types (not shown) were from 12 ST128 group A S. dysgalactiae subsp. equisimilis isolates (red). (B) Summary of emm type distribution data from 1,454 iGAS recovered in 2015 shown in four mutually exclusive categories.
Cumulative bioinformatics pipeline data for 1,454 iGAS isolates recovered in 2015
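The coverage arithmetic reported above can be re-derived from the stated counts. The following is a minimal sketch, not part of the study's pipeline; the function name and category labels are illustrative assumptions, and only the four input counts come from the text.

```python
def coverage_summary(total, m_vaccine, both, mrp_only):
    """Split isolates into four mutually exclusive vaccine-coverage categories.

    total      -- all isolates
    m_vaccine  -- isolates with an emm type in the 30-valent vaccine
    both       -- isolates with a vaccine emm type that are also mrp positive
    mrp_only   -- mrp-positive isolates with a non-vaccine emm type
    """
    m_only = m_vaccine - both                 # vaccine emm type, no mrp hit
    neither = total - m_vaccine - mrp_only    # covered by neither component
    counts = {
        "M and Mrp": both,
        "M only": m_only,
        "Mrp only": mrp_only,
        "neither": neither,
    }
    percents = {name: round(100 * n / total, 1) for name, n in counts.items()}
    covered_pct = round(100 * (total - neither) / total, 1)
    return counts, percents, covered_pct


# Counts taken from the paragraph above.
counts, percents, covered_pct = coverage_summary(
    total=1454, m_vaccine=1290, both=778, mrp_only=140
)
```

With these inputs, the four categories sum to 1,454 and the fraction covered by at least one component matches the 98.3% reported in the text.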
Resistance. Of the 1,454 isolates used in this study, 328 (22.6%) were detected with one or more accessory genes or chromosomal signatures associated with resistance or decreased susceptibility to antimicrobials (Table 2). Only 19 isolates (1.3%) with discrepant phenotypes were observed. Among these 19 isolates, there were 14 instances of undetected resistance that included 12 tetracycline-resistant group A S. dysgalactiae subsp. equisimilis (see Table S1 in the supplemental material for accession numbers) and 2 erythromycin-nonsusceptible (MICs of 0.5 to 1 µg/ml) isolates. Five isolates associated with false predictions of resistance are described in Table 2 footnote b.
Resistance features of 1,454 iGAS recovered in 2015
Although β-lactam antibiotic resistance in S. pyogenes has not been reported, resistance to this class of antibiotics in this species would have a profoundly negative public health impact.
For this reason, we incorporated a WGS-based monitoring system. Determination of PBP2x transpeptidase sequence types (STs) of group B streptococci (GBS) and pneumococci is a very sensitive mechanism for detecting potential first-step mutations conferring decreased susceptibility or intermediate resistance to penicillin and other β-lactam antibiotics (16–18). We used this same approach with the corresponding PBP2x region from S. pyogenes. The MICs of the six β-lactams for all of the isolates were below the values previously flagged for GBS, which corresponded to 1 of 16 PBP2x types (Table S1). Of the 1,454 isolates, 1,107 (76.1%) shared type PBP2x-1, which served as the reference sequence.
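The idea behind PBP2x sequence typing can be sketched as assigning one type label to each distinct transpeptidase-region sequence, with the reference sequence as type 1. This is only an illustration under stated assumptions: the function name, the toy peptide strings, and the numbering of new types in order of first appearance are all hypothetical, and the actual pipeline presumably relies on a curated type table.

```python
def assign_pbp2x_types(sequences, reference):
    """Label each isolate with a PBP2x type; identical sequences share a type.

    sequences -- dict of isolate ID -> extracted PBP2x region (amino acids)
    reference -- sequence treated as type 1 (PBP2x-1 in the text)
    New sequences are numbered in order of first appearance (an assumption).
    """
    type_of_sequence = {reference: 1}
    labels = {}
    for isolate, seq in sequences.items():
        if seq not in type_of_sequence:
            type_of_sequence[seq] = len(type_of_sequence) + 1
        labels[isolate] = "PBP2x-%d" % type_of_sequence[seq]
    return labels


# Toy sequences for illustration only; these are not real PBP2x regions.
toy_reference = "MKTAYIAKQR"
toy_isolates = {
    "isolate_1": "MKTAYIAKQR",   # identical to the reference
    "isolate_2": "MKTAYIAKQK",   # one substitution -> new type
    "isolate_3": "MKTAYIAKQK",   # same variant as isolate_2
}
labels = assign_pbp2x_types(toy_isolates, toy_reference)
```

Any isolate whose label differs from the reference type can then be flagged for closer inspection of its β-lactam MICs.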
Macrolide resistance was predicted in 210 (14.4%) of the isolates and in most cases (184/210, 87.6%) was associated with erm methylase genes (ermT, ermB, or ermTR) and either inducible or constitutive coresistance to clindamycin. Most isolates positive for ermT or ermTR (>80%) were inducibly clindamycin resistant, while most ermB-positive isolates (>90%) were constitutively resistant (data not shown). The most frequently occurring macrolide resistance determinant was ermT, because of its association with emm92 as previously described (19) and possibly more recent associations with the emm4/ST39 and emm77/ST399 lineages (Table 1). As previously noted in GBS, the ermT determinant was detected at an approximately 10-fold greater read depth than other markers, consistent with its presence on a multicopy plasmid (17). Analysis of high-quality assemblies from two randomly selected ermT-positive strains (one type emm4 [isolate 20155033] and one type emm92 [isolate 20154014]) revealed that the single ermT-positive contig in each strain compared to previously described (17) plasmid pRW35 (4,968 bp) had 100% coverage, >99% identity, and a length of ~4,970 bp.
One strain contained the putative efflux determinant lsaC and also ermB. In GBS, we found that this combination confers decreased susceptibility to quinupristin-dalfopristin, presumably because of streptogramin A resistance conferred by lsaC and streptogramin B resistance conferred by ermB (17). In this iGAS isolate, the MIC of quinupristin-dalfopristin was somewhat higher (1 µg/ml) than the average (~0.3 µg/ml) but still below the MIC (2 µg/ml) indicative of intermediate resistance (20).
Approximately 95% of the GAS isolates tested had a ciprofloxacin MIC of ≤2 µg/ml and a levofloxacin MIC of ≤1 µg/ml. For this reason, we considered the 2-µg/ml MIC of both antibiotics an indicator of reduced susceptibility to fluoroquinolones. We found nine substitutions in ParC and/or GyrA that were highly associated with reduced susceptibility or nonsusceptibility to fluoroquinolones (Tables 2 and 3). Current guidelines do not assign cutoff values for ciprofloxacin, although we found that isolates with ciprofloxacin MICs of ≥4 µg/ml contained uncommon substitutions in ParC. All instances of intermediate levofloxacin resistance (MIC of 4 µg/ml) contained one of these three substitutions in ParC. All three levofloxacin-resistant strains contained the GyrA-S81F substitution, two of which additionally contained the ParC S79F or S79Y substitution (Table 3).
Summary of fluoroquinolone MICs for 48 isolates containing ParC and/or GyrA substitutions
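Substitutions such as ParC S79F are named by comparing a query protein with a reference, residue by residue. The sketch below shows that naming step only; it assumes the two sequences are already aligned and of equal length, and the six-residue fragment is a toy example rather than real ParC sequence. The function name and the offset parameter are illustrative assumptions.

```python
def find_substitutions(reference, query, first_position=1):
    """List substitutions (e.g. 'S79F') between two aligned protein sequences.

    reference, query -- equal-length amino acid strings (gap-free alignment)
    first_position   -- residue number of the first character in both strings
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    for position, (ref_aa, query_aa) in enumerate(
        zip(reference, query), start=first_position
    ):
        if ref_aa != query_aa:
            substitutions.append("%s%d%s" % (ref_aa, position, query_aa))
    return substitutions


# Toy fragment covering residues 76 to 81; not the real ParC sequence.
toy_reference = "ASKSAR"
toy_query = "ASKFAR"
found = find_substitutions(toy_reference, toy_query, first_position=76)
```

In this toy case the only difference is at residue 79, so the call reports a single S79F substitution.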
Four instances of chloramphenicol resistance, corresponding to the presence of a cat gene, were found (Table 2). We found no resistance to rifampin, gentamicin, or vancomycin, consistent with finding no previously described rpoB substitutions associated with rifampin resistance (17, 21, 22) and no genes for gentamicin or vancomycin resistance (23, 24). All isolates were also susceptible to daptomycin and linezolid.
S. pyogenes strain diversity as assessed by MLST and emm type. The 1,454 isolates comprised 70 different MLST complexes (MCs) and 64 different emm types (Table 1). On the basis of a looser criterion than MCs (described in Materials and Methods), there were 15 clonal groups and 28 singletons (Fig. 2). There were generally strong associations of genetic features with the MCs depicted in Table 1.
eBURST analysis (53) of 1,454 iGAS isolates recovered in 2015. The 15 eBURST groups consist of isolates that share four or more MLST alleles with at least one other member of the set. Twenty-eight STs are singletons representing as few as 1 or as many as 35 isolates. Singleton STs do not share four or more MLST alleles with other GAS isolates included in this study. Of the 883 sof-positive isolates, only 3 are closely genetically associated (≥4 identical MLST alleles) with any sof-negative strains. These three exceptions, shown in group 3 (red font), include two emm82 serotype-switching variant isolates of ST36 (explained in the text and in Fig. 4) and one emm113/ST148 isolate. STs shaded yellow, orange, pink, and red indicate macrolide resistance proportions of 10 to 25, 26 to 50, 51 to 75, and 76 to 100%, respectively. A small asterisk is indicated within the main ST36 circle to represent the single emm12 deletion strain.
emm types. Type emm1 was the most frequently occurring type overall (Table 1), accounting for 21.7% of the isolates, and was among the three most common emm types at each ABCs site (https://www.cdc.gov/abcs/reports-findings/survreports/gas15.html). Types emm89 and emm12 were also frequent (12.7 and 9.3% of the isolates tested, respectively) and widely distributed. With the exception of emm82, the five most frequently occurring emm types in 2015 (emm1, emm89, emm12, emm82, and emm28) have each been among the five most frequently occurring types each year since 2012 (emm1 was the most frequent each year).
Recent regional emergence of emm59. Regional instances of emm type emergence were apparent. For example, emm59 was predominant in New Mexico in 2015 but infrequent elsewhere (https://www.cdc.gov/abcs/reports-findings/survreports/gas15.html). emm59 was also predominant in New Mexico in 2014 (https://www.cdc.gov/abcs/reports-findings/survreports/gas14.html). In 2015, an emm59 clone had also emerged in Arizona, a non-ABCs state (25). We were unable to directly compare the phylogeny of New Mexico primary and secondary emm59 clades with recent Arizona emm59 isolates (25), since WGS data from these 18 isolates were not provided. However, two genomic sequences closely related to the Arizona cluster were available from two New Mexico isolates recovered in 2011 and 2012 (SRR1574573 and SRR1574608; a single Colorado outlier from this study is also shown) and included in our phylogenetic analysis of ABCs 2015 type emm59 isolates (Fig. 3). The two recently described New Mexico isolates were very closely related to the 2015 clade 1 isolates (clade 1 is ST864, while clade 2 is ST172 and speC positive, with the exception of 20154161, which was used to root the tree) (compare Fig. 3 with Fig. 2 in reference 25); however, these two earlier isolates differed from the ABCs 2015 New Mexico clade 1 isolates by the presence of speC. The five New Mexico clade 2 isolates from the 2015 ABCs also form a very tight phylogenetic cluster.
Phylogenetic relationships of emm59 ABCs isolates recovered in 2015. All but 5 of these 53 isolates were recovered in New Mexico. The values shown are SNP differences. Three additional isolates (recovered in 2011 and 2012, two in New Mexico and one in Colorado) were obtained from the GenBank database from reference 25. Clade 1 is defined by ST864, while clade 2 is defined by ST172. All isolates above or below the arrows share the features indicated, with three exceptions (20154161, 20152762, and 20152741).
Potential emm type switching. There were only four instances of a single S. pyogenes ST (ST28, ST433, ST12, and ST36) associated with more than a single emm type, only one of which, emm82 in the common emm12 genotype ST36, appeared representative of an emm gene switching event (described below). The predominant global emm1 lineage is ST28. The single emm227/ST28 strain is a deletion derivative of emm1 (predicted to lack processed M protein residues 17 to 24). Similarly, the three emm151 isolates were ST433, as expected of a derivative of M49/ST433 with mature M protein residues 3 to 13 deleted. The relationship between the two unrelated emm types, emm29 (one isolate) and emm91 (three isolates), in ST12 is not straightforward. ST12 was first documented in an emm91.0 strain recovered in 1943 and subsequently documented among five emm29 isolates recovered from 1997 to 2004 in the United States and Germany and three emm91 isolates recovered from 1990 to 1994 in Chile and Australia.
Unusual emm switching and emm deletion in the ST36 lineage. There were two emm82 isolates (20154051 [emm82.0] and 201624915 [emm82.7]) that were ST36, the MLST lineage of the global emm12 strain. Both emm82/ST36 strains were positive for sof and mrp and negative for the sic determinant. Comparison of the approximately 22-kb sof-emm region of the two emm82/ST36 strains with those of the putative parental emm12/ST36 (recipient) and donor (emm82/ST334) strains predicted their descent from the same progenitor (Fig. 4). Both strains revealed the same likely crossover point in the isp gene 5.8 kb upstream of emm82 (at base 1682938 relative to the emm12/ST36 reference genome [accession no. CP000259]). The downstream crossover point for the switching event appears to have been in a histidine triad gene approximately 8.4 kb downstream of the emm82 gene (base 1665610 relative to the emm12/ST36 reference genome). Following this double-crossover event, we predict that there was homologous excision of the enn82 gene that normally lies downstream of emm82, facilitated by the near sequence identity between the emm82 and enn82 3′ regions (see GenBank accession number CP007561 from an emm82/ST334 strain for comparison). All of the emm12/ST36 strains analyzed to date contain a conserved single base deletion in sof12 predicted to result in a truncated nonfunctional 746-residue protein lacking its fibronectin-binding repeats (26). Both emm82/ST36 strains shared the same highly conserved sof gene shared by emm12 strains (14, 26); however, their sof12 allele no longer contained the inactivating single base deletion and was predicted instead to encode a full-length 1,019-residue Sof12 protein inclusive of fibronectin-binding repeats and the C-terminal membrane anchor. Consistent with this observation, the two emm82/sof12 recombinant progeny strains were found to be serum opacity factor positive.
Gene replacement event indicative of replacement of the emm12-sic region with the mrp82-emm82 region. The parental donor (orange) and recipient (blue) strain chromosomal regions are indicated on the basis of observed sequence homologies. The homologies observed suggest that the gene replacement event involved a large isp–His triad-encoding gene region, followed by a deletion event facilitated by homologous sequences in the tandem emm82 and enn82 3′ regions. The inactivating frameshift in sof12 (indicated by sof12*) was removed by a single base insertion (indicated by sof12+). Additional recombination events involving unknown donors are indicated by gray zones in the isp and scpA downstream region.
One isolate (20160179), also of major emm12 lineage ST36, was not emm typeable. Surprisingly, this strain lacked both the emm12 and sic genes, corresponding to a precise deletion of a 4,892-bp region between mga and scpA (relative to the recipient strain shown in Fig. 4, bases 1673889 to 1678781 relative to the emm12/ST36 reference genome [GenBank accession number CP000259]).
emm types present among multiple genetic backgrounds in S. pyogenes. While only one instance of emm gene switching between different lineages was clear (two emm82/ST36 strains described above), there were 10 additional instances of the same emm type distributed among two or three completely unrelated MLSTs (Table 1). There is insufficient information to track the origins of these different emm/ST combinations. An example that shows the genetic diversity in certain emm types is type emm77, which is shared among three unrelated multi-isolate MLST-based lineages. The differences between these distinct MLST-based lineages are reflected in their different genetic features (Table 1). The common emm77.0/ST63 lineage (35 isolates found in nine ABCs states) has long been documented in Germany, Poland, and the United States (https://pubmlst.org/spyogenes/). The emm77.0/ST399 lineage (12 isolates, includes one single-locus variant, ST904) was also found in 2015 in nine ABCs states, with the only known documentation of ST399 being a single emm77 isolate recovered in Thailand (https://pubmlst.org/spyogenes/). Finally, the sole association of emm77/ST133, shared by seven isolates recovered in Tennessee (Table S1), is actually with the original tee (T) type 5 Lancefield M27 reference strain recovered more than 60 years ago. Briefly, this type was subsequently redesignated emm77 by the CDC emm database curator in the 1990s (B. Beall, unpublished data) because of its emm sequence identity to the prevalent emm77/ST63 lineage and its sequence dissimilarity from the original Griffiths M27 strain, which is the current established type emm27 (13, 27).
Surface protein determinants. In general, MCs were strongly associated with specific emm and T types and the presence or absence of additional surface protein genes. These included fibronectin-binding repeat motif-containing genes (sof, fbaA, prtF2, sfb1) (48) and the emm-like mrp and enn virulence genes that flank emm in many strains (9). These two emm-like genes show much less interstrain variation than emm genes and are also virulence factors. All sof-negative S. pyogenes strains were associated with previously described emm cluster A-C or D types (or patterns) (10), with the exception of a single emm15 (pattern E/cluster E3) isolate. Type emm15 is the only cluster/pattern E emm type in this study that is also historically associated with the serum opacification-negative phenotype (13).
GAS pili are important virulence factors that function in epithelial adhesion (reviewed in reference 3). Most (1,388/1,454, 95.5%) isolates had 1 of 21 different pilus (tee) types, corresponding to different pilus backbone protein subunit genes (29) and classical T agglutination types (13). In individual MCs, most tee types (based on 120- to 240-bp gene segment queries) were generally predictive of highly conserved 950- to 1,800-bp open reading frames that shared >95% sequence identity within the type (data not shown). The single exception was the tee gene of an emm106/ST338 isolate that shared only 82.5% sequence identity with the previously described reference tee3 gene. As previously described (29), tee genes were highly diverse but exhibited differing regions of inter-tee gene homology and all contained signal sequence motifs situated near the 5′ ends with SrtB sortase family wall attachment motifs near the 3′ ends.
The R28 antigen gene, which is a close homolog of a group B streptococcal adhesin (30), was found in four strain complexes (emm28/ST52, emm77/ST63, emm2/ST55, emm68/ST894) that included 157 (10.8%) of the 1,454 isolates.
Extracellular virulence determinants. The sda1 (virulence-associated DNase) (49) determinant was found in 450 (30.9%) of the 1,454 isolates in seven strain complexes (emm1/ST28, emm12/ST36, emm77/ST63, emm101/ST182, emm81/ST624, emm27/ST308, and emm24/ST70).
A query for the streptococcal inhibitor of complement (sic) gene (31, 32) employed a short sequence targeting the derivatives found in the emm region of the prevalent emm1/MC28 and emm12/MC36 lineages, which were primarily positive for the query (309/316 for emm1, 114/135 for emm12). In addition, an emm227/ST28 deletion derivative of emm1 (Table 1) and two emm228 isolates (both double-locus variants of ST28) were sic positive.
Exotoxin gene patterns were generally highly associated with GAS emm/ST-defined lineages. For example, this was evident in emm1/MC28, where 306/316 isolates were positive for speA, speG, speJ, and smeZ. Three emm1/ST28 isolates were positive for additional exotoxin genes besides these four. One emm1/ST28 isolate was additionally positive for speK, and another was additionally positive for speC. Finally, emm1/ST28 isolate 20156011 was additionally positive for speC and ssa. Examination of the genome revealed that spd1 (DNase gene), speC, and ssa were tandemly situated in a prophage sequence, as in the highly related ϕHKU488.vir prophage from an antimicrobial-resistant emm1 strain causing scarlet fever in Hong Kong (33). Unlike the Hong Kong emm1 strain, ABCs strain 20156011 was susceptible to antibiotics and lacked ermB and tetM determinants.
Capsular biosynthetic locus. The emm89/ST101, emm4/ST39, and emm22/ST46 lineages lacked the hasA hyaluronic acid synthetase determinant, as previously described (5, 35, 55), accounting for 233 (16.2%) of the 1,442 S. pyogenes isolates. Single hasA-negative emm1, emm11, and emm12 isolates were also in the isolate set. The 12 group A S. dysgalactiae subsp. equisimilis isolates (described below) also lacked a sequence similar to the hasA query.
nga operon markers previously associated with strain emergence. The variant 3 nga promoter (Pnga3) has been associated with increased transcription of the nga operon (4); it is more transcriptionally active than the previously described emm89 clade 1 and 2 promoters situated upstream of the genes (nga and slo) encoding the extracellular toxins NADase and streptolysin O (5). The presence of Pnga3 was invariably linked to the putatively active NADase 330G query among the study isolates (4). Consistent with Pnga3 and NADase 330G being associated with transmissibility and virulence, these two features were evident in the three most frequently occurring strain complexes. These three strain complexes (emm1/MC28, emm89/MC101, and emm12/MC36) accounted for 44% (636 isolates) of the entire iGAS sample. Overall, about 56% of the isolates (808/1,454) contained these two nga operon markers. NADase 330G was found in 326 isolates unlinked to Pnga3, with 320 isolates containing the inactive NADase (G330D substitution) and also lacking the more active Pnga3 promoter (Table 1).
emm89 emergence. Of the 185 emm89 isolates recovered in 2015, 178 were acapsular (hasA negative) and positive for the previously described clade 3 promoter Pnga3 (5). These data are consistent with recent studies that also employed ABCs emm89 isolates (recovered from 1995 to 2013), as well as emm89 isolates recovered in Finland, Iceland, and the United Kingdom (4, 5, 35). Recent studies correlated the acquisition of Pnga3 and the acapsular genotype with the increase in infections caused by emm89 GAS recovered through ABCs in the mid-2000s (4, 5).
It is interesting that the only emm89 isolates found in ABCs from 1995 to 1999 were serologically T type 11 (Fig. 5). WGS obtained from two T11/emm89 isolates recovered in 1995 and 2000 revealed that both were single-locus variants of emm89/ST101 (emm89/ST407), were hasA+, and contained Pnga1 (5) (Fig. 5). As recently described (36), we found that the emergent clade 3 emm89 also acquired a tee gene distinct from that of clade 1 strains. The T89 genetic marker, which we find to be associated with the serological T13 type, was first detected in 2000 in ABCs emm89 isolates (Fig. 5), associated with less active Pnga2 (5), and was also hasA+ (Fig. 5). In our 2015 isolate set, we found that only 6 of the 185 emm89 isolates were of the emm89/ST407/T11(tee11) lineage, were hasA+, and contained Pnga1 (Fig. 5; Table 1), corresponding to previously described clade 1 (5).
Emergence of emm89 in ABCs. Notes inserted into the graph describe the numbers of isolates sequenced and relevant pipeline results. The data are consistent with previous observations (4, 5) associating the emergence of emm89 with the appearance of the acapsular clade 3 strain containing the upregulated nga operon promoter Pnga3 and a different tee gene (36). According to our data, the clade 3 strain is T serotype 13 (corresponding to the tee89 gene) and associated with the decline of the T serotype 11 (corresponding to the tee11 gene) clade 1 emm89 strain (5) that expressed capsule and contained the less active nga promoter Pnga1. Although only two T11 (tee11) isolates (recovered in 1995 and 2000) and one T13 (tee89) isolate (recovered in 2000) were sequenced in this study prior to 2015, where all 185 isolates were sequenced, our data are entirely consistent with the previous reports and publicly available WGS data that include a large number (870) of these isolates recovered from 1995 to 2013 (4, 5, 36). Compare Fig. 1 in reference 36, which describes the international emergence of clade 3 emm89 to the data presented here.
sof gene relationships with different emm clusters/patterns, emm-like genes, and MCs. The majority (884/1,454, 60.8%) of the isolates tested corresponded to pattern E strains on the basis of 3′ sequences of emm family genes at the mga locus (10). Nearly all of the pattern E strains contained emm genes of different E clusters according to the recently described clustering scheme (12). Only one pattern E emm15 isolate (emm cluster E3) was sof negative (Table 1), consistent with previous emm15 associations (13). There were 39 emm types corresponding to E clusters or patterns. Nearly all of the other strains contained 1 of 18 A-C or D cluster emm types (equating to patterns A-C and D), all of which were sof negative or historically serum opacity factor negative. Unlike pattern D and E strains, pattern A-C strains were negative for the emm-like genes mrp and enn, with the sole exceptions of the two type emm18 isolates (Table 1).
Serum opacity factor is an important hypervariable virulence factor expressed by a large percentage of GAS strains (37). Positivity for the sof gene fragment query was predictable by previously established associations of sof genes with emm types or of previously established associations of the opacity factor phenotype with M serotypes and/or emm types (13, 14). The single exception was that type emm12 strains were positive for the sof query; however, emm12 strains are invariably opacity factor negative according to decades of published data (13). The emm12/ST36 lineage contains a conserved frameshift mutation in sof12 that prematurely truncates the protein (found in all 20 randomly selected emm12/ST36 strains in this study and the CDC M12 reference strain isolated >60 years ago). Otherwise, the presence or absence of sof completely conformed to previously observed associations (13, 14).
A loose definition of MLST groups allowing any ST related by four or more alleles to any other in the group divided the 1,442 S. pyogenes isolates into 15 groups of 2 to 24 STs (10 to 346 isolates each) and 28 “singleton” STs (1 to 35 isolates each) not related to other STs by four or more alleles (Fig. 2). Also indicated are the 12 ST128 (S. dysgalactiae MLST scheme) S. dysgalactiae subsp. equisimilis strains included in the 2015 ABCs. It is striking that 14 of the 15 groups consisted solely of either sof-positive or sof-negative strains, with the exception of group 4, which consisted of 136 sof-negative isolates and 3 sof-positive isolates. These three sof-positive isolates consisted of the two unusual emm switching emm82/ST36 strains (described above and in Fig. 4) and the single emm113/ST148 isolate. The single ST148 isolate recorded at https://pubmlst.org/spyogenes/ was an emm113 strain recovered in New Zealand in 1997.
Contrasting seasonality of infections shown by sof-negative and sof-positive strains. Recently, it was observed that infections due to emm AC cluster (or pattern A-C) strains peaked in the winter (first quarter, from January to March), while E cluster (or pattern E) strain cases were disproportionally represented in the summer (third quarter, from July to September) (15). Figure 6 shows that the seasonal relationship of emm clustering (or emm locus patterns) is reflected by the presence or absence of an active sof determinant. Among the S. pyogenes isolates studied, sof-negative isolates accounted for only 38.8% (559/1,442) of the total yet accounted for 45.5% (199/437) of the S. pyogenes cases in quarter 1 (P < 0.0005). In quarter 3, sof-negative isolates accounted for only 29.2% (78/267) of the cases (P < 0.0005). This marked fluctuation of sof-negative iGAS incidence between quarters 1 and 3 contrasts with the relatively stable incidence of sof-positive iGAS in these periods (Fig. 6).
Different seasonalities of infections with sof-positive and sof-negative S. pyogenes strains. For the 1,442 isolates obtained in 2015, the data are based on the presence or absence of the sof gene. As shown, there is a disproportionate number of sof-positive isolates versus sof-negative isolates that is most evident in quarter 3. In 2012 to 2014, the presence or absence of sof was based solely on historic associations with the emm type or M serotype (13, 14). Because of differences in isolate collection from 2012 to 2015, the numbers of isolates shown (solid line) do not completely correlate with differences in disease incidence between different years. Rates per 100,000 population and surveillance populations are shown for each year.
As with emm clusters and emm patterns, the presence or absence of sof is nearly always predicted by identification of the emm type (13, 14). We used emm typing to determine that emm type-based predictions of sof presence/absence resulted in the same seasonality pattern in 2012 to 2014 that was seen in 2015 on the basis of the actual presence of an intact sof gene (Fig. 6).
Invasive group A S. dysgalactiae subsp. equisimilis. The ABCs program is based on the identification of iGAS isolates without identification to the species level. Almost all of the isolates (1,442/1,454; 99.2%), including 12 S. dysgalactiae subsp. equisimilis isolates, reported to ABCs in 2015 were gacI positive, which is predictive of group A carbohydrate production (28). A single gacI-negative isolate of emm type stG643.0 was subsequently found to be serogroup G S. dysgalactiae subsp. equisimilis and was removed from the study. Twelve S. pyogenes isolates (0.8%) were negative for the gacI query; however, these were found to be serogroup A.
All 12 group A S. dysgalactiae subsp. equisimilis isolates identified through phylogenetic analysis (38) were of ST128 according to the S. dysgalactiae MLST scheme at https://pubmlst.org/sdysgalactiae/. The recovery of these 12 ST128 isolates of three different emm types from four different states suggests that this is a long-standing group A lineage of this species. The single group G 2015 ABCs isolate of this species was found to be ST48 (S. dysgalactiae MLST scheme). Analysis of the gac (group A carbohydrate) operon from these strains revealed a hybrid structure with an upstream crossover point in gacE and a downstream crossover point in the second open reading frame immediately downstream of the gacA-gacL operon (data not shown). This approximately 11,500-bp recombinational fragment apparently originating from S. pyogenes corresponds to coordinates 609389 to 620916 of the S. pyogenes sequence with GenBank accession number CP000017. This fragment encompasses the gacI, gacJ, and gacK genes, which shared 99.4 to 99.7% sequence identity with counterparts in S. pyogenes. These three genes were recently shown to be essential for expression of the immunodominant N-acetylglucosamine side chain of the Lancefield group A carbohydrate (28).
DISCUSSION
While emm typing and antimicrobial resistance phenotyping have served as the basis of ABCs iGAS strain surveillance for the past 2 decades (2, 43, 44), the addition of WGS-based strain characterization to this population-based surveillance system encompassing nearly 34 million individuals provides much more insight into underlying strain features and strain emergence. We found in invasive GBS that PBP2x typing was actually more reliable and sensitive for detecting first-step mutations leading to β-lactam nonsusceptibility (17), and having this system in place for iGAS allows us greater vigilance for this potential threat. We now see that ermT, discovered in GAS only in the last decade (19), actually accounts for the major percentage of emerging GAS resistance to macrolides and lincosamides. Through our current WGS pipeline data, we have several additional parameters to evaluate in association with disease manifestations and virulence and as vaccine components. Nearly all (~98%) of the study isolates would be covered by a combination M-Mrp vaccine (7–9), with more than half of this isolate set putatively targeted by both vaccine components. This is an important observation, since recent work indicated that the combination vaccine would provide more effective opsonization than either vaccine alone (9). We were able to quantitate MLST-defined diversity and to determine the extent of emm type switching in the same manner that pneumococcal strains have been assessed for capsular serotype switching in the past 15 to 20 years. From the results shown here, it appears that emm type switching is rare and might not be a significant immune escape mechanism should an M protein-based vaccine be implemented. In the entire sample set, we detected only one example of a past switching event, represented in two isolates, where the emm12 gene in the ST36 genetic background was replaced with the emm82 gene.
In addition, we detected only one emm-negative iGAS isolate (also in the ST36 background), consistent with the M protein’s historical role as an essential virulence factor. Nonetheless, the detection of these three unusual invasive isolates does present the possibility that such variant strains could emerge as successful pathogens in the presence of selection exerted by an M protein-based vaccine.
Increased documentation of GAS strain parameters may hasten the understanding of features that affect pathogenic potential. In particular, the association of the three major iGAS lineages (emm1/ST28, emm89/ST101, emm12/ST36) with an upregulated nga operon is compelling, especially when this feature directly correlated with the marked emergence of emm89 in ABCs isolates over the past decade (4, 5, 36). Individual strain parameters may provide greater understanding of GAS tissue tropism and disease manifestations. For example, emm28 has been shown to be significantly associated with postpartum iGAS infections (39, 40). Vaginal tissue tropism could be influenced by expression of the R28 determinant, detected primarily in the emm28 isolates in this study. This possibility is further suggested by the existence of a close R28 homolog in group B streptococci that commonly colonize the vaginal epithelium (30). The acquisition of the tee89 gene in emergent clade 3 emm89 may have conferred new functional adherence or immune evasion properties (3, 36, 41). The recent increased superantigen complement in emm1 subclones described in China is reason for increased awareness of the enhanced virulence potential imposed by already impactful iGAS strains, wherein emm12 strains facilitated the horizontal transfer of scarlet fever-associated mobile elements carrying speC and ssa to the emm1/ST28 lineage (33). In our strain set, we observed a single emm1 isolate (20156011) that was positive for speC and ssa in addition to the usual emm1/ST28 superantigen complement (speA, speG, speJ, smeZ); speC and ssa were situated on a prophage highly related to previously described ϕHKU488.vir (33).
It is very interesting that in the two recombinant (emm type switching) emm82/ST36 strains, the normally inactive sof12 gene reverted to an active allele upon the insertion of a single nucleotide. This observation is compatible with previously established emm type associations with sof (13, 14). It is plausible that some biologically defined barrier prevents the presence of an active sof gene in association with cluster A-C and D emm types. The reverse association also seems to be indicated, in that the combination of an active sof gene and most cluster E emm types might be essential for strain success. The association of the emm type with the presence or absence of the multifunctional sof virulence gene (37) appears to have an underlying clonal basis, since MLST divides isolate sets into defined sof-negative and sof-positive groups. The observed differences in seasonality between sof-negative and sof-positive strains could be based on the presence or absence of sof or could be based on other, unknown, clonal features.
The appearance of a specific group A lineage of the diverse subspecies S. dysgalactiae subsp. equisimilis widely spread among different ABCs sites is indirectly indicative of the considerable disease burden attributable to this subspecies (42), which is almost always associated with group C or G carbohydrates (38). The acquisition of the ability to express the group A antigen, itself a virulence factor (28), is reason for continued close monitoring of this iGAS subspecies. We provide evidence here that the ST128 iGAS lineage arose through a single interspecies gene replacement event. The association of group A S. dysgalactiae subsp. equisimilis ST128 with three distinct emm types is indicative of a successful longstanding lineage. We have previously shown that the association of multiple emm types in a single ST is not unusual in S. dysgalactiae subsp. equisimilis (38), although our data indicate that it is extremely rare in S. pyogenes.
An important aspect of WGS-based strain surveillance in ABCs is the ability to deduce close temporal and geographic relatedness between GAS isolates. The predominance of emm59 in New Mexico in 2015 shows the potential of the use of WGS to elucidate disease transmission patterns and therefore to potentially guide efforts to control disease. We are working toward faster identification of such clusters in ABCs and trying to identify potential outbreaks for which public health intervention may be effective.
To summarize, through WGS, we have examined several aspects of iGAS strains that we were previously unable to explore in a systematic population-based manner. We provide our basic WGS genetic data in association with the genomic accession data from a full year (2015) of ABCs isolates, along with lab identifiers, in Table S1. These isolates and some accompanying epidemiological data can be acquired at https://www.cdc.gov/abcs/pathogens/isolatebank/overview.html for further investigation.
MATERIALS AND METHODS
Isolates. ABCs conducts active laboratory and population-based surveillance for iGAS infections (including necrotizing fasciitis, streptococcal toxic shock syndrome, and other infections associated with GAS isolated from a normally sterile site) in geographic areas of 10 states, representing 33.7 million persons. The 1,454 available isolates, representing 89.6% of the cases that occurred in 2015, were subjected to WGS and antimicrobial susceptibility testing. Key features of ABCs iGAS surveillance from 1997 to 2015 have been described previously (2, 43, 44; https://www.cdc.gov/abcs/reports-findings/surv-reports.html).
Whole-genome sequencing. GAS chromosomal DNA preparation, library construction, and WGS generation for the 1,454 isolates were performed as previously described (16).
Conventional MIC determinations. Isolates were subjected to broth dilution testing (BDT) for determination of MICs with the panel previously described for GBS that included a well containing both erythromycin and clindamycin to detect inducible clindamycin resistance (17). Discordant results where WGS-based predictions differed from BDT results by ≥2 dilutions (≥4-fold MIC differences) were retested by E test as described by the manufacturer (BioMérieux) or by D test (20).
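The ≥2-dilution criterion can be illustrated with a short sketch. The Python below is not part of the published pipeline: the function names and example MIC values are hypothetical, and it assumes that both the WGS-based prediction and the BDT result are expressed as MICs on a standard doubling-dilution series, so that the number of dilution steps between two values is the absolute base-2 logarithm of their ratio.

```python
import math


def dilution_steps(mic_a: float, mic_b: float) -> float:
    """Number of twofold dilution steps separating two MICs (same units)."""
    if mic_a <= 0 or mic_b <= 0:
        raise ValueError("MICs must be positive")
    return abs(math.log2(mic_a / mic_b))


def is_discordant(mic_a: float, mic_b: float, min_steps: int = 2) -> bool:
    """True when two MICs differ by at least min_steps dilutions.

    With min_steps=2 this corresponds to a >=4-fold MIC difference.
    Rounding guards against floating-point noise in the logarithm.
    """
    return round(dilution_steps(mic_a, mic_b), 6) >= min_steps


# Hypothetical values in micrograms per milliliter.
print(is_discordant(0.25, 1.0))  # 4-fold apart, flagged for retesting
print(is_discordant(0.5, 1.0))   # 2-fold apart, not flagged
```

Under these assumptions, a pair of results one doubling dilution apart is treated as concordant, which reflects the usual one-dilution tolerance of broth dilution testing.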
Serum opacity factor determination. Serum opacity factor determination was performed with bacterial supernatants from specific isolates as previously described (45).
WGS GAS typing pipeline. Bioinformatics methods are described and updated at https://github.com/BenJamesMetcalf. emm subtypes were obtained on the basis of a database of defined 180-bp sequences maintained at the CDC (ftp://ftp.cdc.gov/pub/infectious_diseases/biotech/tsemm/). This subtyping scheme is based on a sequence that consists of 10 codons corresponding to the C-terminal end of the M protein signal sequence and 50 codons corresponding to the N terminus of the mature M protein (46). The WGS emm typing scheme employs de novo assembly and queries sequences closely linked to 21-bp emm typing primer 1 (27) situated adjacent to the emm type-specific region.
A PBP2x transpeptidase amino acid sequence type was generated for each isolate as described for GBS PBP2x for detection of first-step mutations leading to β-lactam resistance (17). Additionally, the ARG-ANNOT and ResFinder databases were incorporated (23, 24). Sequence targets for detection of the presence/absence of 21 T antigen backbone (tee) genes (29), the gacI glycosyl transferase specific for the group A antigen (28), the hyaluronic acid synthetic locus hasA (47), emm-like genes that flank emm (9), four different fibronectin-binding domain repeat proteins (48), the R28 surface antigen (30), the sda1-encoded DNase (49), sequence polymorphisms associated with the nga operon (4, 5), two conserved rocA null mutations (50, 51), 12 exotoxin genes (speA to speC, speG to speM, ssa, smeZ) (52), and the streptococcal inhibitor of complement (31, 32) were obtained through the references indicated.
MLST. MLST relied upon SRST2 and the database at http://pubmlst.org/spyogenes/.
MLST-based CCs and groups. CCs were defined as isolates sharing at least five alleles with the reference ST, which represented the major ST found in an emm type. An eBurst (53) group was defined as an ST set where each member shared at least four alleles with one or more other members of the set.
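The two grouping rules above can be sketched in a few lines of Python. This is an illustration only, not the eBURST program or the authors' pipeline: the ST names and seven-locus allelic profiles are hypothetical, and the group rule is implemented simply as connected components (via union-find) of the graph in which two STs are linked when they share at least four alleles.

```python
from collections import defaultdict


def shared_alleles(p, q):
    """Count loci at which two allelic profiles carry the same allele."""
    return sum(a == b for a, b in zip(p, q))


def clonal_complex(profiles, reference_st, min_shared=5):
    """STs sharing at least min_shared alleles with the reference ST."""
    ref = profiles[reference_st]
    return {st for st, prof in profiles.items()
            if shared_alleles(prof, ref) >= min_shared}


def linked_groups(profiles, min_shared=4):
    """Sets of STs in which each member shares at least min_shared
    alleles with one or more other members (connected components)."""
    parent = {st: st for st in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    sts = list(profiles)
    for i, a in enumerate(sts):
        for b in sts[i + 1:]:
            if shared_alleles(profiles[a], profiles[b]) >= min_shared:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for st in sts:
        groups[find(st)].add(st)
    # Largest groups first; singleton STs appear as one-member groups.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical seven-locus profiles.
profiles = {
    "ST_A": (1, 1, 1, 1, 1, 1, 1),
    "ST_B": (1, 1, 1, 1, 1, 2, 2),  # shares 5 alleles with ST_A
    "ST_C": (1, 1, 1, 1, 3, 2, 2),  # shares 4 with ST_A, 6 with ST_B
    "ST_D": (9, 9, 9, 9, 9, 9, 9),  # unrelated singleton
}
print(clonal_complex(profiles, "ST_A"))
print(linked_groups(profiles))
```

In this toy data set, ST_C falls outside the CC anchored on ST_A (only four shared alleles) but still joins the same loosely defined group, which shows why the group definition is the more inclusive of the two.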
Phylogenetic analysis. kSNP3.0 analysis was performed as previously described (54).
Statistical analyses. A chi-square test was performed to evaluate differences in seasonality between sof-negative and sof-positive groups.
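As a worked illustration of this kind of test, the sketch below applies a Pearson chi-square test to a 2 × 2 table built from the quarter 1 counts reported in Results (199 sof-negative isolates among 437 quarter 1 cases; 559 sof-negative isolates among 1,442 S. pyogenes isolates overall). The layout of the table (quarter 1 versus all other quarters), the absence of a continuity correction, and the function name are assumptions; the exact contingency table used in the study is not stated, so the output should not be read as a reproduction of the reported P value.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]; returns the statistic and the P value for 1 df."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


q1_neg, q1_total = 199, 437      # quarter 1 counts from Results
all_neg, all_total = 559, 1442   # full-year S. pyogenes counts from Results

a = q1_neg                        # quarter 1, sof negative
b = q1_total - q1_neg             # quarter 1, sof positive
c = all_neg - q1_neg              # other quarters, sof negative
d = (all_total - q1_total) - c    # other quarters, sof positive

chi2, p = chi_square_2x2(a, b, c, d)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```

The closed-form tail probability via `math.erfc` is valid only for 1 degree of freedom; a larger table (for example, all four quarters) would require a general chi-square distribution function from a statistics library.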
Accession number(s). Accession numbers for the 1,454 fastq files used in this work are provided in Table S1, along with lab identifiers, WGS-generated genetic data, and quality metrics.
ACKNOWLEDGMENTS
We are indebted to all of the hospitals and laboratories participating in the Active Bacterial Core Surveillance component of the Emerging Infections Programs network, a collaboration of the CDC, state health departments, and universities. We are grateful to the Minnesota Department of Public Health laboratory for pneumococcal serotyping and susceptibility testing of all of the isolates recovered in Minnesota. We thank the following individuals from the following programs and institutions for their contributions to the establishment and maintenance of the ABCs system: California Emerging Infections Program, A. Reingold, S. Brooks, and H. Randel; Colorado Emerging Infections Program, L. Miller, B. White, D. Aragon, M. Barnes, and J. Sadlowski; Connecticut Emerging Infections Program, S. Petit, M. Cartter, C. Marquez, and M. Wilson; Georgia Emerging Infections Program, M. Farley, S. Thomas, A. Tunali, and W. Baughman; Maryland Emerging Infections Program, L. Harrison, J. Benton, T. Carter, R. Hollick, K. Holmes, and A. Riner; Minnesota Emerging Infections Program, A. Glennon, C. Holtzman, K. Como-Sabetti, R. Danila, and K. MacInnes; New Mexico Emerging Infections Program, K. Scherzinger, K. Angeles, J. Bareta, L. Butler, S. Khanlian, R. Mansmann, and M. Nichols; New York Emerging Infections Program, N. Bennett, S. Zansky, S. Currenti, and S. McGuire; Oregon Emerging Infections Program, A. Thomas, M. Schmidt, J. Thompson, and T. Poissant; Tennessee Emerging Infections Program, W. Schaffner, B. Barnes, K. Leib, K. Dyer, and L. McKnight; CDC, R. Gierke, K.-A. Toews, E. Weston, L. McGlone, and G. Langley.
This study used the S. pyogenes MLST website (http://pubmlst.org/spyogenes/) at the University of Oxford (K. A. Jolley and M. C. J. Maiden, BMC Bioinformatics 11:595, 2010, https://doi.org/10.1186/1471-2105-11-595). The development of this site has been funded by the Wellcome Trust.
Major funding for this work was provided through support from the CDC Advanced Molecular Detection (AMD) initiative and the CDC Emerging Infection Program.
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
FOOTNOTES
- Received 8 August 2017
- Accepted 16 August 2017
- Published 19 September 2017
This is a work of the U.S. Government and is not subject to copyright protection in the United States. Foreign copyrights may apply.