Evolutionary and Genomic Insights into Clostridioides difficile Sequence Type 11: a Diverse Zoonotic and Antimicrobial-Resistant Lineage of Global One Health Importance.

Historically, Clostridioides difficile (Clostridium difficile) has been associated with life-threatening diarrhea in hospitalized patients. Increasing rates of C. difficile infection (CDI) in the community suggest exposure to C. difficile reservoirs outside the hospital, including animals, the environment, or food. C. difficile sequence type 11 (ST11) is known to infect/colonize livestock worldwide and comprises multiple ribotypes, many of which cause disease in humans, suggesting CDI may be a zoonosis. Using high-resolution genomics, we investigated the evolution and zoonotic potential of ST11 and a new closely related ST258 lineage sourced from diverse origins. We found multiple intra- and interspecies clonal transmission events in all ribotype sublineages. Clones were spread across multiple continents, often without any health care association, indicative of zoonotic/anthroponotic long-range dissemination in the community. ST11 possesses a massive pan-genome and numerous clinically important antimicrobial resistance elements and prophages, which likely contribute to the success of this globally disseminated lineage of One Health importance.

metrics, MLST data, and features for the 207 genomes evaluated in this study are summarized in Data Set S1.
Intra- and interspecies transmission of globally disseminated C. difficile clade 5 clones. The phylogenetic structure and clonal subpopulations of the 200 ST11 and 7 ST258 strains were explored by high-resolution core genome single nucleotide variant (SNV) analysis. WGS reads were mapped to the finished chromosome of C. difficile RT078 strain M120 (ST11 [NC_017174]) to a median depth of 100×. After filtering for indels, repetitive regions, mobile genetic elements, and putative recombination regions, a total of 1,076 high-quality SNVs in the clonal frame were found across the 207-sample data set and used for maximum likelihood (ML) tree building (Fig. 2A). The SNV-based ML phylogeny revealed 6 distinct evolutionary clusters, which were broadly congruent with RT/ST lineage and toxin gene profile: (i) a large group of 99 strains primarily comprising RT126 and RT078 (designated the RT126/078 cluster), (ii) a group of 33 strains comprising exclusively RT033 and RT288 (the RT033/288 cluster), (iii to v) three distinct groups of strains (44, 3, and 21, respectively), belonging predominantly to RT127 (RT127 clusters I to III), and (vi) a divergent group of 7 ST258 strains (the ST258 cluster). Individual phylogenies for each cluster, annotated with metadata, are presented in Data Set S1 at figshare.
In all clusters, there was a general absence of geographic grouping and significant overlap of clinical and nonclinical strains, a finding that supports similar RT- and MLST-based studies that have shaped the hypothesis that strains of ST11 common to humans, animals, and the environment share a recent evolutionary history (1,3,17). To provide ultrahigh resolution of this strain population and to examine this hypothesis further, the SNV phylogeny was investigated for signatures of clonal transmission. Following the standard approach of Eyre et al. (4,18,19), a species-specific molecular clock of 1 to 2 SNVs per genome per year was applied, with a cutoff of 0 to 2 core genome SNVs indicative of a plausible clonal transmission event (Fig. 2B). These thresholds have been shown to be congruent with cutoffs used for core genome MLST (cgMLST), which is based on 2,270 loci and uses a threshold of a difference of ≥7 alleles to define isolates as being unrelated, whereas a difference of ≤6 alleles is used to define isolates as likely to belong to the same clone (20).
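The two cutoffs described above can be written as simple predicates. The following is a minimal sketch only: the function names are hypothetical, and the default values are the thresholds quoted in the text (0 to 2 core genome SNVs; ≤6 versus ≥7 cgMLST alleles), not code from the study.

```python
def is_plausible_clonal_transmission(snv_distance: int, cutoff: int = 2) -> bool:
    """True if two isolates differ by 0..cutoff core genome SNVs."""
    if snv_distance < 0:
        raise ValueError("SNV distance cannot be negative")
    return snv_distance <= cutoff


def is_same_clone_cgmlst(allele_differences: int, same_clone_max: int = 6) -> bool:
    """True if two isolates differ by <= same_clone_max cgMLST alleles.

    A difference of same_clone_max + 1 (i.e., >=7) or more alleles is
    treated as unrelated.
    """
    if allele_differences < 0:
        raise ValueError("allele difference cannot be negative")
    return allele_differences <= same_clone_max


if __name__ == "__main__":
    for d in (0, 2, 3):
        print(d, "SNVs ->", is_plausible_clonal_transmission(d))
    for a in (6, 7):
        print(a, "alleles ->", is_same_clone_cgmlst(a))
```

Both cutoffs are kept as parameters because they are conventions rather than fixed properties of the data.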
Applying the threshold of Eyre et al., 25 clonal groups (CG1 to -25) were identified across the six phylogenetic clusters, defined as groups of two or more strains differing by ≤2 SNVs in their core genome (Table 1). These CGs comprised 25 distinct clones of major RTs 078, 126, 127, 033, and 288 and encompassed 117 isolates of clinical and nonclinical origins (Table 1). Overall, 19/25 CGs (76%) comprised strains isolated from the same host species, indicating intraspecies clonal transmission, while the remaining six CGs (24%) showed evidence of interspecies clonal transmission. Furthermore, many CGs revealed long-range transmission of C. difficile clones across local, national, and international distances (Table 1).
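One way to derive such clonal groups from pairwise SNV distances is single-linkage grouping: any two strains within the cutoff are joined, and groups are the resulting connected components. The sketch below is illustrative only. The text does not state which grouping rule was used, so single linkage is an assumption, and the strain names, host labels, and distances are invented toy data.

```python
def clonal_groups(pairwise_snvs, cutoff=2):
    """Group strains linked by <= cutoff SNVs (single linkage, union-find).

    pairwise_snvs maps (strain_a, strain_b) tuples to SNV distances.
    Returns a sorted list of sorted groups with two or more members.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), distance in pairwise_snvs.items():
        root_a, root_b = find(a), find(b)  # registers both strains
        if distance <= cutoff and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for strain in list(parent):
        groups.setdefault(find(strain), set()).add(strain)
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


def is_interspecies(group, host_of):
    """True if the strains in a group come from more than one host species."""
    return len({host_of[s] for s in group}) > 1


if __name__ == "__main__":
    # Toy data (hypothetical): "C" = clinical, "V" = veterinary.
    toy = {("C1", "V1"): 0, ("C1", "C2"): 2, ("V1", "C2"): 3,
           ("C3", "V2"): 1, ("C1", "C3"): 15, ("C4", "C1"): 40}
    hosts = {"C1": "human", "C2": "human", "C3": "human", "C4": "human",
             "V1": "pig", "V2": "human"}
    for g in clonal_groups(toy):
        print(g, "interspecies" if is_interspecies(g, hosts) else "intraspecies")
```

Note that under single linkage two members of a group can differ by more than the cutoff (here V1 and C2 differ by 3 SNVs but are linked through C1); a stricter complete-linkage rule would split such groups.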
C. difficile ST11 possesses an extensive AMR repertoire. The 207 C. difficile genomes were screened in silico for acquired and intrinsic resistance determinants. Of
nonrecombinant, nonrepetitive core genome SNVs in clonal frame. Taxa are colored according to RT lineage: RT033/288 (green; n = 33), RT078 (red; n = 40), RT126 (blue; n = 69), RT127 (orange; n = 54), or other (gray; n = 11). Strain origin is indicated in yellow (clinical, taxa prefixed with "C") and purple (veterinary/environmental, taxa prefixed with "V/E"). Clonal relationships (two or more strains sharing ≤2 core genome SNVs) are indicated in black. The tree is midpoint rooted, and the nodes are supported by 1,000 nonparametric bootstrap replicates (values of >95 are shown [*]). The overall topology supports PCR ribotype assignment with six major strain clusters identified (the RT126/078 cluster, RT127 clusters I to III, the RT033/288 cluster, and the sequence type 258 [ST258] cluster). (B) Distribution plots showing core genome SNV distances between each strain and the genetically closest strain in each cluster. Vertical lines represent the 2-SNV cutoff for the identification of clonally transmitted strains (18). In "RT127*," the asterisk indicates that clusters I to III are merged.
Fig. 3A, and the MIC range, MIC50, MIC90, and geometric mean (GM) for all RT lineages are presented in Data Set S1 at figshare. All strains including ST258 were fully susceptible to vancomycin, metronidazole, fidaxomicin, rifaximin, amoxicillin-clavulanate, trimethoprim, and piperacillin-tazobactam (Fig. 3A; see Data Set S1 at figshare). Overall, 48.1% of strains showed phenotypic resistance to one or more of the agents tetracycline, moxifloxacin, erythromycin, and clindamycin, 25.4% of which (predominantly RT126/078) were multidrug resistant (MDR), i.e., resistant to ≥3 of these agents.
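The MDR definition above (resistance to ≥3 of the four listed agents) can be expressed as a small classifier. This is a sketch under stated assumptions: the function name and category labels are hypothetical, and the input is assumed to be the set of agents to which a strain tested phenotypically resistant.

```python
# The four agents named in the text for which resistance was observed.
AGENTS = ("tetracycline", "moxifloxacin", "erythromycin", "clindamycin")


def resistance_category(resistant_to, mdr_min=3):
    """Classify a strain from the agents it is phenotypically resistant to.

    Only the agents in AGENTS are counted; anything else is ignored.
    """
    n = len(set(resistant_to) & set(AGENTS))
    if n >= mdr_min:
        return "MDR"
    if n >= 1:
        return "resistant"
    return "susceptible"


if __name__ == "__main__":
    print(resistance_category(["tetracycline"]))
    print(resistance_category(["tetracycline", "erythromycin", "clindamycin"]))
    print(resistance_category([]))
```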
Resistance was conferred by a diverse selection of acquired AMR genes (479 individual genes of 22 types across 4 antimicrobial classes) and intrinsic mutations in DNA gyrase subunit genes (Fig. 4A; see Data Set S1 at figshare). The distribution of AMR genotypes and the key genetic features of major AMR-encoding transposons found in this population are presented in Fig. 4A and Table 2, respectively.
Comparative Genomics of C. difficile Sequence Type 11
Genetic diversity in the C-terminus receptor binding domain (RBD) of tcdB was found with 70.6% of tcdB+ isolates (n = 144/174) harboring tcdB RBD allele type 1 (21) and the remainder carrying novel allele types 20 and 21, the latter exclusive to ST258 (Fig. 4B). These novel alleles share a recent evolutionary history with type 1 (Fig. 5B) and contain nonsynonymous substitutions that alter the amino acid sequence and biochemistry (Fig. 5C).
All strains harbored wild-type cdtA/B genes, but two variants of cdtR were identified: (i) a 324-bp cdtR allele was found exclusively in RTs 078/126, which, due to a stop codon at position 322, results in a truncated CdtR (from 248 to 108 amino acids [aa]) and (ii) a wild-type 747-bp cdtR allele was found only in non-078/126 strains (see Data Set S1 at figshare). Finally, characterized by diversity in slpA, cwp66, cd2790, cwp2, and secA2 (21), 4 distinct S-layer cassettes were identified (including one novel type) that were broadly congruent with the RT and/or ST lineage (Fig. 4B).
Clade 5 possesses a massive open pan-genome and a diverse population of temperate prophages. To quantify the entire genomic repertoire of the strain population, estimates of the pan-genome, core genome, and accessory genome were generated. The pan-genome was vast, comprising 10,378 genes, while the core and accessory genomes were 2,058 and 8,320 genes, respectively (Fig. 6A). The pan-genome showed characteristics of an "open" pan-genome (24). First, the pan-genome  Fig. 6A]). The core genome curve depicts a trend of core genome size contraction with progressive addition of sequential genomes (Fig. 6B), ultimately converging at 2,058 genes at n = 207. Notably, the core genome accounted for just 19.8% of the total gene repertoire and 56.5% of an average ST11/258 genome (range, 49.8 to 60.2).
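The proportions quoted above follow directly from the gene counts, as the short check below illustrates. It is a sketch, not the study's pipeline: the function name is hypothetical, and the average genome size it returns is back-calculated here from the reported 56.5% figure rather than taken from the text.

```python
def pan_genome_summary(pan_genes, core_genes, core_share_of_avg_genome=None):
    """Derive accessory size and core proportions from gene counts."""
    summary = {
        "accessory_genes": pan_genes - core_genes,
        "core_pct_of_pan": 100 * core_genes / pan_genes,
    }
    if core_share_of_avg_genome is not None:
        # Implied average gene count per genome (a derived estimate).
        summary["implied_avg_genome_genes"] = core_genes / core_share_of_avg_genome
    return summary


if __name__ == "__main__":
    s = pan_genome_summary(10_378, 2_058, core_share_of_avg_genome=0.565)
    print(s["accessory_genes"])                     # 8320
    print(round(s["core_pct_of_pan"], 1))           # 19.8
    print(round(s["implied_avg_genome_genes"]))     # 3642
```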
A total of 221 intact, 73 questionable, and 478 incomplete prophages were identified in the study population. A summary of the distribution and genetic features of intact prophages is shown in Fig. 6C and Data Set S1 (at figshare), respectively. The intact prophages comprised 14 "phage types," ranging between 14.3 and 184.6 kb in length, with an average GC content of 29.9%, comparable to the average GC content
Consistent with an open pan-genome, the core genome curve (r² = 0.985) converges to 2,058 genes at n = 207, where an average of 16 new strain-specific genes are contributed to the gene pool. Overall, the core genome accounts for just 19.8% of the total gene repertoire. (C) Summary of intact prophage content found in 207 C. difficile strains of ST11 and ST258. More prophages were found in LCT− RTs versus LCT+ RTs, the RT127 lineage versus the RT126 and -078 lineages, and veterinary versus clinical strains (P < 0.001).
Using large-scale high-resolution WGS, we provide novel insights into the evolution and genetic repertoire of ST11 and its close relative ST258. The global population structure largely mirrored the RT sublineage, with 6 discrete evolutionary clusters comprising highly genetically related strains unconstrained by geographic, temporal, or host species origin. Core genome analysis revealed intra- and interspecies clonal transmission of C. difficile in all the major ST11 sublineages and within the closely related novel clade 5 lineage ST258, which is potentially associated with CA-CDI and patients with hematological/oncological malignancies (26). Clones were spread across geographically distinct health care facilities and farms and indicated reciprocal long-range dissemination and possible zoonotic/anthroponotic transmission locally, nationally, and internationally. Our work supports and extends the findings of Knetsch and colleagues, who, not surprisingly, showed transmission of RT078 between a pig and pig farmer within the confines of a pig-rearing facility (4). In reconstructing the global RT078 population structure, they later revealed an intercontinental transmission network between humans and production animals of RT078 (15).
Our analysis also provided some interesting insights into the overall evolution of ST11/258. First, RTs 126 and 078 did not cluster into distinct subpopulations, suggesting they have coevolved, at least over their core genome, as a single heterogeneous lineage, a finding that supports their frequent reporting as a single RT group (1,36). A similar observation was made for the LCT− RTs 033 and 288, but the position of RT078/126 at the base of the phylogeny suggests they may be the more ancient of the clade 5 sublineages. Taken together, these phylogenetic analyses reveal a globally disseminated network of clones with the capability and proclivity for reciprocal clonal transmission between production animals and humans with CDI. Moreover, these findings challenge the existing paradigm and long-held conception that CDI is primarily a health-care-associated infection and provide compelling evidence that CDI is a zoonosis. While some human infections in Australia are likely a result of international travel (e.g., clones of RT078), our analyses also indicate a persistent community reservoir with extensive long-range domestic dissemination. Due to the high prevalence of C. difficile in neonatal cattle and pigs, the consumption of contaminated retail meats is a conceivable mechanism for transmission (2,11). However, evidence from studying RT014, the most common RT found in humans and pigs in Australia, suggests a zoonotic transmission chain extending from the farrowing shed to the community (5,12,37). C. difficile can be found in 67% of Australian piglets (12), on 20% of retail root vegetables grown in soil containing animal feces (38), in 59% of public lawns in Western Australia (39), and in 30% of retail compost and manure (unpublished data), with RT014 comprising between 7 and 67% of isolates in these settings. In a manner analogous to human infection, excessive exposure to antimicrobials, particularly to cephalosporins, is driving the expansion of C. difficile in livestock populations worldwide and resulting in spillover of C. difficile into the environment and CDI in the community (37).
AMR can evolve rapidly in C. difficile and is a key factor driving genetic diversity and epidemiological changes in CDI (1). The ST11 lineage has a substantial AMR repertoire, characterized by high levels of phenotypic resistance to tetracycline, moxifloxacin, clindamycin, or erythromycin, predominantly within the RT 126/078 lineages.
TetR strains of C. difficile comprise up to 41% of European clinical isolates (40). We found 30% of ST11 isolates had a TetR phenotype conferred by efflux and ribosomal protective proteins, expressed by Tn6190 (tetM+) and Tn6164 (tet-44+). Tn6190 has a strong affiliation with the RT126/078 lineages, present in 79.7% and 55.0% of strains in this study, respectively, and to date only reported in these RTs (14,16,41). Similarly, Tn6164 has been found in RT078 only and, prior to this study, only within Europe. Corver et al. (14) suggest there may be an association between the presence of this genetic island and enhanced virulence in RT078 strains: CDI-associated mortality was more common in patients infected with C. difficile strains harboring Tn6164 (29% versus 3%) (14). The association of these elements with RTs 078 and 126 could provide a fitness advantage, other than AMR, and be a contributing factor in their success compared to the less widespread LCT− and RT127 lineages. Indeed, a recent study by Dingle et al. (57) provides compelling evidence that tetracycline selection played a crucial role in the rapid and recent international spread of RT078 clones.
The prevalence of FQR in European C. difficile populations can be as high as 40%, mainly associated with hospital outbreaks of RT027 strains (10,42,43). As with RT027, FQR might also be an important driver of clonal expansion in ST11. In our study, FQR was largely restricted to RT078/126 strains of human clinical origin and was notably absent in C. difficile from Australian livestock, reflecting the current restrictions on fluoroquinolone usage in food animals in this country (10).
WGS permits rapid prediction of an antimicrobial phenotype. We found concordance between MICs and in silico AMR screening was high for the tetracycline and fluoroquinolone phenotype (100%), but poor for the MLSB phenotype (36.1%). Over a third of all strains, again principally RTs 126/078, had an MLSB phenotype, yet only 36% of these harbored ermB+ Tn6194, the first such report from humans in Australia or from animals elsewhere in the world. This element is the most common ermB-containing element in European human clinical isolates, has interspecies transfer proficiency, and is one of the defining genetic features of epidemic RT027 (1,40,42). The explanation for the MLSB+ ermB mutant is not clear. Such strains did not contain alterations in ribosomal proteins or 23S rRNA, both alternative mechanisms from which reduced susceptibility to macrolides and lincosamides can arise (43). Previously, Spigaglia et al. (43) showed that treatment of MLSB+ ermB mutant strains with two efflux pump inhibitors did not lead to reductions in MICs. Therefore, it appears that in the ST11 lineage ermB is not the primary mechanism underlying the MLSB phenotype, and other mechanisms, potentially efflux but possibly novel, may be at play.
Many of the underlying AMR elements we identified show provenance in different commensal species residing within the gut of pigs and cows. Some of these elements are fully capable of both intraspecies transfer to different C. difficile RTs and interspecies transfer to other genera (41,42,44). Taken together with our microevolutionary analysis, this suggests that ST11, and perhaps also ST258, has the capability and propensity to move between production animals and humans and in doing so can possibly access and exchange DNA with an enormously diverse metagenome found in the human and pig (monogastric) and cow (ruminant) gut microbiota. The high prevalence of cryptic aminoglycoside and streptothricin resistance gene clusters originating from Erysipelothrix rhusiopathiae is intriguing as C. difficile is inherently resistant to aminoglycosides. It most likely reflects a long history of reciprocal lateral gene transfer within the monogastric and ruminant gut environments, with C. difficile likely serving as a reservoir of AMR loci, for both other C. difficile lineages and other commensal genera.
Analysis of genes common to the PaLoc, CdtLoc, and S-layer identified several new alleles and many instances of RT/ST lineage-specific diversity. These findings may indicate evolution within different host environments and possibly explain differences in virulence potential between more (078 and 126) and less (033, 288, and 127) successful lineages. For example, CDT+ strains are associated with more severe diarrhea, a higher case-fatality rate, and refractory disease (45). The high sequence diversity in cdtB (encoding the binding component of CDT), particularly between the LCT+ CDT+ and LCT− CDT+ lineages, may reflect differences in host cell binding in vivo for these different genotypes. Furthermore, DNA binding for the response regulator CdtR is predicted to occur within the C-terminal domain (45). It is possible that the truncated CdtR found exclusively in RTs 078 and 126 may be nonfunctional as a positive regulator of cdtA/B, conferring a fitness advantage in these more virulent/successful lineages. Similarly, one of the novel tcdB RBD variants identified in this study (allele 21) was unique to ST258. The RBD of TcdB is a critical region for interaction with host epithelial cell membranes, and variations within this region have been associated with enhanced virulence (1,21). The significant changes in amino acid biochemistry in this region could result in an alternate, potentially less virulent (less successful), disease phenotype compared to the more globally disseminated ST11. Finally, the S-layer plays a central role in adaptation to life in the gastrointestinal tract and evolves in response to host immunological selection (21). The four distinct S-layer cassettes identified were highly congruent with the RT and/or ST lineage, possibly reflecting evolution in different (original) host species. Moreover, S-layer cassette typing could be a useful additional discriminatory typing tool for the numerous RTs within ST11 and clade 5.
Comprising just 19.8% (2,058 genes) of its genetic repertoire, the core genome of C. difficile ST11/258 is remarkably small, a finding that supports earlier studies describing ultralow genome conservation in this species (16 to 40%) (1). These values are considerably lower than those for other pathogens known to have significant genomic variability: e.g., Helicobacter pylori (~59%), Campylobacter jejuni (~53%), Streptococcus pneumoniae (~47%), Escherichia coli (~40%), and Legionella pneumophila (~33%) (1). At almost 10,400 genes, the pan-genome is comparable with that of Salmonella enterica (10,000 genes), one of the most diverse species in the bacterial kingdom (46). Underlying the incredible diversity seen in the accessory genome is a substantial population of Siphoviridae and Myoviridae, including C2, CD38-2, CD27, MMP02, CDHM1, MMP03, CD506, and CDHM19. These temperate tailed prophages share a similar GC content to their host (28 to 30%) and have coevolved with C. difficile over very long periods (1). Studies have shown that in their lysogenic form, these phages are able to influence the expression of multiple genes associated with the fitness and virulence of the host bacterium during infection, including modulation of quorum-sensing, flagellar assembly, AMR transduction, and toxin production (1,47).
There are limitations to this work. While widely recognized as the current standard approach for studies of pathogen transmission (18,19,48,49), the molecular clock for any species is an approximation based on within-host variation and the assumption of a constant rate of evolution. Therefore, it does not account for the genetically quiescent nature of C. difficile spores and may underestimate the evolutionary distance between strains (19). Also, with comparable resolution to SNV analysis and the added bonus of standardized nomenclature, cgMLST could be used as a comparator typing tool in future studies (20). Last, we acknowledge that plasmids were not investigated. Differentiation of large plasmids and some prophage elements is difficult, but future studies that screen large cryptic plasmids such as pDLL3026 may provide further insights into the evolution of clade 5 strains and their prophages.
In summary, the One Health paradigm, connecting the health of humans to the health of animals and their shared environments, represents the optimal approach for understanding the epidemiology and evolution of C. difficile, as well as improving strategies to curtail the growing public health threat posed by CDI. Better communication and coordinated efforts between public health authorities, veterinary medicine, and agriculture will be key in developing interventions aimed at reducing the levels of C. difficile spores in the environment. These include curtailing the use of late-generation cephalosporins, immunization, environmental cleaning (e.g., sporicidal treatment of effluent) and discontinuing the practice of slaughtering neonatal calves (37,50).
Our study demonstrates the zoonotic potential of ST11 and its close relative ST258 and provides a framework for future epidemiological and experimental studies of other livestock- or agricultural-associated lineages of C. difficile. Moreover, our findings challenge the long-held misconception that CDI is primarily nosocomial in origin and clearly emphasize the need for continued genetic and phenotypic surveillance of C. difficile from different ecological niches. As we have shown here, WGS provides the ultrafine-scale resolution needed to decipher cryptic CDI transmission pathways and identify emerging clones, as well as changes in AMR and key virulence loci.
Whole-genome shotgun sequencing. Genomic DNA was extracted from a 48-h blood agar subculture of C. difficile using a QuickGene DNA tissue kit (Kurabo Industries, Osaka, Japan). A total of 185 strains were subjected to WGS using the MiSeq and HiSeq platforms and standard Nextera XT libraries (Illumina, San Diego, CA) (5). For comparative analysis, the genomes of 22 previously sequenced human clinical ST11 strains from European studies (4,36,41) were also included in this study. Accession numbers for WGS data are provided in Data Set S1 at figshare.
Microevolutionary analysis. Core genome single nucleotide variant (SNV) analysis followed the "gold standard" approach of Eyre et al. (18), as recently described (5). Briefly, trimmed reads were mapped to the finished chromosome of C. difficile strain M120 (ST11 [accession no. NC_017174]) using Smalt v0.7.6 (www.sanger.ac.uk/resources/software/smalt/). Candidate SNVs were filtered for quality and coverage and called across all mapped sites using SAMtools v0.1.12-10, with subsequent removal of indels and masking of repetitive regions, mobile genetic elements, and recombination regions (5). This approach resulted in a final set of 1,076 concatenated SNVs in "clonal frame," which was used (i) to calculate pairwise core genome SNV differences between isolates and (ii) to generate maximum likelihood phylogenies. Trees were produced using RAxML v8.1.23 (51) with a generalized time-reversible (GTR) model of evolution and CAT approximation of rate heterogeneity and curated using FigTree v1.4.2 (52) and iToL v4 (53).
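Step (i) above, calculating pairwise core genome SNV differences from the concatenated SNV alignment, amounts to counting mismatched positions between equal-length sequences. The following is a minimal sketch, not the study's implementation: the function names and toy sequences are hypothetical, and skipping positions where either strain has a non-ACGT character (e.g., N or a gap) is an assumption made here.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snv_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different bases.

    Positions where either sequence has a character outside A/C/G/T are
    ignored (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x != y and x in VALID_BASES and y in VALID_BASES
    )


def pairwise_snv_distances(alignment):
    """Return {(strain_a, strain_b): distance} for every pair of strains."""
    return {
        (a, b): snv_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    toy = {"s1": "ACGTAC", "s2": "ACGTAT", "s3": "ANGTTT"}  # invented data
    for pair, d in pairwise_snv_distances(toy).items():
        print(pair, d)
```

A dictionary of this shape can be passed directly to a threshold-based grouping step to look for strains within 2 SNVs of each other.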
Comparative genomic analysis. Sequence reads were interrogated for MLST and acquired AMR genes using the pubMLST and ARG-ANNOT databases, respectively, compiled within SRST2 v0.1.8 (54). Genome assembly and annotation and comparative analysis of transposons (Tns), prophages, and virulence loci were performed in silico as previously described (5). Annotated genomes were used as input for pan-genome analysis with Roary v3.6.0 and PanGP v1.0.1 as previously described (5). Definitions of the core and pan-genome and estimates of their respective size and trajectory were made using models and regression algorithms proposed by Tettelin and colleagues (55), as previously described (5).
Antimicrobial susceptibility testing. MICs for 13 antimicrobials were determined using the CLSI agar dilution methodology (56). Clinical breakpoints were applied as recommended by CLSI (for amoxicillin-clavulanate, ceftriaxone, clindamycin, erythromycin, meropenem, moxifloxacin, piperacillin-tazobactam, and tetracycline), EUCAST (for vancomycin and metronidazole), and the European Medicines Agency (for fidaxomicin) as previously described (5). A MIC of ≥32 mg/liter was used to define resistance to rifaximin (5), and there are no published breakpoints for trimethoprim.
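The MIC summary statistics reported for each lineage (MIC50, MIC90, and geometric mean) can be computed from a list of MICs as sketched below. This is an illustration under stated assumptions: MIC50 and MIC90 are taken here as the lowest tested concentration inhibiting at least 50% and 90% of isolates, the function names are hypothetical, and the MIC values are invented. Only the rifaximin cutoff of ≥32 mg/liter comes from the text.

```python
from statistics import geometric_mean


def mic_percentile(mics, percent):
    """MIC at which at least `percent` % of isolates are inhibited.

    `percent` is an integer (e.g., 50 or 90); integer arithmetic avoids
    floating-point rounding when computing the rank.
    """
    if not mics:
        raise ValueError("no MIC values supplied")
    ordered = sorted(mics)
    rank = (len(ordered) * percent + 99) // 100  # ceil(n * percent / 100)
    return ordered[max(rank, 1) - 1]


def mic_summary(mics):
    """Return range, MIC50, MIC90, and geometric mean for a set of MICs."""
    return {
        "range": (min(mics), max(mics)),
        "mic50": mic_percentile(mics, 50),
        "mic90": mic_percentile(mics, 90),
        "gm": geometric_mean(mics),
    }


def count_resistant(mics, breakpoint):
    """Number of isolates with MIC >= the resistance breakpoint."""
    return sum(1 for m in mics if m >= breakpoint)


if __name__ == "__main__":
    toy_mics = [0.5, 0.5, 1, 1, 2, 2, 4, 8, 16, 32]  # mg/liter, invented
    print(mic_summary(toy_mics))
    print(count_resistant(toy_mics, 32))  # e.g., the rifaximin cutoff
```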
Statistical analysis. Where appropriate, statistical significance was determined using a χ² test, t test, or Kruskal-Wallis H test, using a cutoff P value of ≤0.05.
Data availability. All supplemental data for this article (Data Set S1) and 207 annotated C. difficile genome assemblies are hosted at the online digital repository figshare, available at https://doi.org/10.6084/m9.figshare.4822255.