A Role for Tetracycline Selection in Recent Evolution of Agriculture-Associated Clostridium difficile PCR Ribotype 078.

The increasing clinical importance of human infections (frequently severe) caused by Clostridium difficile PCR ribotype 078 (RT078) was first reported in 2008. The severity of symptoms (mortality of ≤30%) and the higher proportion of infections among community and younger patients raised concerns. Farm animals, especially pigs, have been identified as RT078 reservoirs. We aimed to understand the recent changes in RT078 epidemiology by investigating a possible role for antimicrobial selection in its recent evolutionary history. Phylogenetic analysis of international RT078 genomes (isolates from 2006 to 2014, n = 400), using time-scaled, recombination-corrected, maximum likelihood phylogenies, revealed several recent clonal expansions. A common ancestor of each expansion had independently acquired a different allele of the tetracycline resistance gene tetM. Consequently, an unusually high proportion (76.5%) of RT078 genomes were tetM positive. Multiple additional tetracycline resistance determinants were also identified (including efflux pump tet40), frequently sharing a high level of nucleotide sequence identity (up to 100%) with sequences found in the pig pathogen Streptococcus suis and in other zoonotic pathogens such as Campylobacter jejuni and Campylobacter coli. Each RT078 tetM clonal expansion lacked geographic structure, indicating rapid, recent international spread. Resistance determinants for C. difficile infection-triggering antimicrobials, including fluoroquinolones and clindamycin, were comparatively rare in RT078. Tetracyclines are used intensively in agriculture; this selective pressure, plus rapid, international spread via the food chain, may explain the increased RT078 prevalence in humans.
Our work indicates that the use of antimicrobials outside the health care environment has selected for resistant organisms, and in the case of RT078, has contributed to the emergence of a human pathogen.

IMPORTANCE Clostridium difficile PCR ribotype 078 (RT078) has multiple reservoirs; many are agricultural. Since 2005, this genotype has been increasingly associated with human infections in both clinical settings and the community. Investigations of RT078 whole-genome sequences revealed that tetracycline resistance had been acquired on multiple independent occasions. Phylogenetic analysis revealed a rapid, recent increase in numbers of closely related tetracycline-resistant RT078 (clonal expansions), suggesting that tetracycline selection has strongly influenced its recent evolutionary history. We demonstrate recent international spread of emergent, tetracycline-resistant RT078. A similar tetracycline-positive clonal expansion was also identified in unrelated nontoxigenic C. difficile, suggesting that this process may be widespread and may be independent of disease-causing ability. Resistance to typical C. difficile infection-associated antimicrobials (e.g., fluoroquinolones, clindamycin) occurred only sporadically within RT078. Selective pressure from tetracycline appears to be a key factor in the emergence of this human pathogen and the rapid international dissemination that followed, plausibly via the food chain.

KEYWORDS Clostridium difficile, tetracycline resistance, whole-genome sequencing, phylogenetic analysis, emerging pathogen, PCR ribotype 078

Clostridium difficile infection (CDI) is a significant international challenge, affecting patients in community and health care environments worldwide (1-3). The severity of symptoms ranges from mild diarrhea to pseudomembranous colitis and toxic megacolon. Crude 30-day mortality in the United Kingdom is 16% (in a setting of endemicity) and can exceed 30% (4,5), while it has been estimated that almost half a million CDIs caused 29,000 deaths in a single year in the United States (2).
The molecular epidemiology of CDI varies both temporally and geographically, frequently in response to local antimicrobial prescribing (2, 6-8). Clinically important outbreak-associated genotypes can emerge when the inherent resistance of C. difficile to cephalosporins (9) is supplemented with acquired resistance to certain high-risk antimicrobials, including clindamycin (10) and, more recently, fluoroquinolones. The latter contributed to the emergence of multiple phylogenetically unrelated outbreak-associated genotypes, including "hypervirulent" PCR ribotype 027 (11-13). However, the reason(s) for the changing prevalence of other clinically important C. difficile genotypes is frequently unknown (14).
The increased importance of C. difficile RT078 as a human pathogen was first reported in The Netherlands, with CDI cases rising from 3% to 13% during 2005 to 2008 (15). Around the same time, a 10-fold increase was noted in North America (16). Similar increases and occasional outbreaks were subsequently recorded throughout Europe (17-19), and the incidence of C. difficile RT078 infections has recently increased to 4.4%, 9.7%, and 8.1% of total CDI cases in North America, England, and Scotland, respectively (2, 20-22). Three distinctive features of RT078-associated CDI raise specific concerns, namely, increased severity of disease with the highest genotype-specific mortality rate (15,23), a higher proportion of community-associated disease, and more infections in younger age groups than by other genotypes (2,12,15,24).
The agricultural association of C. difficile RT078 is reflected in its isolation from sick and healthy animals (frequently pigs), bird droppings, vermin, and the farm environment (25-28). However, in common with many other toxin-producing C. difficile genotypes, RT078 can be carried asymptomatically by human infants and adults (29,30). Isolates of RT078 recovered from humans and animals are genetically very similar and can be identical (30). This genotype has also been isolated from a variety of retail meat products, including pork, beef, and others (31,32). Therefore, the natural reservoirs of RT078 support the hypothesis that humans become colonized via the food chain and/or the environment (25).
Whole-genome sequence data have been used to study the emergence and transmission of many bacterial pathogens. The international dissemination of hypervirulent fluoroquinolone-resistant C. difficile 027 was revealed in this way (12), and its rapid localized nosocomial transmission was demonstrated, as with other fluoroquinolone-resistant genotypes (8). Here, we used whole-genome sequencing and phylogenetic approaches to study the recent evolutionary history of C. difficile RT078 and to investigate the hypothesis that the recent clinical prominence of this genotype has been due to antimicrobial selection.

[30] isolates) were examined for the presence of tetM (Fig. 1A). The non-RT078 genotypes were described using the notation

FIG 1 Prevalence of tetracycline resistance determinants in RT078 and other clinically relevant C. difficile genotypes. (A) Proportion (percentage) of each clinically important genotype that was positive for the ribosomal protection protein (RPP) gene tetM. Data are shown for genotypes having 10 genomes or more, from isolate collections representing Oxfordshire (EIA positives and negatives, infant and farm) and Leeds, North America, and Europe (Optimer clinical trial) (8,29,30,33,34). The total number of isolates of each genotype is shown above the bar. Clades are defined as described in reference 69. (B) Numbers of genomes in the collections described above which contained additional non-tetM tetracycline resistance determinants. For the ST11(078) genotype, the additional Scottish (n = 110) isolate collection was also included (indicated by "+110" above the bar at the left). Therefore, a total of 340 ST11(078) isolates were examined (the n = 230 described in the panel A legend above plus an additional n = 110 Scottish ST11s), the aim being to illustrate the overall prevalence of "non-tetM" tetracycline resistance determinants within this genotype.
Genotypes could be classified as (i) >60% tetM positive, (ii) >0% but <20% tetM positive (the majority being <5%), or (iii) tetM not detected (Fig. 1A). Non-RT078 genotypes were <20% tetM positive, with the notable exceptions of ST37(017) ( (Fig. 1A). Therefore, at over 75%, RT078 was the most highly tetM-positive clinically relevant genotype.
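The three-way grouping of genotypes by tetM prevalence can be illustrated with a short sketch. This is only an illustration of the thresholds stated in the text; the function name, the group labels, and the example counts are hypothetical and are not taken from the study data.

```python
def classify_tetm_prevalence(n_positive: int, n_total: int) -> str:
    """Assign a genotype to one of the tetM prevalence groups given in the text."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    pct = 100.0 * n_positive / n_total
    if pct == 0:
        return "not detected"
    if pct > 60:
        return "high (>60%)"
    if pct < 20:
        return "low (>0% but <20%)"
    # The text describes no genotype between 20% and 60%; flag it explicitly.
    return "unclassified"


# Hypothetical (positive, total) genome counts, for illustration only.
example_counts = {
    "genotype_A": (153, 200),
    "genotype_B": (3, 120),
    "genotype_C": (0, 45),
}
groups = {
    name: classify_tetm_prevalence(pos, tot)
    for name, (pos, tot) in example_counts.items()
}
```

Keeping an explicit "unclassified" branch makes it obvious if a genotype ever falls in the gap between the stated thresholds.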
United Kingdom-representative RT078 phylogeny. A United Kingdom-specific RT078 phylogeny was constructed using genomes from clinical infections in Oxfordshire (n = 78), the Leeds region (n = 104), and Scotland (n = 110) (Fig. 2A and B; see also Table S1). Annotation revealed minimal evidence of geographic structure, contrasting markedly with the highly structured distribution of tetM sequences (Fig. 2B and C) described in detail below (maximum likelihood [ML] phylogeny obtained before dating also supplied; see Fig. S2).
Prior to annotation, distinct tetM allele sequences were assigned a number (available at https://pubmlst.org/bigsdb?db=pubmlst_cdifficile_seqdef&page=downloadAlleles). Among the tetM-positive United Kingdom RT078 genomes, the following three tetM alleles predominated: tetM 10 (36/292, 12.3%), tetM 16 (101/292, 34.6%), and tetM 19 (78/292, 26.7%) (Fig. 2B). Colored bars (Fig. 2B) (or branches in Fig. S1) were used to identify distinct tetM alleles. Each of tetM alleles 10, 16, and 19 was carried by closely related, Tn916-like conjugative transposons (well-established Gram-positive tetM-carrying mobile elements) (37). Independent acquisition events, estimated from the phylogeny to have occurred between 1995 and 2006, were suggested by their unique chromosomal insertion sites (Fig. 2B, circular map) and by the level of nucleotide sequence identity across the Tn916-like elements on which they were carried (87% to 100%, depending on the region compared) (Fig. S3). Acquisition of tetM 16 or tetM 19 was associated with significantly shorter branch lengths (confirmed by median evolutionary distinctiveness [ED] scores of 3.78 and 3.58, respectively, versus 7.22 for branches representing genomes lacking a ribosomal protection protein gene; P < 0.001). This observation is consistent with the presence of clonal expansion in response to tetracycline-associated selection pressure (Fig. 2B and C); significantly lower ED scores indicate unexpectedly short branches (39). It is possible that, for a given branch, there could be some genetic change other than tetM that was the cause of the clonal expansion, but since the same pattern was observed on several independent branches where tetM was acquired, each time within a different Tn916 variant (Fig. S3), it seems very likely that this underlies the clonal expansion.
The acquisition of efflux pump tet40 on its own was not associated with clonal expansion, with only a slightly higher median evolutionary distinctiveness score than was calculated for tet40 absence (Fig. 2C).
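To make the link between clonal expansion and low evolutionary distinctiveness (ED) concrete, the following is a minimal sketch of one common ED definition, the "fair proportion" measure, in which each branch length is shared equally among the tips descending from it. This section does not state which ED formulation was used (see reference 39), so the measure, the nested-dictionary tree format, and the toy branch lengths below are all assumptions made for illustration.

```python
from statistics import median


def count_tips(node):
    """Number of tips (leaves) descending from a node."""
    children = node.get("children", [])
    if not children:
        return 1
    return sum(count_tips(child) for child in children)


def fair_proportion_ed(node, inherited=0.0, scores=None):
    """ED per tip: sum over root-to-tip branches of length / descendant tips."""
    if scores is None:
        scores = {}
    share = inherited + node.get("length", 0.0) / count_tips(node)
    children = node.get("children", [])
    if not children:
        scores[node["name"]] = share
    for child in children:
        fair_proportion_ed(child, share, scores)
    return scores


# Hypothetical toy tree: four closely related tips (a "clonal expansion")
# plus two isolated long branches. Lengths are arbitrary units.
toy_tree = {
    "length": 0.0,
    "children": [
        {"length": 6.0, "children": [
            {"name": "e1", "length": 0.5},
            {"name": "e2", "length": 0.5},
            {"name": "e3", "length": 0.5},
            {"name": "e4", "length": 0.5},
        ]},
        {"name": "d1", "length": 8.0},
        {"name": "d2", "length": 7.0},
    ],
}

ed_scores = fair_proportion_ed(toy_tree)
median_expansion = median(ed_scores[t] for t in ("e1", "e2", "e3", "e4"))
median_other = median(ed_scores[t] for t in ("d1", "d2"))
```

In this toy tree the four clustered tips share their common branch and so receive lower ED scores than the two isolated tips, which mirrors the interpretation in the text that a lower ED value indicates a larger proportion of close relatives.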
The same phylogeny was annotated for the presence of additional resistance determinants (conferring aminoglycoside, fluoroquinolone, or clindamycin resistance; Fig. S4A to C), but no evidence of associated clonal expansions occurring independently of the tetM-associated expansions was found (Fig. S4D to F).

isolates. Branch colors, as defined for panel A, denote the location of each genome. Colored bars to the right of the phylogeny indicate the presence of tetracycline resistance determinants; ribosomal protection protein (RPP) allele sequences detected within each genome were assigned numbers to identify distinct nucleotide sequences of tetM, tetO/32/O, tetO, or tetW. To the right of the phylogeny, the chromosomal locations of the three most prevalent tetM alleles (designated tetM 10, 16, and 19) relative to the RT078 M120 genome (NCBI reference sequence NC_017174.1) are shown. All phylogenies included in this study are directly comparable post-1990, i.e., in the time frame of RT078 emergence; the gray shaded block over the region corresponding to the time period prior to that date indicates that region is not scaled identically and should not be used for comparisons. (C) The extent to which RT078 clonal expansions are associated with geographic structure and tetracycline resistance (ribosomal protection proteins and efflux pumps) was determined using two-sided quantile regression. (Left) Differences in median evolutionary distinctiveness scores compared to Oxfordshire samples. A lower evolutionary distinctiveness value indicates a larger proportion of close relatives in the tree. The P values indicate the overall significance of geographic location in the evolutionary distinctiveness score. (Center) Differences in median evolutionary distinctiveness scores for samples with ribosomal protection proteins detected compared to ribosomal protection protein-negative samples, overall and for each of the three putative tetM-associated clonal expansions. A lower evolutionary distinctiveness value indicates a larger proportion of close relatives in the tree. The P values indicate the significance of gene presence in the evolutionary distinctiveness score. (Right) Differences in median evolutionary distinctiveness scores for samples with tetracycline efflux pumps [tet40 and tetA(P)] detected compared to efflux pump-negative samples. A lower evolutionary distinctiveness value indicates a larger proportion of close relatives in the tree. The P value indicates the significance of gene presence in the evolutionary distinctiveness score.
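The comparison of median evolutionary distinctiveness scores between gene-positive and gene-negative genomes can be sketched as follows. Note the substitution: the study used two-sided quantile regression, whereas this sketch uses a simple two-sided permutation test on the difference in medians, chosen only because it needs nothing beyond the Python standard library. The function names and the score values are hypothetical and are not the study data.

```python
import random
from statistics import median


def median_difference(scores_with_gene, scores_without_gene):
    """Difference in median ED score (gene-positive minus gene-negative)."""
    return median(scores_with_gene) - median(scores_without_gene)


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation P value for the difference in medians."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(median_difference(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(median_difference(perm_a, perm_b)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical ED scores, for illustration only.
ed_gene_positive = [3.1, 3.4, 3.6, 3.8, 4.0, 4.2]
ed_gene_negative = [6.5, 6.9, 7.2, 7.4, 7.9, 8.3]

diff = median_difference(ed_gene_positive, ed_gene_negative)
p_value = permutation_p_value(ed_gene_positive, ed_gene_negative)
```

A negative difference here corresponds to gene-positive genomes having lower median ED, that is, more close relatives in the tree.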
RT078 phylogenies representing United Kingdom regions. Separate phylogenies were constructed to examine the detailed evolutionary history of RT078 within two of the geographic regions represented in the United Kingdom phylogeny. The two regions were Scotland (population, 5.295 million; area, 30,918 square miles) (Fig. 3A and B) and Oxfordshire (population, 655,000; area, 1,006 square miles) (Fig. 3A and C) (ML phylogeny obtained before dating also shown; Fig. S2).
The branches of the Scottish phylogeny (n = 110 genomes; Fig. 3B) were colored to represent geographic regions (administrative areas, or "health boards"; Fig. 3A), thus increasing the level of geographic discrimination. As described above, geographic structure was absent, with health care-associated and community isolates intermingling (Fig. 3B, dots), but the distribution of the tetM alleles 10, 16, and 19 within the phylogeny was highly structured. The Oxfordshire regional phylogeny (n = 94 genomes, Fig. 3C) represented a more densely sampled, smaller geographic area (Fig. 3A). Here, the enzyme immunoassay (EIA)-positive C. difficile clinical isolate genomes (n = 78; Fig. 2B) were supplemented with EIA-negative clinical isolates (n = 9) (i.e., isolates from patients with diarrhea but without evidence of toxin production, suggesting that C. difficile was colonizing the patient rather than causing disease) and nonclinical isolates from healthy infants (n = 6) (29) and a lamb (n = 1) (Table S1). All genomes were pathogenicity locus (PaLoc) (i.e., toxin A and B encoding sequence) positive (40). This regional phylogeny also lacked structure according to location or isolation source, but it was again structured according to tetM allele (Fig. 3C).
International phylogenies confirmed that three tetM-positive RT078 clades are present across continents. Two international RT078 phylogenies were constructed using genomes from clinical infections in England (Oxfordshire [n = 78] and Leeds [n = 104]) supplemented first with clinical and nonclinical isolates from The Netherlands (30) (Table S1;

Tetracycline selection in other C. difficile genotypes. Over 60% of genomes belonging to each of five non-RT078 genotypes were tetM positive (Fig. 1A). Four of these were investigated phylogenetically, namely, ST37(017), ST54(012), ST35(046), and nontoxigenic ST26(140) (8,29) (Table S2), but not ST48(038/104), as only 12 genomes were available. Branches were colored according to isolation source and geography as described above, and tetM alleles are indicated by colored bars (Fig. 5). As described above for RT078, additional resistance determinants (Tables 1 and 2) were also highlighted (when present in five genomes or more) to reveal the possible impact of selection by other antimicrobials (Fig. 5, colored dots) (ML phylogeny obtained before dating also shown; Fig. S2).
A number of recent tetM acquisition events were obvious (Fig. 5). These were followed by possible clonal expansions, most notably within genotypes ST35(046) and (nontoxigenic) ST26(140) (Fig. 5A and B). Clonal expansion was particularly marked in ST26(140), where all genomes were tetM positive, and clonal expansion occurred in the absence of disease-causing ability, this genotype being nontoxigenic (lacking the pathogenicity locus [PaLoc] in all genomes [40]). With the exception of ST35(046), where aminoglycoside and clindamycin resistance determinants colocalized with tetM (Fig. 5A), and the fluoroquinolone-resistant region of the ST37(017) phylogeny (Fig. 5D) (8), there was no clear evidence of clonal expansions following the acquisition of non-tetM antimicrobial resistance determinants. In common with RT078, all four phylogenies (Fig. 5) lacked geographic structure, with the exception of the fluoroquinolone resistance region of the ST37(017) phylogeny (Fig. 5D) (8).
Sequences of RT078 tetracycline resistance determinants support the hypothesis of its zoonotic origin. The tetM sequences in C. difficile described here are typical of many Gram-positive species, including established zoonotic species. For example, the RT078 tetM 10 allele shared 100% nucleotide sequence identity with tetM genes of Streptococcus agalactiae, Enterococcus faecalis, Escherichia coli, and Streptococcus pneumoniae and 99% nucleotide sequence identity with Streptococcus suis (a pathogen of pigs transmitted zoonotically to humans [41,42]). Identical tetM 10 sequences have also been found in Gram-negative bacterial species, including Escherichia coli. The RT078

Other RT078 tetracycline resistance determinants were also identical or were very closely related to those found in bacteria with an agricultural association which may be zoonotic. For example, RT078 tet40 sequences shared 99% to 100% identity with Streptococcus suis tet40 (GenBank accession no. KC790465.1) and the RT078 tetO sequences shared over 99% nucleotide sequence identity with Campylobacter jejuni, Campylobacter coli, and S. suis tetO sequences. In addition, the RT078 tetO/32/O mosaic sequence shared 99% identity with the sequence found in the S. suis genome.

Branch colors are as described in the panel A legend. Colored bars to the right of the phylogeny denote the ribosomal protection protein (RPP) allele sequences detected within each genome (as described in the Fig. 2 legend), numbers being assigned to identify distinct nucleotide sequences of tetM or tetO/32/O. Isolates were cultured from human clinical samples received from both hospital and community patients, the latter being indicated by a black dot. The gray shaded block over the region corresponding to the period prior to 1990 indicates that the region is not scaled identically for different phylogenies and should not be used for comparisons. (C) Time-scaled RT078 phylogeny for Oxfordshire clinical and nonclinical isolates. Branch colors are as described in the panel A legend. Colored bars indicate ribosomal protection protein alleles as described above.

(Table S1) (33,34). In the phylogenies shown in panels B and D, a single closely related genome of a distinct genotype (ST12 and ST109, respectively) was included to ensure that the tree was rooted pre-1990 and that the four phylogenies could therefore be compared post-1990. The gray shaded block over the region corresponding to the period prior to 1990 indicates that the region is not scaled identically for different phylogenies and should not be used for comparisons. Genomes were from Oxfordshire (clinical EIA positives and negatives plus nonclinical, healthy infants) and Leeds (clinical isolates); branch colors indicate location/isolation source as described above. Colored bars to the right of each phylogeny indicate the presence of tetracycline resistance determinants. Colored dots represent additional genetic determinants identified as conferring resistance to fluoroquinolones, rifampin, clindamycin, and aminoglycosides (

DISCUSSION
Our time-scaled phylogenies revealed geographically unstructured, parallel tetM-associated RT078 clonal expansions, dating from around the year 2000 (Fig. 2B, 3B and C, and 4B and D). These findings are consistent with an evolutionary response to tetracycline selective pressure, within the milieu of tetracycline-resistance determinants, during the time frame of increasing numbers of RT078-associated clinical cases (15, 18-20, 22). The results from our use of whole-genome sequence-based phylogenies explained the prior observation (using multilocus variable-number tandem-repeat analysis [MLVA] [38]) that the majority (85%) of human and porcine RT078 genomes are genetically related, irrespective of the European country of origin, since we showed that most RT078 genomes are recent descendants from one of three distinct (but closely related and now internationally disseminated) tetM-positive ancestral RT078 genomes (Fig. 2 to 4).
Tetracyclines were initially introduced around 60 years ago in both clinical and veterinary settings. However, following the emergence of resistance, they were largely replaced in human medicine by fluoroquinolones (43). Consequently, by 2010 to 2013, tetracyclines represented a total of <18% of the antibiotics consumed by patients in England, and most (92%) were prescribed in the community by general practitioners (44). Over the time period relevant to this study, tetracyclines were most commonly used for the treatment of acne and chlamydial sexually transmitted diseases. It is implausible that such prescribing in teenagers and young adults provided extensive selection pressure for C. difficile, given that healthy individuals living in the community have very low rates of colonization by these bacteria (45).
In contrast to their use in human medicine, tetracyclines remain the most widely used antimicrobial for the treatment, control, and prevention of infections in animals (46). In addition, their use for growth promotion (in subtherapeutic doses) continues around the world, if not overtly, under the guise of disease prevention. This is the case despite the fact that growth promotion is banned in Europe (47) and that claims of growth promotion have been voluntarily removed by drug companies in the United States at the request of the FDA (January 2018). During 2015, 6,880 metric tons of tetracyclines were sold in the United States (48) (representing a 31% increase from 2009), compared to 166 tons in the United Kingdom (49). The extent of agricultural tetracycline use, the prevalence of RT078 in animals used for food (25,26,31,32,38), and the time frame of RT078 emergence all implicate tetracycline use in agriculture as a plausible source of selective pressure. The global food chain, including, for example, regionally concentrated livestock production followed by widespread distribution of meat products, represents an obvious route for rapid RT078 dissemination (food and livestock RT078 genomes/isolates are listed in references 25 and 50), which would be consistent with our phylogenies (Fig. 2 and 4). However, indirect transmission from the agricultural environment to humans via contaminated water or vegetables (51) is also a possibility. Our report provides evidence of a plausible agricultural link underlying the emergence of RT078 by presenting its recent evolutionary history with respect to the acquisition of antimicrobial resistance.
The absence of geographic structure within our RT078 phylogenies is consistent with its rapid international spread (Fig. 4) as described previously (38,50) and with the lack of large-scale, localized nosocomial RT078 outbreaks (Fig. 2B, 3B and C, and 4B and D). Among other genotypes, such outbreaks have been associated with extensive prescribing of, and resistance to, high-risk antimicrobials such as clindamycin, cephalosporins, and fluoroquinolones. In our study, large-scale clonal expansions were not associated with fluoroquinolone or clindamycin resistance in RT078 genomes (see Fig. S4 in the supplemental material). Equivalent analyses for cephalosporins cannot be performed because the genetic mechanism(s) of cephalosporin resistance in C. difficile has yet to be defined, and, although MICs can vary, C. difficile has typically been considered inherently cephalosporin resistant, irrespective of genotype (9). The international spread of RT078 indicates that changes in antimicrobial resistance phenotypes could potentially have an impact at any location, depending on local prescribing practices. Consequently, RT126, a frequently isolated fluoroquinolone-resistant descendant of RT078 (Fig. S1, phylogenetic context), which is prevalent in Italian clinical settings (52,53), is of particular concern, as is the epidemic multidrug-resistant RT078 observed in Spanish swine (54).
Tetracyclines (such as doxycycline) are associated with a lower risk of CDI in humans (55) than has been established for many antimicrobials. It has been proposed (55) that the use of tetracyclines as an alternative to riskier antimicrobials such as fluoroquinolones and clindamycin, whenever appropriate, may decrease CDI associated with antibiotic use. However, the emergence of tetracycline-resistant C. difficile genotypes such as RT078 and others (Fig. 5) may require this approach to be informed by resistance data for the infecting strain, to avoid triggering tetracycline-resistant CDI. Tetracyclines may be unrecognized as a potential CDI risk factor, since resistance emerged relatively recently (with respect to RT078) and is less common at the population level (Fig. 1A); clindamycin-resistant and fluoroquinolone-resistant strains in particular predominate under outbreak conditions (10-12).
The identification of widespread tetracycline resistance (>60% tetM positive) in only five C. difficile genotypes in addition to 078 (Fig. 1A) is consistent with previous reports (17,56,57). Phylogenetic analysis showed that ST35(046) contained two plausible tetM-associated clonal expansions (Fig. 5A), but the relatively small numbers precluded quantitative evolutionary distinctiveness analysis. Like RT078, ST35(046) has been found in pigs and has caused human outbreaks of CDI (58). This genotype also illustrates the possibility that selection by one antimicrobial can drive the acquisition of further, linked resistance genes, as almost every tetM-positive ST35(046) genome was also positive for clindamycin and aminoglycoside resistance determinants (Fig. 5A). Nontoxigenic ST26(140) contained tetM in every genome examined, suggesting stable integration predating a recent clonal expansion (Fig. 5B) concurrent with that of RT078. ST26(140) therefore illustrates the possible consequences of tetracycline selection in a harmless commensal organism, confirming that tetracycline selection alone may have been sufficient to drive the emergence of RT078.
The hypothesis that RT078 has an agricultural origin is further supported by the observation that RT078 shares many resistance determinants with zoonotic pathogens such as Streptococcus suis, Campylobacter jejuni, and C. coli, suggesting a common reservoir (see Results). Quantitative analysis (Fig. 2C; see also Fig. S4E and F) confirmed that tetM was associated with RT078 clonal expansions. Although widespread in RT078, the tetracycline efflux pump tet40 did not show such an association on its own (Fig. 2B and C). Efflux pumps often confer a low-level-resistance phenotype, assisting bacterial survival at sublethal concentrations of antimicrobials (for example, tetK in livestock-associated methicillin-resistant Staphylococcus aureus [LA-MRSA] CC398 [59]). They thereby function in promoting the acquisition of further high-level-resistance determinants, such as tetM. The parallels between RT078 and zoonotic Streptococcus suis also extend to their epidemiology. S. suis is a globally distributed emergent pathogen of humans (42), commonly isolated from pigs. Geographic clustering of subpopulations is absent (41), and S. suis has exhibited rapid, recent increases in tetracycline resistance (59). The emergence of human pathogens, coincident with tetracycline resistance acquisition, has also been noted among other bacterial species. Tetracycline resistance in group B streptococci may have contributed to its emergence as a leading cause of human neonatal infections (60). In a study examining LA-MRSA CC398 isolates, almost all were found to be tetM positive, and many were also found to carry the tetK (efflux pump) gene (61).
Although multiple lines of evidence indicate a role for tetracycline selection in the recent evolutionary history of RT078, the possibility exists that further lineage-specific genetic changes (unrelated to tetracycline resistance) contributed to its tetM-associated clonal expansions (Fig. 2 and 4; see also Fig. S2). A recently proposed hypothesis is that an enhanced ability to metabolize the disaccharide trehalose (conferred by a specific four-gene chromosomal insertion) helped to drive the emergence of C. difficile RT078 in humans (62) due to the introduction of trehalose as a food additive. The authors of that study identified the same gene cluster in closely related non-078 clade 5 (clades defined as described in reference 35) PCR ribotypes (033, 045, 066, and 126). Therefore, the possibility that the insertion occurs throughout clade 5 or, indeed, among the other four C. difficile clades was not excluded. The trehalose and tetracycline hypotheses are not mutually exclusive. However, the available evidence suggests that tetracycline resistance driven by tetM acquisition remains the most plausible available explanation for the recent clonal expansions observed in RT078.
The role of selection by antimicrobials other than tetracycline (fluoroquinolones, clindamycin, aminoglycosides) was investigated (Fig. S4), and a small potential contribution by aminoglycosides (also used in animal production) was indicated by the presence of the aphA1 resistance gene in a minority of RT078 genomes (Fig. S4A). However, aphA1 could not be assessed independently of tetM (Fig. S1) because the two genes colocalized. Further work would be required to compare the total gene content of tetM-positive RT078 isolates with that of older tetM-negative isolates to identify further potentially relevant genetic differences that could explain the clonal expansions. The identification of tetM-associated clonal expansions in genetically divergent C. difficile genotypes, ST35(046) (together with clindamycin and aminoglycoside resistance determinants; Fig. 5A) and nontoxigenic ST26(140) (Fig. 5B), serves to further highlight tetM as a factor common to recent clonal expansions within distinct C. difficile genetic backgrounds. To further confirm the zoonotic origin of RT078 and the link to agricultural tetracycline use, large-scale, parallel data showing changing tetracycline use over time and concurrent RT078 isolates from clinical cases and farm animals would be required. Although it would be challenging to source both usage data and corresponding isolate collections retrospectively, evidence of phylogenetic coclustering of human and animal RT078 genomes has been provided both nationally and internationally (30,50), using collections assembled as available from other studies and reference laboratories.
In summary, numerous lines of evidence described in this and prior work (25,30,50) support the hypothesis that tetracycline use in agriculture has provided recent selection pressure which has impacted on the evolution of tetracycline-resistant RT078. This in turn supports the hypothesis (first proposed in 2012 [25]) that humans become colonized by RT078 via the food chain and/or the environment. Recent studies using whole-genome sequencing of 65 Dutch RT078 isolates and 248 international RT078 isolates (30,50) provided data consistent with the rapid spread of RT078 both internationally and between animals and humans. Furthermore, a range of tetracycline resistance determinants were described in both human and animal RT078 populations, including tetM, tet40, tet32, tet44, and tetO (30,50). Our findings independently confirm and extend this work, since we demonstrate at least three independent clonal expansions of RT078, with rapid international spread, following unconnected tetM acquisition events (tetM carried on distinct Tn916 variants inserted into distinct chromosomal locations; Fig. S3) which occurred in well-separated regions of the RT078 phylogeny (Fig. 2 to 4). We also show that the RT078 genome is almost unique within the C. difficile population as a whole, in terms of the diversity of its tetracycline resistance determinants (Fig. 1). The major C. difficile RT078 transmission routes to humans are consequently more likely to be related to agriculture and international food chains than to nosocomial transmission. Our findings add to the body of evidence (50) supporting initiatives such as "One Health" (63). Our findings strongly suggest that the use of tetracycline outside the health care environment has impacted several C. difficile genotypes, most strikingly RT078, and therefore has not only selected for resistant organisms but also contributed to the emergence of this species as a human pathogen.

MATERIALS AND METHODS
C. difficile whole-genome sequences. C. difficile genomes derived from isolates of either RT078 or ST11 (n = 400) were sourced from several published collections (8,29,30,33,34), as well as from an unpublished Scottish collection and the Oxford University farm, Wytham, United Kingdom (see items i to v below and Table S1 in the supplemental material). Each isolate was obtained from a distinct sample. EIA-negative isolates were inferred to be toxigenic or nontoxigenic, depending on the presence/absence of the toxin-encoding pathogenicity locus (PaLoc) (40) (Tables S1 and S2). The complete collections described in sections i, iii, and iv below have been published previously (8,29,30).
(ii) Clinical C. difficile: Scotland, United Kingdom. The isolates from Scotland, United Kingdom, included 109 isolates of RT078 and 1 closely related RT066 isolate (Table S1). These isolates form part of a collection stored at the Scottish Microbiology Reference Laboratory (Glasgow). Cultures are provided by Scottish regional NHS Healthcare Boards (located per the map shown in Fig. 3A) in the event of a severe/fatal case, a suspected outbreak, or a suspected ribotype 027 infection. In addition, each Health Board provides a fixed number of samples based on the rates of infection/population. This allows surveillance of prevalent circulating strains. RT078 isolates for this study were selected based on the ribotypes from samples referred to the Reference Laboratory between November 2007 and October 2014 and with the aim of providing the widest temporal and geographical representation. Locally, positive fecal samples were identified prior to 2009 using a toxin-specific EIA (or cell cytotoxicity) and post-2009 by the use of a two-step algorithm requiring GDH detection, followed by toxin assessment. These samples were from patients located in health care (n = 99) and community (n = 10) settings (unassigned n = 1) in 12 of 14 Health Boards (there were no relevant samples from 2 small-island Health Boards [not shown in the map in Fig. 3A]).
(iii) Clinical C. difficile: North America and Europe. Thirty-two ST11 genomes from North American (Canada, n = 4; United States, n = 15) and European (n = 13) C. difficile isolates cultured from clinical infections between November 2006 and June 2009 were available from a variety of locations from two clinical trials of fidaxomicin (Table S1, showing city and country [33,34]). Previously published RT078 genomes from human clinical cases (n = 25; 2002 to 2011) in The Netherlands were also included (30) (Table S1).
(v) PCR ribotype reference isolates. For additional context, five PCR ribotype reference C. difficile genomes representing RT078, RT126, RT033, RT045, and RT066 were included, all of which are genetically very closely related, sharing the same multilocus sequence type, ST11 (see Table S1 and Fig. S1 in the supplemental material).
Genome assemblies. C. difficile genomes were assembled from short reads generated using Illumina technology (64). Reference-based assemblies were made for genomes belonging to C. difficile clade 5 (i.e., RT078 and close relatives) as described previously (65) by mapping reads to the C. difficile M120 reference genome (66) and for non-clade 5 genomes by mapping reads to the CD630 reference genome (GenBank accession no. AM180355.1) (66) (clades defined as described in reference 35). De novo assembly was performed using Velvet (version 1.0.7-1.0.18) (85) and VelvetOptimiser 2.1.7 (67), optimizing kmer size (k), expected coverage (average kmer coverage of contigs), and coverage cutoff (kmer coverage threshold) to achieve the highest assembly N50 value (length of the shortest contig such that contigs of that length or longer together formed at least half of the final assembly).
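The N50 statistic used as the optimization target can be illustrated with a short calculation. The following is a minimal sketch in Python using the conventional definition (contigs are accumulated from longest to shortest until at least half of the total assembly length is covered); the function name and the example contig lengths are hypothetical and are not taken from the assemblies in this study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given a list of contig lengths (bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Accumulate contigs from longest to shortest until the running sum
    # reaches at least half of the total assembly length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 80, 60, 40, 20]
example_n50 = n50(example_lengths)
```

For these illustrative lengths the total is 300, and the two longest contigs (100 + 80) are the first to reach half of it, so the N50 is the length of the second contig.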
Identification of antimicrobial resistance determinants. The de novo assemblies were queried using the BLAST function of BIGSdb (68) to determine whether genes or nonsynonymous point mutations known to confer resistance to antimicrobials (including fluoroquinolones, tetracyclines, clindamycin, and aminoglycosides) were present and to extract the sequences of interest for further analysis. A list of the resistance gene sequences used to perform the BLAST search (and of their GenBank accession numbers) is provided (Tables 1 and 2). For acquired resistance genes, a minimum level of 90% nucleotide sequence identity and gene coverage was required. Each unique tetM allele was assigned a number (allele nucleotide sequences are available at https://pubmlst.org/bigsdb?db=pubmlst_cdifficile_seqdef&page=downloadAlleles).
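The identity and coverage criterion for acquired resistance genes can be expressed as a simple filter over BLAST-style hits. The sketch below is illustrative only: it assumes that the 90% minimum applies to both nucleotide identity and gene coverage, that coverage is the aligned length as a percentage of the reference gene length, and that hits are available as simple records. The record fields, function name, and example values are hypothetical and do not reflect the BIGSdb output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothetical, simplified BLAST hit against a resistance gene."""
    gene: str
    identity_pct: float    # percent nucleotide identity over the alignment
    aligned_length: int    # aligned bases of the reference gene
    reference_length: int  # full length of the reference gene


def passes_threshold(hit, min_identity=90.0, min_coverage=90.0):
    """Return True if the hit meets both identity and coverage minimums."""
    coverage_pct = 100.0 * hit.aligned_length / hit.reference_length
    return hit.identity_pct >= min_identity and coverage_pct >= min_coverage


# Illustrative values only; these are not results from the study.
hits = [
    Hit("tetM", 99.0, 1000, 1000),   # passes both criteria
    Hit("tet40", 95.0, 600, 1000),   # fails on coverage (60%)
    Hit("aphA1", 85.0, 1000, 1000),  # fails on identity
]
accepted = [h.gene for h in hits if passes_threshold(h)]
```

Keeping the two thresholds as separate parameters makes it straightforward to test how sensitive gene calls are to either cutoff.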
Definition of multilocus sequence types (MLST). Allele sequences used in the C. difficile MLST scheme (69) were extracted from the de novo assemblies using the BLAST function of BIGSdb (68). Sequence types (STs) were assigned by querying the MLST database (https://pubmlst.org/cdifficile/).
Phylogenetic analyses. Phylogenetic trees were built on the basis of the assemblies mapped to C. difficile ST11 reference M120 using the maximum likelihood approach implemented in PhyML version 3.1.17 (with a generalized time-reversible substitution model and the "BEST" tree topology search algorithm) (70). The trees were then corrected to account for recombination events using ClonalFrameML (71) version 1.11 (with default settings). The nodes of the trees were dated using the previously estimated C. difficile evolutionary rates of 1.1 mutations per year (30) for clade 5 STs (including RT078) and of 1.4 mutations per year for all other genotypes (72). The main period of particular interest (from 1990 to 2015) was allocated the greatest amount of horizontal space in graphical tree representations by compressing the pre-1990 period, making the trees directly comparable with respect to the post-1990 period. Events before 1990 are not shown since dating older nodes using a short-term evolutionary rate is problematic due to the time dependency of evolutionary rates (73). Graphical representations of trees were made using FigTree version 1.4.2 (74).
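The arithmetic behind dating a node with a fixed evolutionary rate can be sketched as follows. This is a simplified illustration, not the dating procedure applied to the trees: it assumes that the distance between a node and a descendant tip is expressed as a number of mutations per genome and that the tip has a known sampling date in decimal years. The function name and example values are hypothetical; only the two rates come from the text.

```python
CLADE5_RATE = 1.1  # mutations per genome per year, clade 5 STs (from the text)
OTHER_RATE = 1.4   # mutations per genome per year, other genotypes (from the text)


def node_date(tip_date, mutations_to_tip, rate):
    """Estimate the date of an ancestral node from a dated descendant tip.

    tip_date: sampling date of the tip in decimal years.
    mutations_to_tip: mutations separating the node from that tip.
    rate: evolutionary rate in mutations per genome per year.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Elapsed time is the number of mutations divided by the rate.
    return tip_date - mutations_to_tip / rate


# Hypothetical example: a clade 5 tip sampled in 2012 that lies
# 11 mutations from its ancestor places that ancestor about 10 years earlier.
estimated = node_date(2012.0, 11, CLADE5_RATE)
```

Because the rate is a short-term estimate, the same calculation becomes unreliable for deep nodes, which is consistent with the decision not to show events before 1990.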
A quantitative assessment of clonal expansion(s) within a given phylogeny was performed as described previously (39) and as implemented previously (8). The evolutionary distinctiveness (ED) score of each isolate was calculated; the ED score was defined as being equal to the sum, for all branches on the path from the root to the leaf (isolate), of the lengths of the branches divided by the number of leaves that they support (39). For a given isolate, a low ED score indicated the presence of close relatives in the tree, whereas a high ED score indicated their relative absence. ED scores were compared across various factors using quantile regression analyses, performed using Stata version 14.1 (College Station, TX, USA).
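The ED definition given above can be made concrete with a small worked example. The following Python sketch is an illustration under stated assumptions rather than the implementation used in the study: the tree is represented by a hypothetical minimal `Node` class, the root carries a branch length of zero, and the example tree and its branch lengths are invented.

```python
class Node:
    """Minimal tree node; `length` is the branch leading to this node."""

    def __init__(self, name=None, length=0.0, children=None):
        self.name = name
        self.length = length
        self.children = children or []


def count_leaves(node):
    """Number of leaves supported by the branch leading to `node`."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)


def ed_scores(root):
    """Return {leaf name: ED score} for every leaf below `root`.

    Each branch on the root-to-leaf path contributes its length divided
    by the number of leaves it supports.
    """
    scores = {}

    def walk(node, accumulated):
        accumulated += node.length / count_leaves(node)
        if not node.children:
            scores[node.name] = accumulated
        for child in node.children:
            walk(child, accumulated)

    walk(root, 0.0)
    return scores


# Hypothetical tree: leaf A sits on a long branch on its own,
# while B and C are close relatives sharing an internal branch.
tree = Node(children=[
    Node("A", 4.0),
    Node(length=2.0, children=[Node("B", 1.0), Node("C", 1.0)]),
])
scores = ed_scores(tree)
```

In this toy tree the isolated leaf A scores 4.0, whereas B and C each score 2.0 because they share their internal branch, matching the interpretation that low ED scores indicate close relatives. The scores sum to the total branch length of the tree (8.0), since each branch is divided evenly among the leaves it supports.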
Data availability. Reads for unassembled genomes have been submitted to NCBI under BioProject ID numbers PRJNA304087 (8) and PRJNA381384 for the Scottish isolates (accession numbers are provided in Tables S1 and S2).