Development of Candida auris microsatellite typing and its application on a global collection of isolates

Candida auris is a pathogenic yeast that causes invasive infections with high mortality. Infections most often occur in intensive care units of healthcare facilities. During an outbreak it is crucial to trace the source and prevent further spread of C. auris; therefore, genotyping of C. auris is required. To enable fast and cost-effective genotyping, we developed a microsatellite typing assay for C. auris. Short tandem repeats (STRs) in C. auris were identified, and a novel STR typing assay was developed using four multiplex PCR panels of three markers each. Having shown that the microsatellite typing assay was highly reproducible and specific, we investigated a robust set of 444 C. auris isolates to identify genotypic diversity. In concordance with whole-genome sequencing (WGS) analysis, we identified five major C. auris clusters, namely South America, South Asia, Africa, East Asia and Iran. Overall, a total of 40 distinct genotypes were identified, with the largest variety in the East Asian clade. Comparison with WGS demonstrated that isolates with <20 SNPs are mostly not differentiated by STR analysis, while isolates with 30 or more SNPs usually have differences in one or more STR markers. Altogether, a highly reproducible and specific microsatellite typing assay for C. auris was developed, which distinguishes the five C. auris clades in identical fashion to WGS, while most isolates differing by >20 SNPs, as determined via WGS, are also separated. This new C. auris-specific genotyping technique is a rapid, reliable and cost-effective alternative to WGS analysis for investigating outbreaks.

Importance
Candida auris is an emerging fungal pathogen now recognized as a threat to public health. The pathogen has spread worldwide and mainly causes hospital-associated outbreaks.
To track and trace outbreaks and to relate them to new introductions from elsewhere, whole genome sequencing and amplified fragment length polymorphism (AFLP) have been used for molecular typing. While the former is costly and available in only a few centers, AFLP is a complicated technique that cannot be standardized. We describe a novel, simple microsatellite genotyping technique based on short tandem repeats in the C. auris genome. Furthermore, we show that this microsatellite-based genotyping technique is comparable to WGS. Overall, this work provides a novel, rapid, reliable and cost-effective method for molecular outbreak investigations of C. auris.


Introduction
Candida auris is a pathogenic yeast which was first isolated in Korea in 1996 and first reported in 2009 in a Japanese patient who had a C. auris infection of the external ear canal.(1, 2) In the following decade this yeast was found in locations all over the world, including South Africa, India, Pakistan, Kuwait and Venezuela, suggesting its emergence might be a consequence of climate change.(9) Fungemia and wound infections are the most common clinical entities due to C. auris and can subsequently lead to infection of other organs such as the heart and brain.(10) The mortality rate in patients with C. auris infections, who are often already severely ill, reaches 30-60%.(10, 11) A serious complication in the treatment of patients is the resistance of C. auris to multiple antifungal agents, such as fluconazole and voriconazole.(12) Most C. auris isolates are sensitive to echinocandins, although around 5% of isolates are reported to be resistant to this class of antifungals.(12)
Besides its high pathogenicity and multiresistance, C. auris is also highly contagious. It colonizes the nose, axilla, groin and rectum, and is frequently found on inanimate surfaces and reusable equipment in healthcare facilities, which are potential sources of transmission among hospitalized patients.(13-15) It is challenging to clean contaminated surfaces and equipment, as C. auris can form biofilms that are relatively insensitive to hydrogen peroxide and chlorhexidine.(16) Due to its high degree of infectivity and relative insensitivity to standard cleaning protocols, C. auris has caused outbreaks in various healthcare institutions, especially in intensive care settings.
The identification of C. auris in a routine microbiology laboratory is difficult. As it is often misidentified as another Candida species, such as C. haemulonii, C. sake, C. famata, C. lusitaniae and C. parapsilosis, or as Cryptococcus laurentii or Rhodotorula glutinis, the exact burden of C. auris outbreaks remains difficult to determine.(19-21) A specific, rapid, accurate, reproducible and easy typing method is essential to determine the presence of a potential outbreak; however, no such method is available yet for C. auris. In this study we developed a short tandem repeat (STR)-based typing assay for C. auris and used it to type a global collection of isolates.

Selection of STR markers
Tandem repeats in the C. auris genome were identified using genomic information of four isolates originating from different clades. Then 23 tandem repeats of two, three, six or nine nucleotides were selected, such that each unit length was represented by at least three repeats. In order to map the 200-300 bases flanking the tandem repeats at both sides, primers were designed (Supplementary Table S1) and applied in PCR amplification using ten isolates from 4 known clades. PCR products were found for 22 tandem repeats and these were sequenced. One of the tandem repeats was not present in all clades, while the flanking sequence of five tandem repeats harbored deletions/insertions close to the tandem repeat, making them unsuitable for STR analysis (Supplementary Table S1). After excluding two repeats which showed very little variability in copy number between the isolates (Supplementary Table S1), a total of 14 tandem repeats remained. As this selection only included two tandem repeats with a length of six nucleotides, these were also excluded, leaving three dinucleotide, six trinucleotide and three nona-nucleotide repeats.

Development of C. auris STR typing assay and its application on a global collection of isolates
To develop an STR typing assay for C. auris, primers were designed in close proximity to the tandem repeat. After testing and optimization, these primers were coupled to fluorescent probes. The four multiplex PCRs (M2, M3-I, M3-II and M9) were then used to genotype a C. auris collection of 444 isolates. All isolates were successfully typed using these multiplex panels, with the exception of isolates from Korea and Japan, which required monoplex typing of the M3-II panel. Most repeats, with the exception of the nona-nucleotide repeats, demonstrated multiple stutter peaks, an established PCR artefact.(22) An overview of repeat characteristics, number of alleles and discriminatory indices is shown in Table 1. Among 444 C. auris isolates, 40 different genotypes were identified, containing 1 to 125 isolates each (Fig. 1A). The genotypes clustered in 5 different groups, previously identified via WGS as the 5 different C. auris clades.(17, 23) These 5 clades were differentiated by at least 8 to 10 STR markers. Less variation was found within the different clades, as the maximal number of different STR markers between isolates within the South Indian clade was 7, while within the other C. auris clades, excluding Iran, isolates differed in maximally 3 STR markers. The 3 markers in the M2 panel each had 5 alleles, while the markers in the M3-II and M9 panels had 6-7 and 3-4 alleles, respectively. M3-Ib and M3-Ic exhibited 8 and 9 alleles, respectively, while there were 20 alleles for STR marker M3-Ia. To visualize the genotypes and the country of origin, all individual isolates are shown in a minimum spanning tree (Fig. 1B), which demonstrated that some genotypes were found in different countries. Remarkably, one of the South African isolates localized in the South American clade. Whole genome sequencing confirmed the overlap of this isolate with the South American isolates (accession number comes here; soon available).

Reproducibility and specificity of C. auris STR
To test reproducibility, isolates GMR-OM028 and VPCI 213/P/15 were independently amplified 5 times in quadruplicate. STR typing demonstrated identical results for both isolates for all STR markers, demonstrating that the method is highly reproducible. In order to determine the specificity of our STR assay for C. auris we analyzed the following yeasts for all 12 markers: C. haemulonii, C. pseudohaemulonii, C. duobushaemulonii, C. tropicalis, C. dubliniensis, C. albicans, C. glabrata, C. glaebosa, C. krusei, C. lusitaniae, C. parapsilosis, C. sake, Cryptococcus gattii, Cryptococcus albidus and Rhodotorula glutinis. Products were only found with C. duobushaemulonii using the M3-IIb and M3-IIc markers and with C. pseudohaemulonii using the M2c marker, demonstrating that in general the markers are highly specific for C. auris.
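The discriminatory index (D value) reported per marker can be illustrated with a short calculation. The text does not state which formula was used, so the sketch below assumes the Hunter-Gaston form of Simpson's index, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), which is a common choice for typing methods. The function name and the allele calls are hypothetical and are not study data.

```python
from collections import Counter


def discriminatory_index(calls):
    """Hunter-Gaston index (assumed formula, not confirmed by the text).

    Returns the probability that two isolates drawn without replacement
    carry different alleles at the marker.
    """
    n = len(calls)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(calls)
    # Number of ordered pairs of isolates that share the same allele.
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_pairs / (n * (n - 1))


# Hypothetical repeat counts for one marker in ten isolates (invented values).
example_calls = [18, 18, 18, 19, 19, 21, 21, 21, 21, 30]
print(round(discriminatory_index(example_calls), 3))
```

A marker on which all isolates share one allele gives D = 0, and a marker on which every isolate differs gives D = 1, so outbreak-heavy collections with many clonal isolates will tend to lower the value.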

Discussion
The present study describes the development of a novel STR genotyping analysis for C. auris. This C. auris-specific STR assay consists of 4 multiplex PCR reactions, which amplify 12 STR targets with a repeat size of 2, 3 or 9 nucleotides (panels M2, M3-I, M3-II and M9). Notably, analysis of the STR genotyping using 444 C. auris isolates from various geographical regions showed that this novel technique is highly reproducible and specific for C. auris. Furthermore, STR typing was highly concordant with WGS and yielded 5 distinct groups, which correspond with the four well-known clades from South America, South Africa, South Asia and East Asia, and the possible new clade from Iran.(16, 17, 31) The allelic variations in the 444 isolates resulted in 40 different genotypes. This relatively low number of different genotypes is partly due to the fact that >95% of our isolates originated from hospital outbreaks, leading to the inclusion of many clonal strains. Furthermore, it is known that C. auris only recently emerged and that there is still little variation between isolates from the same clade, as also shown by WGS.(18)
In order to get more insight into the utility of this new STR assay to type strains in an outbreak of C. [...] isolates were differentiated by 19 SNPs with WGS and one copy number in two STR markers. Altogether, isolates that differed by few SNPs (<20) via WGS and are labelled as almost indistinguishable are often also not differentiated with STR analysis, while most isolates that differed in 30 or more SNPs are differentiated by STR analysis in one or more STR markers.
Implementation of a typing method in an outbreak setting requires establishing cut-off values to determine the potential relatedness of isolates. As the mutation rate differs strongly between microorganisms, such a cut-off value should be determined for each microorganism separately.(24) Furthermore, methods should be standardized to determine the precise genetic variation during clonal outbreaks. This standardization is still lacking for the WGS analysis of C. auris. Different methods are currently used, which cannot be directly compared. This is apparent when the numbers of SNPs between identical isolates differ around 10-fold between two different labs.(18, 25) STR analysis is a standardized technique specifically developed for a single microorganism. To establish a cut-off value to determine the relatedness between isolates, we analyzed the variation between several hospital outbreaks included in this study. In the London, UK 2015-2016 outbreak, all but one isolate exhibited a single genotype.(25) A difference of 4 copy numbers was observed in marker M3-Ia, suggesting that small variations (copy number <5) in STR marker M3-Ia may not be used to regard strains as non-related. Analysis of the outbreak in a Spanish hospital (26) [...] there were a few single isolates with other genotypes, caused by one copy number in one marker. Finally, most isolates from Barranquilla, which originated from one hospital and were isolated between April 2015 and January 2019, clustered in two larger groups, while a few isolates exhibited a different genotype. These isolates also differed only in one copy number of one marker, with the exception of isolate C72900 (date of isolation 28-12-2018), which exhibited 13 repeats for M3-I, while the other isolates had 18 or 19 repeats. Thus, the variation between the Colombian isolates was minimal, with the exception of isolate C72900, found at the very end of the outbreak, which might be an independent introduction. Altogether, the cut-off for relatedness for the C. auris STR remains to be established, although small variations (copy number <5) in STR marker M3-Ia or one copy number in an M2 or other M3 STR marker should likely not be used to label strains as non-related. Our data suggest that isolates with larger variation in STR markers are not related.
The reproducibility of this STR typing assay for C. auris was analyzed by independent amplification of two isolates 5 times in quadruplicate. As no differences were found between the typing results of the replicates, the novel STR assay is considered highly reproducible. In addition to reproducibility, the specificity was also tested on different Candida species and other yeasts close to C. auris, and for the majority of targets no amplification was found. In summary, we developed an STR typing assay for C. auris which is reliable, reproducible, specific and cost-effective, and can be used to quickly characterize C. auris isolates during an outbreak.
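The provisional rule of thumb above can be written as a small profile comparison. This is a sketch only, since the text states that a formal cut-off remains to be established. The tolerances encode the two statements in the text (fewer than 5 copies in M3-Ia, one copy in an M2 or other M3 marker). Treating any difference in other markers, such as the M9 panel, as exceeding tolerance is an assumption made here, as are the marker name M9a, the function names and all repeat counts in the example.

```python
def tolerance(marker):
    """Largest copy-number difference still treated as compatible with relatedness."""
    if marker == "M3-Ia":
        return 4  # "copy number <5" in the text
    if marker.startswith(("M2", "M3")):
        return 1  # one copy number in an M2 or other M3 marker
    return 0      # assumption: the text gives no tolerance for other markers


def markers_exceeding_tolerance(profile_a, profile_b):
    """Return the shared markers whose difference is larger than the tolerance."""
    shared = sorted(set(profile_a) & set(profile_b))
    return [m for m in shared
            if abs(profile_a[m] - profile_b[m]) > tolerance(m)]


# Hypothetical STR profiles (marker -> repeat count); values are invented.
reference = {"M2a": 66, "M2b": 19, "M3-Ia": 18, "M3-Ib": 10, "M9a": 4}
close_variant = {"M2a": 66, "M2b": 20, "M3-Ia": 19, "M3-Ib": 10, "M9a": 4}
distant_variant = {"M2a": 66, "M2b": 19, "M3-Ia": 13, "M3-Ib": 10, "M9a": 4}

print(markers_exceeding_tolerance(reference, close_variant))
print(markers_exceeding_tolerance(reference, distant_variant))
```

An empty list means the pair cannot be labelled as non-related on STR data alone; a non-empty list names the markers that would prompt a closer look, for example by WGS.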

Materials and Methods
Identification of STR loci
Genome scaffolds of B8441-Pakistan, B11220-Japan, B11221-South Africa and B11243-Venezuela were downloaded from NCBI.(17) Scaffolds from each isolate were combined in one fasta file (www.bioinformatics.org/sms2/combine_fasta.html) and this fasta file was uploaded in Tandem Repeats Finder (www.tandem.bu.edu/trf/trf.html) using the basic search option. From the resulting list of STRs, those repeats that contained insertions or deletions, exhibited <90% perfect match of the repeat sequence, did not vary in copy number between the isolates or contained repeat sequences within the potential PCR primer regions were excluded. From this list, STRs were selected only if at least three shared an identical unit length. [...] with 2 x 150 bp paired-end read mode at Eurofins Genomics (Ebersberg, Germany). Seventy-four C. auris WGS sequences from NCBI were added to the analysis. FastQC and PRINSEQ were used to assess the quality of read data and perform read filtering. Read data were aligned against a publicly available genome sequenced on PacBio RS II using BWA. SNP variants were identified using SAMtools and filtered using the publicly available SNP analysis pipeline NASP to remove positions that had less than 10x coverage, less than 90% variant allele calls, or that were identified by Nucmer as being within duplicated regions in the reference. Phylogenetic analysis and bootstrapping with 1000 iterations were performed on SNP matrices using RAxML.
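The three SNP position filters described above (coverage, variant allele fraction, duplicated regions) can be illustrated with a toy filter. This is not the NASP implementation or its interface. The record layout, field names and example sites are hypothetical, and only the two thresholds (10x and 90%) come from the text.

```python
from dataclasses import dataclass


@dataclass
class SiteCall:
    """One candidate SNP position in one isolate (hypothetical record layout)."""
    position: int
    depth: int                  # reads covering the position
    variant_reads: int          # reads supporting the variant allele
    in_duplicated_region: bool  # e.g. flagged by a self-alignment of the reference


MIN_DEPTH = 10            # positions with less than 10x coverage are removed
MIN_VARIANT_PERCENT = 90  # positions with less than 90% variant allele calls are removed


def passes_filters(site):
    """Return True if the site survives all three filters."""
    if site.in_duplicated_region:
        return False
    if site.depth < MIN_DEPTH:
        return False
    # Integer comparison avoids floating-point rounding at the 90% boundary.
    return site.variant_reads * 100 >= MIN_VARIANT_PERCENT * site.depth


# Invented example sites, one per filter outcome.
sites = [
    SiteCall(101, 25, 25, False),  # kept
    SiteCall(202, 8, 8, False),    # removed: coverage below 10x
    SiteCall(303, 40, 30, False),  # removed: 75% variant allele calls
    SiteCall(404, 12, 12, True),   # removed: duplicated region
    SiteCall(505, 10, 9, False),   # kept: exactly at both thresholds
]
kept_positions = [s.position for s in sites if passes_filters(s)]
print(kept_positions)
```

Sites that sit exactly on a threshold are kept here, because the text describes removing positions with "less than" 10x coverage or 90% variant calls.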

Isolates.
For the STR analysis 444 C. auris isolates from Austria (27) [...] The specificity of the STR assay was tested on a spectrum of yeast isolates, including those previously known to be confused with C. auris by commercial biochemical methods. The [...]

Columns: PCR panel; primer name; forward and reverse primer sequences (5'-3'); concentration* (pmol/µl); primer flanking seq (# bases); repeat unit; number of repeats (minimum; maximum; ref #); number of alleles; D value
M2; M2a; FAM-GCAACATCCTGAGCAGTATCACGGTGTTGACGTGCCCAAATATGC; 8; 168; AG; 24; 80; 66; 5; 0.58
M2; M2b; JOE-CCACTCCGTTTTGGGTCTGAGAGAATCTACAAATGTGTCGC; 3; 67; AG; 9; 30; 19; 5; 0.69
M2; M2c; TAMRA-CTGTTTCTGTGGCAGGCTTCCGCCACGTTTCACYGCYACCAT; 2; 90; AG; 8; 25; 9; 5; 0.68
M3-I; M3-[...]; TTGTGTATTCCTAACAGAGGATTTCAATTGCC; 3; 124; TTA; 14; 52; 18; 9; 0.69
M3-II; M3-IIa; FAM-GTTCAAAATCGCTGACGGTCGAGATGATGATGGCACTTGC; 8; 101; CTA; 24; 42; 36; 6; 0.60
M3-II [...]