Epstein-Barr virus epitope/MHC interaction combined with convergent recombination drives selection of diverse T cell receptor α and β repertoires

Recognition modes of individual T cell receptors (TCR) are well studied, but factors driving the selection of TCR repertoires from primary through persistent human virus infections are less well understood. Using deep sequencing, we demonstrate a high degree of diversity of EBV-specific clonotypes in acute infectious mononucleosis (AIM). Only 9% of unique clonotypes detected in AIM persisted into convalescence; the majority (91%) of unique clonotypes detected in AIM were not detected in convalescence and were seemingly replaced by equally diverse “de-novo” clonotypes. The persistent clonotypes had a greater probability of being generated than non-persistent clonotypes due to convergent recombination of multiple nucleotide sequences to encode the same amino acid sequence, as well as the use of shorter CDR3 regions with fewer nucleotide additions (i.e., sequences closer to germline). Moreover, the two most immunodominant HLA-A2-restricted EBV epitopes, BRLF1₁₀₉ and BMLF1₂₈₀, show highly distinct antigen-specific public (i.e., shared between individuals) features. In fact, TCRα CDR3 motifs played a dominant role, while TCRβ played a minimal role, in the selection of the TCR repertoire to an immunodominant EBV epitope, BRLF1. This contrasts with the majority of previously reported repertoires, which appear to be selected either on TCRβ CDR3 interactions with peptide/MHC or in combination with TCRα CDR3. Understanding of how TCR/peptide/MHC complex interactions drive repertoire selection can be used to develop optimal strategies for vaccine design or generation of appropriate adoptive immunotherapies for viral infections in transplant settings or for cancer.

Importance

Several lines of evidence suggest that TCRα and β repertoires play a role in disease outcomes and treatment strategies during viral infections in transplant patients, and in cancer and autoimmune disease therapy.
Our data suggest that it is essential that we understand the basic principles of how to drive optimum repertoires for both TCR chains, α and β. We address this important issue by characterizing the CD8 TCR repertoire to a common persistent human viral infection (EBV), which is controlled by appropriate CD8 T cell responses. The ultimate goal would be to determine whether individuals who are infected asymptomatically develop a different TCR repertoire than those who develop the immunopathology of AIM. Here, we begin with an in-depth characterization of both CD8 T cell TCRα and β repertoires to two immunodominant EBV epitopes over the course of AIM, identifying potential factors that may be driving their selection.

Introduction

...that TCR repertoires of CD8 T cell responses to common viruses (influenza, cytomegalovirus, hepatitis C virus) are highly diverse and individualized (i.e., "private") but "public" clonotypes (defined as the same V, J, or CDR3 aa sequences in many individuals) are favored for expansion, likely due to selection for optimal structural interactions (34).

Studies of influenza A virus in mice (35) and SIV in rhesus macaques (36) have shown that the efficiency with which TCR sequences are produced via V(D)J recombination is an important determinant of the extent of TCR sharing between individuals (35, 37). Shared TCR amino acid sequences required fewer nucleotide additions and were encoded by a greater variety of nucleotide sequences (i.e., convergent recombination). Both of these features are characteristics of TCR sequences that have the potential to be produced frequently (35-39) and are also observed in many public TCRs (29, 30, 38-41).
To thoroughly evaluate molecular features of TCR that are important for driving repertoire selection over time following EBV infection, we used direct ex vivo deep sequencing of both TCR Vα and Vβ regions of CD8 T cells specific to two immunodominant epitopes, BRLF1₁₀₉ (YVL-BR) and BMLF1₂₈₀ (GLC-BM), isolated from peripheral blood during primary EBV infection (AIM) and 6 months later in convalescence (CONV). Each TCR repertoire had a high degree of diversity. However, we noted that persistent clonotypes accounted for only 9% of the unique clonotypes, yet they predominated in both the acute and convalescent phases of infection. An interesting corollary of this finding was that 91% of the unique clonotypes expanded in acute infection were not expanded in convalescence, appearing to be replaced in 6 months by an equally diverse set of de novo clonotypes. Expanded clonotypes detected in both AIM and CONV were more likely to be generated, in part as a result of convergent recombination, than non-persistent or de novo clonotypes and had distinct public features (meaning they are shared between donors), which varied by the specific epitope.

Results

Patient characteristics.
Three HLA-A*02:01+ individuals presenting with symptoms of AIM and laboratory studies consistent with primary infection were studied (Table S1) at initial clinical presentation (AIM) and 6 months later (CONV). Direct tetramer staining of peripheral blood revealed that 2.1%±0.5 (mean±SEM) and 1.1%±0.3 of CD8 T cells were YVL-BR- and GLC-BM-specific, respectively, in AIM and declined to 0.3%±0.2 and 0.3%±0.1 in CONV. Mean blood EBV load was 3.8±0.9 log₁₀ in AIM and 2.6±0.7 log₁₀ genome copies/10⁶ B cells in CONV.

To examine features that drive selection of YVL-BR- and GLC-BM-specific TCRs in AIM and CONV, deep sequencing of TCRα and β repertoires was conducted directly ex vivo on tetramer-sorted CD8 T cells at both time points (Fig 1, S1-2, Table S2). YVL-BR- and GLC-BM-specific CD8 TCR repertoires in AIM demonstrated inter-individual differences and were highly diverse; the mean (±SEM) numbers of unique clonotypes (each defined as a unique DNA rearrangement) were not significantly different in CONV (Fig 1). Each unique TCRα or TCRβ clonotype detected in AIM that was also detected in CONV was defined as a "persistent" clonotype. Clonotypes were regarded as "non-persistent" or "de novo" if they were detected only during AIM or CONV, respectively. A high level of TCR diversity was maintained from AIM to CONV; however, the number of overlapping unique clonotypes detected in both AIM and CONV was small (Fig 1Ai, Bi). Only a small fraction of TCRα or β unique clonotypes specific to YVL-BR (6.6±2.2%) and GLC-BM (9.1±4.2%) that were present in AIM were maintained in CONV (YVL-BR: 8.7±4.9%; GLC-BM: 18.5±5.6%). However, they comprised 57.5±26.2% (YVL-BR) or 75.5±12% (GLC-BM) of the total CD8 T cell response when including their frequency (sequence reads) in AIM and 35.8±10.2% (YVL-BR) or 55.8±13.4% (GLC-BM) in CONV (Fig 1Aii, Bii).
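The persistent, non-persistent, and de novo definitions above amount to set operations on the clonotypes seen at the two time points. The following is a minimal sketch, not the analysis pipeline used in this study; it assumes each clonotype is keyed by a string (for example its nucleotide rearrangement) mapped to a read count, and the function names and toy counts are hypothetical.

```python
def classify_clonotypes(aim_counts, conv_counts):
    """Split clonotypes into persistent, non-persistent and de novo sets.

    aim_counts / conv_counts: dict mapping clonotype key -> read count
    (hypothetical input format, not taken from the paper).
    """
    aim, conv = set(aim_counts), set(conv_counts)
    return {
        "persistent": aim & conv,      # detected in both AIM and CONV
        "non_persistent": aim - conv,  # detected only in AIM
        "de_novo": conv - aim,         # detected only in CONV
    }


def persistent_fractions(counts, persistent):
    """Return (fraction of unique clonotypes, fraction of reads) that are persistent."""
    present = persistent & set(counts)
    unique_fraction = len(present) / len(counts)
    read_fraction = sum(counts[c] for c in present) / sum(counts.values())
    return unique_fraction, read_fraction


# Toy read counts, invented purely for illustration.
aim = {"ntA": 700, "ntB": 100, "ntC": 100, "ntD": 100}
conv = {"ntA": 300, "ntE": 100, "ntF": 100}

groups = classify_clonotypes(aim, conv)
aim_unique, aim_reads = persistent_fractions(aim, groups["persistent"])
print(groups["persistent"], aim_unique, aim_reads)
```

In the toy data a single persistent clonotype is a minority of the unique clonotypes but a majority of the reads, which is the pattern the text describes.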
While the clonotypic composition of YVL-BR- and GLC-BM-specific CD8 T cells changed over the course of primary infection, dominant TCR clonotypes detected during AIM tended to persist and dominate in CONV. Altogether, these data indicate that persistent clonotypes made up only a small percentage of unique clonotypes but were highly expanded in AIM and CONV. Surprisingly, the vast majority (91%) of unique clonotypes were not detected following AIM and were seemingly replaced with de novo clonotypes in CONV.

In both the YVL-BR and GLC-BM TCR repertoires, the percentage of public clonotypes was significantly higher (Chi square: p<0.0001) in the persistent (YVL-BR: TCRAV 34%, TCRBV 17%; GLC-BM: TCRAV 27%, TCRBV 22%) than in the non-persistent (YVL-BR: TCRAV 5%, TCRBV 2%; GLC-BM: TCRAV 4%, TCRBV 4%) or de novo repertoires (YVL-BR: TCRAV 5%, TCRBV 1%; GLC-BM: TCRAV 6%, TCRBV 7%). This suggests that the persistent clonotypes may have TCR features that led to a greater probability of generation. We tested this by directly calculating the generation probability of amino acid sequences in the CDR3 to determine whether the public clonotypes are easier to generate than the private clonotypes at both time points, acute and convalescent. This allowed a direct and rigorously quantitative test of whether the expanded persistent public clonotypes were of higher generation probability (39, 42). The TCR sequences used by dominant public TCRAV of either epitope-specific response have a significantly greater probability of generation, while only the GLC-BM TCRBV public repertoire, but not the YVL-BR public repertoire, has a greater probability of being generated (Fig 1A,B). This might suggest that TCRAV is dominant and important in the selection of the YVL-BR TCR repertoire, while both TCRAV and TCRBV contribute to the GLC-BM TCR repertoire.
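The public versus private distinction used above can be illustrated by counting how many donors carry each CDR3 amino acid sequence. This is a minimal sketch under an assumed definition (a clonotype is public when the identical CDR3 amino acid sequence is found in at least two donors); the text also allows sharing to be defined on V or J usage. The function names, donor labels, and sequences are invented placeholders.

```python
from collections import defaultdict


def public_clonotypes(donor_repertoires, min_donors=2):
    """Return the CDR3 aa sequences found in at least `min_donors` donors.

    donor_repertoires: dict mapping donor label -> iterable of CDR3 aa strings
    (hypothetical input format).
    """
    donors_by_cdr3 = defaultdict(set)
    for donor, cdr3_list in donor_repertoires.items():
        for cdr3 in cdr3_list:
            donors_by_cdr3[cdr3].add(donor)
    return {c for c, donors in donors_by_cdr3.items() if len(donors) >= min_donors}


def percent_public(cdr3_subset, public):
    """Percentage of a clonotype subset (e.g. persistent) that is public."""
    subset = set(cdr3_subset)
    return 100.0 * len(subset & public) / len(subset)


# Toy sequences, invented for illustration only.
repertoires = {
    "donor1": ["CAAAF", "CAGGF", "CASSF"],
    "donor2": ["CAAAF", "CATTF"],
    "donor3": ["CAKKF"],
}
public = public_clonotypes(repertoires)
print(public, percent_public(["CAAAF", "CAGGF"], public))
```

Percentages obtained this way for the persistent, non-persistent, and de novo subsets could then be compared with a contingency-table test such as the chi-square test mentioned in the text.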
To further study this issue, we examined whether convergent recombination played a role in the generation of these public persistent TCR (39). Examination of memory antiviral TCR repertoires in humans, mice, and macaques suggests that convergent recombination plays an important role in the selection of public antigen-specific TCR (i.e., those shared between individuals of the same haplotype) (35-37). Consistent with previous reports for epitope-specific CD8 TCR (37, 43, 44), our group found that convergent recombination plays an important role in EBV-specific TCR repertoire selection. We also demonstrated that convergent recombination plays a role in selection of persistent TCR clonotypes specific for the two immunodominant EBV epitopes, YVL-BR and GLC-BM, during the course of a human viral infection. There was an increased usage of amino acids derived from multiple different nucleotide (nt) sequences in the CDR3α and β regions of persistent clonotypes as compared to non-persistent and de novo clonotypes (Fig 3A,C). In fact, we show here that the public TCR had significantly greater usage of these types of amino acids in the CDR3α, as well as the CDR3β (Fig 3B,D), as compared to the private clonotypes.

Another TCR feature that leads to increased probability of generation is the use of decreased numbers of nucleotide additions in the CDR3, consistent with encoding of the TCR by predominantly germline gene segments (39). This was indeed the case for YVL-BR- and GLC-BM-specific clonotypes (Fig 4); the CDR3α of persistent YVL-BR- and GLC-BM-specific clonotypes had fewer nucleotide additions than that of non-persistent clonotypes, whereas de novo clonotypes of EBV-BR had an increased number of nt additions. However, the CDR3β of persistent YVL-BR- and GLC-BM-specific clonotypes did not have fewer nucleotide additions compared to non-persistent clonotypes (Fig 4A,D).
Public clonotypes of each epitope-specific response also had fewer nucleotide additions than private clonotypes except, interestingly, for YVL-BR CDR3β, where the private clonotypes had fewer (Fig 4B,E). Interestingly, there was an increased usage of glycines in the longer CDR3 of the de novo TCR repertoire (Fig 4C,F), which has been reported to be a feature associated with greater TCR promiscuity (45, 46). Overall, these results suggest the use of shorter CDR3 regions with fewer nucleotide additions in the persistent TCRAV but not in the TCRBV clonotypes. Curiously, consistent with the generation probability data (Fig 2), the public TCRBV of EBV-BR were actually significantly longer, with increased nucleotide additions.
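Convergent recombination, as discussed above, means that several distinct nucleotide rearrangements encode the same CDR3 amino acid sequence. One simple way to quantify it is sketched below: translate each nucleotide sequence with the standard genetic code and count how many distinct nucleotide sequences map to each amino acid sequence. This is an illustrative sketch, not necessarily the calculation behind Fig 3; the function names and toy sequences are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(nt_seq):
    """Translate an in-frame nucleotide sequence; trailing partial codons are ignored."""
    usable = len(nt_seq) - len(nt_seq) % 3
    return "".join(CODON_TABLE[nt_seq[i:i + 3]] for i in range(0, usable, 3))


def convergence_counts(nt_seqs):
    """Map each amino acid sequence to the number of distinct nt sequences encoding it."""
    encoders = defaultdict(set)
    for nt in nt_seqs:
        encoders[translate(nt.upper())].add(nt.upper())
    return {aa: len(nts) for aa, nts in encoders.items()}


# Toy in-frame sequences, invented for illustration.
toy = ["TGTGCTGTT", "TGCGCCGTG", "TGTGCTGGG"]
print(convergence_counts(toy))
```

Amino acid sequences with a count above 1 are the convergent ones; comparing the distribution of these counts between persistent and non-persistent clonotypes would mirror the comparison described in the text.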

CDR3 lengths are a major factor in the selection of the YVL-BR- and GLC-BM-specific TCRα and β repertoires
Differences in dominant YVL-BR- and GLC-BM-specific CDR3α and β lengths were also observed between the epitopes, from AIM to CONV, and between persistent and non-persistent or de novo clonotypes (Fig 5). There were differences in preferential use of CDR3 lengths between YVL-BR and GLC-BM. For instance, the AIM YVL-BR-specific repertoire used more of the shorter 10-mer CDR3β than GLC-BM in both AIM and CONV (Fig 5Aii). Within the YVL-BR response, use of the shorter 9-mer CDR3α decreased from AIM to CONV (Fig 5Ai).
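The CDR3 length comparisons in this section rest on a read-weighted length distribution for each repertoire. A minimal sketch of that tabulation follows; it assumes each clonotype is given as a (CDR3 amino acid sequence, read count) pair and that length is simply the number of residues in the supplied string, whereas which flanking residues are counted is a convention not specified here. The function name and toy values are hypothetical.

```python
from collections import Counter


def cdr3_length_distribution(clonotypes):
    """Return {CDR3 length: percent of total reads} for (cdr3_aa, reads) pairs."""
    reads_by_length = Counter()
    for cdr3_aa, reads in clonotypes:
        reads_by_length[len(cdr3_aa)] += reads
    total = sum(reads_by_length.values())
    return {
        length: 100.0 * reads / total
        for length, reads in sorted(reads_by_length.items())
    }


# Toy clonotypes, invented for illustration (placeholder sequences).
toy = [("A" * 9, 50), ("A" * 11, 30), ("C" * 11, 10), ("A" * 12, 10)]
print(cdr3_length_distribution(toy))
```

Running the same function on the AIM and CONV repertoires, or on the persistent and de novo subsets, gives per-length percentages that can be compared side by side.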
Persistent YVL-BR-specific clonotypes used significantly more of the shorter 9-mer CDR3α and 10-, 11-, and 12-mer CDR3β than the non-persistent clonotypes. In contrast, the de novo clonotypes favored the longer 12-mer CDR3α and focused more on the 11-mer CDR3β length (Fig 5Bi-ii). Significant changes in the GLC-BM-specific CDR3 length were also observed between AIM and CONV. For example, the frequencies of the longer GLC-BM-specific 12-mer CDR3α and β clonotypes significantly increased from 13.6±6% and 6±2.8%, respectively, in AIM to 24±5% and 17.9±8%, respectively, in CONV, while use of the shorter 11-mer CDR3α decreased (Fig 5Ai-ii). The persistent clonotypes preferentially used 9- and 11-mer CDR3α, while de novo clonotypes used longer 12- and 14-mer lengths (Fig 5Biii-iv). The persistent clonotypes also used 11- and 13-mer CDR3β, while de novo clonotypes used 12-mer lengths.

Selection of the TCRα and β repertoires was based on the features of the specific epitope

To further elucidate factors that are driving selection of TCR specific to the two immunodominant EBV epitopes, the characteristics of the TCR repertoires for each of the 3 donors were examined by systematically analyzing the preferential TCRAV or BV segment usage hierarchy as presented in pie charts, CDR3 length analyses, V-J pairing by circos plots of the clonotypes with the dominant CDR3 lengths, and the dominant CDR3 motif; the latter determines whether there was an enrichment of particular amino acid residues at specific sites potentially important for ligand interaction. Enrichment for certain characteristics would suggest that these features are important for pMHC interaction (11, 29, 47-50).

cells:
The YVL-BR-specific TCRα repertoire was focused on one dominant family, AV8, used by all donors in AIM and CONV (Fig 6Ai, S1Ai). Similar strong selection bias was not observed in YVL-BR-specific TCRBV usage; there was a great deal of inter-individual variation and preferential usage of multiple families, including BV6, BV20, BV28, and BV29 (Fig 6Bi, S1Bi). Interestingly, in CONV, some TCRAV and BV gene families that dominated in AIM became extinct or subdominant, or new dominant genes emerged (Fig 6Ai, Bi).

Circos plot analyses of the pronounced 9-mer clonotypes showed that the dominant AV8.1 gene almost exclusively paired with AJ34 (Fig 6A, S1Aiii). CDR3α motif analysis revealed a pronounced motif, "VKDTDK", in these shorter 9-mer clonotypes, representing 13.8%±5.6 of the total CD8 T cell response during AIM (Fig 6Aiii, S1Aiv, Table S3A); 87%±1.7 of the clonotypes using this motif were AV8.1 and 92%±1.7 were AJ34. Interestingly, this motif was also present in multiple other AV and AJ pairs, including AV12, AV21 and AV3. Obligate pairing of the dominant AV8.1 response to AJ34 containing the highly conserved motif, VKDTDK, was observed in all donors from AIM through CONV, suggesting that the 9-mer AV8.1-VKDTDK-AJ34 expressing clones were highly selected. There was a preferential usage of BV20-BJ2.7 pairing within the dominant 11-mer response (Fig 6Bii, S1Biii), without an obvious CDR3β motif (Fig 6Biii, S1Biv), highlighting a great degree of diversity in the amino acid sequences. Within the 13-mer response (Fig 6Biii, S1Biv, Table S3B), the CDR3β motif, "LLGG", was commonly used. Clonotypes with this motif were only a minor part of the overall responses in 2 donors (E1603, E1655), but composed 17.4% of the total YVL-BR TCRβ repertoire in E1632.
Altogether, these results suggest that the 9-mer AV8.1-VKDTDK-AJ34 expressing clones were highly preferentially selected by the YVL-BR ligand during AIM and CONV and that this TCRα could pair with multiple different TCRβ, as suggested by the fact that there was no such dominant TCRβ clonotype. These findings have been independently confirmed using single cell sequencing (51).
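The motif figures quoted in this section (the share of the total response carrying a motif, and the V-J pairs using it) can be tabulated directly from a clonotype table. The following is a minimal sketch with an assumed record layout (keys "v", "j", "cdr3", "reads") and invented toy records; only the motif string and gene names come from the text, and the CDR3 strings are made up to contain or lack that motif.

```python
from collections import Counter


def motif_usage(clonotypes, motif):
    """Return (percent of total reads carrying `motif`, reads per (V, J) pair among hits).

    clonotypes: list of dicts with keys "v", "j", "cdr3", "reads"
    (hypothetical layout, not the format of the study's data files).
    """
    total_reads = sum(c["reads"] for c in clonotypes)
    pair_reads = Counter()
    for c in clonotypes:
        if motif in c["cdr3"]:
            pair_reads[(c["v"], c["j"])] += c["reads"]
    percent = 100.0 * sum(pair_reads.values()) / total_reads
    return percent, pair_reads


# Toy records; CDR3 strings are invented for illustration.
toy = [
    {"v": "AV8.1", "j": "AJ34", "cdr3": "AVKDTDKLI", "reads": 40},
    {"v": "AV12", "j": "AJ34", "cdr3": "GVKDTDKLI", "reads": 10},
    {"v": "AV8.1", "j": "AJ10", "cdr3": "AVRGSGKLT", "reads": 50},
]
percent, pairs = motif_usage(toy, "VKDTDK")
print(percent, pairs.most_common())
```

A substring search is the simplest possible motif test; position-specific enrichment, as implied by the motif analysis in the text, would need aligned CDR3 sequences of equal length.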

Persistent, non-persistent, and de novo clonotypes differ in selection factors.
To address whether clonotypes that persisted into memory show similar characteristics to those that dominate in acute infection, YVL-BR and GLC-BM TCRα/β repertoires were compared between AIM and CONV. The TCR repertoires of persistent and non-persistent clonotypes in AIM, and of de novo clonotypes in CONV, were examined in order to identify selection factors that governed TCR persistence.

YVL-BR persistent, non-persistent, and de novo clonotypes have unique characteristics. Persistent YVL-BR clonotypes maintained the major selection factors that were identified in AIM (Fig S3, S4, 8A, Table S4). Although some features were maintained in all 3 TCR subsets, there were significant structural differences in these repertoires.
The YVL-BR non-persistent CDR3α clonotypes used AV8.1, but it was paired with many more AJ gene families (Fig S3). Moreover, AV8.1-VKDTDK-AJ34 clonotypes, which were present in 42±20% or 19±11% of all persistent clonotypes during AIM or CONV, respectively, were present in the non-persistent response at a much lower mean frequency (6±1%; Fig 8A, Table S4A,B). The clonal composition of the CDR3β non-persistent response varied greatly in BV family usage between donors (Table S4D,E) and lacked identifiable motifs, suggesting that for YVL clones expressing AV8.1-VKDTDK-AJ34 to persist, there may be some preferential, if not obvious, TCRβ characteristics that make them a better fit.

For de novo clonotypes, new selection factors appeared that may relate to either a decrease in antigen expression or a change in antigen-expressing cells over the course of persistent infection. For instance, in the YVL-BR 9-mer de novo clonotypes, the selection factor AV8.1-AJ34 was maintained in 2/3 donors and a new modified motif, VKNTDK, was identified (Fig S3Ai, 8A, Table S4C). The de novo 11-mer CDR3α response had increased usage of AV12 in all 3 donors (Fig S3Aii). In de novo BV clonotypes, the pattern of BV-BJ usage changed compared to that observed in AIM. Similarly, de novo 13-mer CDR3β clonotypes were also totally different, with usage of a new motif, SALLGX, in 2/3 donors (Table S4F).

GLC-BM persistent, non-persistent and de novo clonotypes have unique characteristics. The persistent GLC-BM TCRα clonotypes maintained the major selection criteria that were identified in AIM, with the 9-mer EDNNA motif, which strongly associated with AV5.1-AJ31, being present in a mean 5±3.7% or 10±8.6% of all persistent clonotypes during AIM or CONV, respectively, in all 3 donors (Fig 8B, Table S4G). The fact that clonotypes using this motif were not present among non-persistent clonotypes suggests that this motif, and not just the gene family, may be important in determining persistence of GLC-BM-specific clonotypes. The persistent GLC-BM TCRβ repertoire also maintained the major selection criteria that were identified in AIM, with the 11-mer SARD motif that strongly associated with BV20.1-BJ1 being present in a mean 16±9.9% or 24±13.7% of all persistent clonotypes during AIM or CONV, respectively, in all 3 donors. Two of the donors had the 11-mer SQSPGG motif (Table S4I) in a mean 40±8% and 30±25% of all persistent clonotypes during AIM or CONV, respectively.
Only the SARD motif clonotypes appeared in non-persistent BV clonotypes during AIM, but at a lower mean frequency of 3±1% (Table S4J). The de novo clonotype selection appeared to be driven by different factors than the persistent. Although there was much greater diversity and more variation between patients in de novo clonotypes (each donor is private), with recruitment of private AV families such as AV41 or AV24 in E1632 and E1655, there was still a preferential usage by 2/3 donors of AV5.1 (Fig S5i) and the appearance in 2/3 donors of a new 11-mer CDR3α motif, "ELDGQ", which associated with AV5.1-AJ16.1 (Fig 8B, Table S4H). De novo TCRβ clonotypes were also diverse and private, using uncommon BV families like BV7 and BV3 but also common BV families such as BV20 (Fig S6), with the SARD motif expressed in 5%±2.9 of de novo clonotypes (Fig 8B, Table S4K).

In conclusion, the persistent clonotypes made up the vast majority of the AIM and CONV responses. For the most part, the non-persistent clonotypes did not have a motif despite the observation that some of them used a public TCRα or β; this suggests that one of the strongest selection factors for persistence was the CDR3 motif. Additionally, the fact that persistent clonotypes retained features that were identified in AIM further supports their validity. Altogether, these results suggest that the HLA-A2-YVL-BR- or GLC-BM-specific structure contributes strongly to the selection of dominant persistent

Discussion
This is the first study to use deep sequencing to comprehensively investigate the TCRα and β repertoires to two different EBV epitope-specific CD8 T cell responses over the course of primary infection. We show that while epitope-specific TCR repertoires are highly diverse and vary greatly between donors, they are dominated by distinct clonotypes with public features that persist into convalescence. These persistent clonotypes have distinct features specific to each antigen that appear to drive their peripheral selection; they account for only 9% of unique clonotypes, but predominate in acute infection and convalescence, accounting for 57%±4 of the total epitope-specific response. Surprisingly, the majority of highly diverse unique clonotypes were not detected following AIM and were replaced in convalescence by equally diverse "de-novo" clonotypes (43%±5% of the total response).

The deep sequencing results show a highly diverse TCR repertoire in each epitope-specific response, with 1,292-15,448 and 1,644-7,631 unique clonotypes detected within the YVL-BR- and GLC-BM-specific TCR repertoires, respectively. Such diversity has been underappreciated for the GLC-BM-specific TCR repertoire, with prior studies reporting an oligoclonal repertoire (52, 53, 55). Despite this enormous diversity, there was considerable bias. Although the TCR repertoire was individualized (i.e., each donor studied had a unique TCR repertoire), there was prevalent and public usage of particular TCRV families such as AV8 within the YVL-BR-specific responses and AV5, AV12 and BV14, BV20 within the GLC-BM-specific populations.

One mechanism that may lead to the dominant public usage and persistence of these clonotypes is that they have TCR features that increase their probability of generation, i.e., they are potentially easier to derive.
One of these features, convergent recombination in both the TCRα and the TCRβ CDR3 regions, appears to play a major role in the selection of these persistent clonotypes for expansion and maintenance into long-term memory. This is evidenced by persistent clonotypes using more amino acids that have multiple ways of being derived. A second feature is the usage of shorter germline-derived CDR3 regions with fewer nucleotide additions. The selection of unique public TCR repertoire features, such as CDR3 length, particular TCRAV or BV family usage and motifs, for each epitope in clonotypes that dominate and persist suggests that these clones may be the best-fit TCR to recognize the pertinent pMHC complex. In contrast, the broad repertoire of unique clonotypes that are activated in AIM, which is marked by a high viral load and increased inflammation, may not fit as well and perhaps do not receive a TCR signal that leads to survival into memory. Interestingly, 6 months after the initial infection, a completely new (de novo) and similarly diverse TCR repertoire has expanded. Continued antigenic exposure in persistent EBV infection may contribute to the evolution of the TCR repertoire over time.

Prior studies using similar techniques to study influenza A virus (IAV) (not a persistent virus) HLA-A2-restricted IAV-M1₅₈₋₆₇ and cytomegalovirus (CMV)-pp65 epitope-specific memory responses showed a similar focused diversity of epitope-specific TCR repertoires, suggesting that this is a general principle of antigen-specific repertoire structure (29, 30). Altogether, these studies suggest that the pMHC structure drives selection of the particular public featured dominant clonotypes for each epitope.
The broad fluctuating private repertoires show the resilience of memory repertoires and may lend plasticity to antigen recognition, perhaps assisting in early cross-reactive CD8 T cell responses to heterologous new pathogens (28, 56, 57) while at the same time potentially protecting against T cell clonal loss and viral escape (58).

It is, however, possible that this difference in the private diverse portion of the epitope-specific TCR repertoire between acute infection and convalescence may result from sampling error, as we are not able to analyze the full blood volume of an individual. In order to at least partially address this, we analyzed TCRAV and BV deep sequencing data from tetramer-sorted influenza A-M1₅₈-specific CD8 T cells (not a persistent virus, thus not influencing TCR repertoire evolution) from one healthy donor of a similar age at two time points one year apart. We compared the TCR overlap of this antigen-specific population at the two time points to that of the donors with AIM in this study. We calculated the overlap between clonotypes at two distinct visits (v1 vs. v7) using the Jaccard similarity coefficient J, which is defined as the size of the intersection divided by the size of the union of two sets of clonotypes A and B. The mean Jaccard similarity coefficient for TCRAV, including both EBV epitopes during AIM, was 0.075±0.01 (n=6) and for TCRBV was 0.075±0.01 (n=6). A higher Jaccard similarity coefficient was observed in the healthy donor for TCRAV (0.172) and for TCRBV (0.208). The much higher Jaccard coefficients obtained for the healthy donor suggest that the low overlap between clonotypes observed for acute vs. convalescent visits in EBV-infected individuals would not be due to sampling alone.
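The Jaccard similarity coefficient defined above, J(A, B) = |A ∩ B| / |A ∪ B|, takes only a few lines to compute. The sketch below is illustrative rather than the code used for the reported values; the function name and the toy clonotype labels are hypothetical, and returning 0.0 for two empty sets is a convention chosen here.

```python
def jaccard(clonotypes_a, clonotypes_b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(clonotypes_a), set(clonotypes_b)
    union = a | b
    if not union:
        return 0.0  # convention for two empty repertoires (an assumption)
    return len(a & b) / len(union)


# Toy clonotype sets for two visits, invented for illustration.
visit_1 = {"c1", "c2", "c3", "c4"}
visit_2 = {"c3", "c4", "c5", "c6"}
print(round(jaccard(visit_1, visit_2), 3))
```

Because J depends only on which clonotypes are present and not on their read counts, it measures overlap of unique clonotypes, in line with how it is used in the text.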
Also, the significant differences in the characteristics of the TCR repertoires of the non-persistent and de novo populations would suggest that these are different populations.

There have been limited reports of the importance of TCRα in viral epitope-specific responses. Biased TRAV12.2 usage with CDR1 interaction with the MHC has been observed with the HLA-A2-restricted yellow fever virus epitope, LLWWNGPMAV (59). HLA-B*35:08-restricted EBV BZLF1-specific responses appear to be biased in both TCRα and TCRβ usage, much like HLA-A2-restricted EBV-BR (60, 61), with a strong preservation of a public TCRα clonotype, AV19-CALSGFYNTDKLIF-J34, which can pair with a few different TCRβ chains. TCRα chain motifs have also been described for HLA-A2-restricted influenza A M1₅₈₋₆₇ (IAV-M1), but these appear to make minor contributions to the pMHC-TCR interaction, which is almost completely dominated by CDR3β (29, 45, 46).

The TCRβ repertoire of the HLA-A2-restricted IAV-M1 epitope is highly biased towards TRBV19 gene usage in many individuals and displays a strong preservation of a dominant xRSx CDR3β motif. Crystal structures of TCR specific to this epitope have revealed that the TCR is β-centric, with the conserved arginine in the CDR3β loop being inserted into a pocket formed between the peptide and the α2-helix of the HLA-A2 (29, 62). The TCRα has little role in pMHC engagement, and this helps explain the high degree of variability in the CDR3α sequence and conservation in the CDR3β region. Similarly, previous studies using EBV-GLC-BM-specific CD8 T cells have documented that TCR-pMHC binding modes also contribute to TCR biases (63). The highly public HLA-A2-restricted EBV-GLC-BM-specific AS01 TCR is highly selected because of a few very strong interactions of its TRAV5- and TRBV20-encoded CDR3 loops with the peptide/MHC.

The present TCR deep sequencing studies thus reinforce our previous report of an under-appreciated role for TCRα-driven selection of the EBV-YVL-BR-specific repertoire (Fig 6) (51). To the best of our knowledge, our combined studies are among the first to describe a TCRα CDR3-driven selection of viral epitope-specific TCRs with minimal contribution by the TCRBV. The AV8.1 family was used by all individuals and dominated the conserved 9-mer response; it obligately paired with AJ34 and had a predominant CDR3α motif, "VKDTDK", representing 42% and 19% of the total persistent response in AIM and CONV, respectively. In contrast, the BV response was highly diverse without evidence of a strong selection factor, suggesting that AV8.1-VKDTDK-AJ34 could pair with multiple different BV and still successfully be selected by YVL-BR-MHC. Moreover, we did not find any of these AV8.1-VKDTDK-AJ34 expressing TCR in a survey deep sequencing of sorted naïve phenotype CD45RA+, CCR7+ CD8 T cells from 3 age-matched, healthy individuals (one EBV serologically negative and two EBV serologically positive). These results suggest that this clonotype is not inherently present at a high frequency in the naïve repertoire, but requires interaction with EBV-YVL-BR to be selected and expanded to these high frequencies.

In contrast, the selection of the EBV-GLC-BM-specific TCR repertoire was driven by strong interactions with both chains of the TCR, α and β, such as AV5.1-EDNNA-AJ31, BV14-SQSPGG-BJ2 and BV20.1-SARD-BJ1, previously identified public features (43, 52, 53, 55). In a recent study comparing TCRα and β repertoires of various human and murine viral epitopes, none of the responses were primarily driven by interaction with TCRα alone; rather, they were predominantly driven by strong interactions with TCRβ or a combination of TCRα and β (11).
This apparent preference of YVL-BR TCR repertoires for particular TCRα may create a large repertoire of different memory TCRβ that could potentially cross-react with other ligands such as IAV-M1₅₈, which predominantly interacts with TCRβ (11, 27, 29).

Using single-cell paired TCRαβ sequencing of tetramer-sorted CD8 T cells ex vivo, we have previously reported that, at the clonal level, recognition of the HLA-A2-restricted EBV-YVL-BR epitope is mainly driven by the TCRα chain (51). The CDR3α motif, KDTDKL, resulted from an obligate AV8.1-AJ34 pairing. This observation, coupled with the fact that this public AV8.1-KDTDKL-AJ34 TCR pairs with multiple different TCRβ chains within the same donor (median 4; range: 1-9), suggests that there are some unique structural features of the interaction between the YVL-BR/MHC and the AV8.1-KDTDKL-AJ34 TCR that lead to this high level of selection. TCR motif algorithms identified a lysine at position 1 of the CDR3α motif that is highly conserved and likely important for antigen recognition. Crystal structure analysis of the YVL-BR/HLA-A2 complex revealed that the MHC-bound peptide bulges at position 4, exposing a negatively charged aspartic acid that may interact with the positively charged lysine of CDR3α. TCR cloning and site-directed mutagenesis of the CDR3α lysine ablated EBV-BR-tetramer staining and function. Interestingly, we had previously used TCR structural modeling of the EBV-YVL-BR/MHC complex to predict the occurrence of this important protuberant lysine, which might impact TCR interaction (64). Future structural analyses would be important to ascertain whether the YVL-BR TCRα contributes the majority of contacts with the pMHC.

Altogether, our data provide several insights into potential mechanisms of TCR selection and persistence.
First, prior studies have revealed that selective use of particular gene families can be explained in part by the fact that the specificity of TCR for a pMHC complex is determined by contacts made between the germline-encoded regions within a V segment and the MHC (63, 65). We show here an unusual observation of a viral epitope-specific response being strongly selected based not only on a particular TCRAV usage but also on a highly dominant CDR3α motif and AV-AJ pairing (i.e., the YVL-BR-specific AV8.1-VKDTDK-AJ34 clonotype), with very little role for the TCRBV. Second, it has been suggested that public TCR represent clonotypes present at high frequency in the naïve precursor pool as they may be easier to generate, in part as a result of bias in the recombination machinery (66) or convergent recombination of key contact sites (35, 37, 43, 63). Our data demonstrate that convergent recombination of TCRα, as well as TCRβ, may play a dominant role in peripheral selection of clonotypes that are persistently detected through memory. As previously reported for TCRβ (35, 37, 43, 63), public clonotypes had a greater probability of being generated. They used more convergent amino acids than private clonotypes, not only in the CDR3β, but also in the CDR3α. YVL-BR TCRβ, which interestingly is not a strong selection factor for persistent clonotypes, did not have public clonotypes with features that led to a greater probability of being generated. Finally, we have previously reported that TCR immunodominance patterns also seem to scale with the number of specific interactions required between pMHC and TCR (29). It would seem that TCR that find simpler solutions to being generated and to recognizing antigen are easier to evolve and come to dominate the memory pool (29). Consistent with this, our data demonstrate that the dominant persistent clonotypes used shorter, predominantly germline-derived CDR3.
Despite the apparent non-persistence of the vast majority of the initial pool of clones deployed during acute infection, clonotypic diversity remained high in memory as a result of the recruitment of a diverse pool of new clonotypes. In a murine model, adoptive transfer of epitope-specific CD8 T cells of known BV families from a single virus-infected mouse to a naïve mouse, followed by viral challenge, resulted in an altered hierarchy of the clonotypes and the recruitment of new clonotypes, thus maintaining diversity (67). A highly diverse repertoire should allow resilience against loss of individual clonotypes with aging (45) and against skewing of the response after infection with a cross-reactive pathogen (68-71). The large number of clonotypes contributes to the overall memory T cell pool, enhancing the opportunity for protective heterologous immunity, now recognized to be an important aspect of immune maturation (56, 72, 73). A large pool of TCR clonotypes could also provide increased resistance to viral escape mutants common in persistent virus infections (58). Finally, different TCR may activate antigen-specific cell functions differently, leading to a more functionally heterogeneous pool of memory cells (74).

In summary, our data reveal that apparent molecular constraints are associated with TCR selection and persistence in the context of primary EBV infection. They also show that TCRα CDR3 alone can play an equally important role to CDR3β in TCR selection and persistence of important immunodominant responses. Thus, to understand the rules of TCR selection, both TCRα and TCRβ repertoires should be studied. Such studies could elucidate which of the features of the epitope-specific CD8 TCR are associated with an effective response and control of EBV replication or disease.

Study population
Three individuals of the age of 18 (E1603, E1632, E1655) who presented with clinical symptoms consistent with acute infectious mononucleosis (AIM) and laboratory studies indicative of primary infection (positive serum heterophile antibody and EBV viral capsid antigen (VCA)-specific IgM) were studied as described (27). Blood samples were collected in heparinized tubes at clinical presentation with AIM symptoms (acute phase) and six months later (memory phase). PBMC were extracted by Ficoll-Paque density gradient media.

Ethics Statement
The Institutional Review Board of the University of Massachusetts Medical School approved these studies (IRB protocol #: H-3698). All human subjects were adult and provided written informed consent.

Analysis of TCRα and β CDR3 regions using deep sequencing
The total RNA isolated from a minimum of 10,000 tetramer+ CD8 T cells was reverse transcribed into cDNA and sent to Adaptive Biotechnologies for TCRα and β-chain profiling following the protocols and standards for sequencing and error correction that comprise the ImmunoSEQ platform. In summary, PCR amplification of the CDR3 region is performed using specialized primers that anneal to the V and J recombination

Reactions were run in duplicate. B cell counts in each sample were determined using a previously described PCR assay to quantify the copy number of the gene encoding CCR5 (two copies per diploid cell) (78). Samples were normalized to B cell counts and EBV DNA copy number was calculated as DNA copy per 10^6 B cells.
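The normalization described above can be written out as a short calculation. This is a sketch under the stated assumption of two CCR5 copies per diploid cell, and it further assumes that the EBV and CCR5 copy numbers were measured in the same amount of sample; the function name, argument names and input numbers are hypothetical and do not come from the study data.

```python
def ebv_copies_per_million_b_cells(ebv_copies, ccr5_copies):
    """Return EBV DNA copies per 10^6 B cells.

    ebv_copies  -- EBV DNA copies measured in the sample (hypothetical input)
    ccr5_copies -- CCR5 gene copies measured in the same sample
    """
    if ccr5_copies <= 0:
        raise ValueError("CCR5 copy number must be positive")
    # Two CCR5 copies per diploid cell, so cell count = copies / 2.
    b_cells = ccr5_copies / 2.0
    return ebv_copies * 1e6 / b_cells


# Made-up example: 500 EBV copies and 200,000 CCR5 copies
# correspond to 100,000 cells.
example = ebv_copies_per_million_b_cells(500, 200_000)
```

With these made-up inputs the result is 5,000 EBV DNA copies per 10^6 B cells.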

repertoire had a greater probability of being generated than the private sequences. This is highly consistent with our observation that TCRAV plays a much greater role in the peripheral selection of the YVL-BR TCR repertoire than does TCRBV. The differences between public and private in each pair are all significant (Wilcoxon test, p<0.0001) except TCRBV BR V1 (acute visit 1) and V7 (CONV visit 7).

Data were analyzed by two-way ANOVA multivariant analysis with correction for multiple comparisons, * p<0.05, ** p<0.01, *** p<0.001, **** p<0.0001. Error bars are SEM.

Lilly Company, which operates the facility.

Competing interests
The authors have no financial conflicts. The contents of this publication are solely the responsibility of the authors and do not represent the official view of the NIH.

Author contributions
L.K.S. and K.L. obtained samples and conceived the study. A.G. and L.K. contributed to study design, and were primarily responsible for cell sorting and TCR sequencing. All authors contributed to data analyses. D.G. and R.C. performed all computational analyses. L.K.S., K.L., L.K., A.G. and D.G. assumed primary responsibility for writing the manuscript. All authors reviewed, provided substantive input, and approved of the final manuscript.