Dynamic proteomics of HSV1 infection reveals molecular events that govern non-stochastic infection outcomes

Viral infection is usually studied at the level of cell populations, averaging over hundreds of thousands of individual cells. Moreover, measurements are typically done by analyzing a few time points along the infection process. While informative, such measurements are limited in addressing how cell variability affects infection outcome. Here we employ dynamic proteomics to study virus-host interactions, using the human pathogen Herpes Simplex virus 1 as a model. We tracked >50,000 individual cells as they respond to HSV1 infection, allowing us to model infection kinetics and link infection outcome (productive or not) with the cell state at the time of initial infection. We find that single cells differ in their preexisting susceptibility to HSV1, and that this is partially mediated by their cell-cycle position. We also identify specific changes in protein levels and localization in infected cells, attesting to the power of the dynamic proteomics approach for studying virus-host interactions.


INTRODUCTION
Viral infection is a heterogeneous process. One example is the variation in the number of viral progeny produced by individual cells, which spans several orders of magnitude, as first described for bacteriophages in the 1940s (Delbrück, 1945). Several recent studies found similar variability [...] of infection (moi) (Parker, 1938; Smith, 1968). In addition, infection outcome might also be influenced by the state of the host cell at the time of virus adsorption. Such cell-intrinsic differences include the cell-cycle stage and cell-to-cell variability in protein levels and activities, which have been studied in other contexts (Elowitz et al., 2002; Cohen et al., 2008; Tay et al., 2010; Loewer and Lahav, 2011; Kellogg and Tay, 2015).

[...] later and immuno-stained for viral proteins. Using a machine learning approach, they found that infected cells differ from non-infected cells (Snijder et al., 2009, 2012). However, as cells were not imaged at the time of virus adsorption, it is unclear whether the observed differences between the cells are the cause or the consequence of successful viral infection.

There is a lack of experiments that directly address the question of determinism in viral infection. Doing so requires a system that follows individual cells from the time of virus adsorption to the onset of viral gene expression, distinguishing successful from failed infections.

[...] infection. This design allowed us to compare successfully infected and non-infected cells side by side. Note that in this cell line, the moi used is equivalent to ~50 virus particles per cell, so that all cells in the culture have likely encountered viruses. H1299 cells are fully permissive for HSV1 infection, as evidenced by the spread of infection from primary infected cells, producing secondary infections (Fig. 1A, Supplementary Movie 1).
Using custom software, we tracked tens of thousands of individual cells for 12 hours after HSV1 adsorption at a time resolution of 20 minutes, extracting features such as the cell's position, shape and size, as well as the level and sub-cellular localization of the different fluorescent proteins. Because we monitored the cells continuously, we could identify the point at which each infected cell began to express CFP. We refer to this delay between viral adsorption and the initial expression of virally encoded proteins as the infection lag time. The lag time varied among individual cells, with a mean of 5.9±2.6 hours and a CV of 44% (Fig. 1B).

Since HSV1 undergoes productive infection in these cells, the distribution of lag times captures both primary infections and subsequent secondary infections (Fig. 1A,B). To determine infection kinetics, we modeled the lag-time distribution as the sum of primary and secondary infections (Fig. 1B and Supplementary Fig. 2). We fitted the model with four two-parameter distributions (Normal, Log-normal, Weibull and Gamma), estimating the best-fitting parameters for each. The lag-time distribution was best described by a Gamma distribution with a shape parameter of 6 and a rate parameter of 1.25 (Supplementary Fig. 2 and Methods).
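As a sanity check on these numbers, the sketch below evaluates the fitted primary-infection Gamma distribution. It assumes the rate parameter is per hour (the units are not stated) and relies on the standard fact that a Gamma distribution with integer shape is an Erlang distribution, i.e. the waiting time for `shape` sequential exponential steps, so it can be checked both analytically and by simulation. All names are illustrative.

```python
import math
import random

# Fitted primary-infection parameters from the text; the per-hour unit is an assumption.
SHAPE = 6       # alpha
RATE = 1.25     # beta, 1/hour (assumed)
CUTOFF_H = 9.0  # primary/secondary cut-off used in the analysis


def erlang_survival(t, shape, rate):
    """P(lag > t) for a Gamma distribution with integer shape (Erlang)."""
    x = rate * t
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(shape))


def simulate_lag(shape, rate, rng):
    """Draw one lag time as the sum of `shape` exponential waiting times."""
    return sum(rng.expovariate(rate) for _ in range(shape))


mean_lag = SHAPE / RATE           # Gamma mean = alpha / beta
sd_lag = math.sqrt(SHAPE) / RATE  # Gamma s.d. = sqrt(alpha) / beta
late_fraction = erlang_survival(CUTOFF_H, SHAPE, RATE)

rng = random.Random(0)
draws = [simulate_lag(SHAPE, RATE, rng) for _ in range(100_000)]
sim_mean = sum(draws) / len(draws)
sim_late = sum(d > CUTOFF_H for d in draws) / len(draws)

print(f"analytic:  mean={mean_lag:.2f} h, sd={sd_lag:.2f} h, P(lag>9 h)={late_fraction:.3f}")
print(f"simulated: mean={sim_mean:.2f} h, P(lag>9 h)={sim_late:.3f}")
```

Under these assumptions the primary-infection mean is 6/1.25 = 4.8 hours, shorter than the observed 5.9-hour mean lag (which also contains secondary infections), and roughly 3% of primary lags fall beyond 9 hours.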
In addition to its theoretical interest, this model allowed us to determine a cut-off point between primary and secondary infections at 9 hours post virus adsorption. Here we are interested only in primary infections, which more closely resemble the initial infection of a human host. In all subsequent analyses, we considered cells as successfully infected if their initial CFP expression time was below 9 hours, and as non-infected if they remained CFP negative for the entire 12 hours. Cells whose CFP expression began between 9 and 12 hours were removed from the analyses.

As we started our time-lapse recording at viral adsorption, we could not directly observe each cell's position along the cell cycle (based on its previous mitosis). To [...]

We trained the classifier on 23,780 cells, using the 20 most explanatory features (Fig. 2B, Supplementary Table 1). These include textural features as well as the cell's velocity, mCherry concentration, cell-cycle stage, nuclear area and cell morphology. The classifier outputs the probability that a cell will become infected, ranging from 0 to 1; we refer to this output hereafter as the classifier score. Cells were assigned to the infected or non-infected group using a classifier-score threshold of 0.5. We tested the classifier on a dataset of 8,108 cells from clones not used in the training step and found that it correctly predicted infection outcome in 75% of these cells, with an area under the ROC curve (AUC) of 0.82 (Fig. 2C).

We next analyzed the contribution of the different features to predicting infection outcome. The most predictive feature was the nuclear cluster prominence, a measure of textural asymmetry (Unser, 1986). Cells with low cluster prominence were more likely to become successfully infected, and the probability of infection decreased as cluster prominence increased (Fig. 2D).

The second most predictive feature was the cell's velocity at the time of virus adsorption (Fig. 2E). Although infection probability continued to increase with cell movement, the most pronounced effect was seen in cells with low velocity (z-score < 0), which were more resistant to viral infection. The dependence of infection probability on each of the top 10 features is shown in Supplementary Fig. 4.

The cell-cycle stage ranked ninth among these features (Fig. 2B). Infection probability showed a non-monotonic relation to the cell-cycle stage, peaking around six hours after mitosis and then decreasing as cells progressed through the cell cycle (Fig. 2F). We tested whether the cell-cycle effect is independent of the other features by comparing, over 24 hours, the mean classifier score of cells that will become infected with that of cells that will not, after aligning them to the same stage of the cell cycle (Fig. 2E). The classifier score was higher in cells that will become infected throughout the cell cycle, implying that the cell-cycle effect is at least partially independent of other features, such as the cell velocity and texture described above.

The ability of a machine learning algorithm to predict the outcome of viral infection in individual cells suggests that infection outcome is not intrinsically stochastic but rather depends on the cellular state at the time of infection.
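The top-ranked feature, nuclear cluster prominence, is commonly computed from a gray-level co-occurrence matrix (GLCM) as the sum over i, j of (i + j - mu_x - mu_y)^4 p(i, j). The sketch below uses that common definition; the pixel offset, intensity quantization and symmetrization are assumptions, since the feature-extraction settings are not given here. A uniform nucleus scores 0, while a nucleus with sharply contrasting patches scores much higher.

```python
from collections import Counter


def glcm(image, levels, offset=(0, 1)):
    """Normalized, symmetric gray-level co-occurrence matrix for one pixel offset."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # count both directions -> symmetric matrix
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def cluster_prominence(p):
    """Fourth moment of (i + j) about its mean, weighted by GLCM probabilities."""
    n = len(p)
    mu_x = sum(i * p[i][j] for i in range(n) for j in range(n))
    mu_y = sum(j * p[i][j] for i in range(n) for j in range(n))
    return sum((i + j - mu_x - mu_y) ** 4 * p[i][j] for i in range(n) for j in range(n))


# Toy nuclei quantized to 4 gray levels (0-3).
uniform_nucleus = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
patchy_nucleus = [[0, 0, 3, 3], [0, 0, 3, 3], [0, 0, 3, 3]]
print(cluster_prominence(glcm(uniform_nucleus, 4)))
print(cluster_prominence(glcm(patchy_nucleus, 4)))
```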

Cellular susceptibility is pre-existing in the population prior to encountering the virus
Since our time-lapse recordings started approximately 45 minutes after HSV1 adsorption to the cells, we wanted to verify that the classifier was not influenced by a rapid response of the cells to the infection. To address this, we performed longer time-lapse movies in which we first imaged the cells unperturbed for 24 hours before adding the virus. We tracked 124 infected cells and 99 non-infected cells.
We analyzed the classifier's performance using cell images taken up to 24 hours before the addition of the virus, using either the raw classifier score (Fig. 2H, purple line) or the score after normalizing for the cell-cycle effect, since cells move through the cell-cycle phases during these 24 hours (Fig. 2H, orange line). Both analyses showed that susceptibility to HSV1 infection is long-lasting, with 61-63% correct classification achieved even when using images taken 24 hours before the cells encountered the virus. This is especially apparent when controlling for the cell-cycle effect: cell images taken 17 hours before virus adsorption resulted in ~70% correct classification (Fig. 2H).
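The text does not specify how the classifier score was normalized for the cell-cycle effect. One simple possibility, shown here only as a hypothetical sketch, is to z-score each cell's score against other cells in the same bin of inferred cell-cycle position, so that what remains reflects cell-cycle-independent differences. The bin width and the function name are assumptions.

```python
from statistics import mean, pstdev


def normalize_by_cell_cycle(scores, cycle_hours, bin_width_h=2.0):
    """Z-score each classifier score against cells in the same cell-cycle bin.

    Hypothetical normalization: the binning scheme and z-scoring are assumptions,
    not the procedure used in the study.
    """
    bins = [int(h // bin_width_h) for h in cycle_hours]
    grouped = {}
    for b, s in zip(bins, scores):
        grouped.setdefault(b, []).append(s)
    stats = {b: (mean(v), pstdev(v)) for b, v in grouped.items()}
    normalized = []
    for b, s in zip(bins, scores):
        mu, sd = stats[b]
        # A bin with one cell (or identical scores) has no spread; map it to 0.
        normalized.append((s - mu) / sd if sd > 0 else 0.0)
    return normalized


# Toy example: early-cycle cells score higher overall, but after normalization
# only each cell's standing within its own cell-cycle bin remains.
scores = [0.6, 0.8, 0.3, 0.5]
cycle_hours = [1.0, 1.5, 20.0, 21.0]
print(normalize_by_cell_cycle(scores, cycle_hours))
```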

To quantify the lifetime of infection susceptibility, we calculated the mixing time of the predictor output (Sigal et al., 2006b) by computing the autocorrelation of the predictor output over time. The mixing time (the time it takes for the autocorrelation to decay to 0.5) was around 6 hours when using the raw data and 10 hours when controlling for the cell-cycle effect (Supplementary Fig. 5).

Taken together, our findings suggest that the cellular state that underlies susceptibility to HSV1 [...]

To better understand the molecular mechanism that underlies the variability in infection outcome among individual cells, we looked for proteins whose concentration at time zero differs significantly between cells that will become infected and those that will not. Of the ~400 proteins screened, we identified two such proteins: Geminin and RFX7 (Fig. 3A). On average, Geminin concentration was 40% lower and RFX7 concentration 37% lower in cells that will become infected (Fig. 3B). The difference in their concentration was mainly observed at time zero and disappeared later in the infection (Fig. 3C,D). [...] immune genes such as MHC class II (Fontes et al., 1997). Geminin and RFX7 show similar cell-cycle-related concentration profiles (Fig. 3E): both are rapidly degraded following mitosis, and their concentrations rise slowly towards the next mitosis (Fig. 3F).

The lower concentrations of Geminin and RFX7 in cells that will become infected are an independent indication that cells in the earlier part of the cell cycle are more susceptible to HSV1 infection, in agreement with the results obtained from the machine learning approach described above.
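The mixing-time calculation (Sigal et al., 2006b) amounts to finding the lag at which the autocorrelation of each cell's predictor output falls to 0.5. The sketch below pools a Pearson correlation over all cells at each lag, which is a simplifying assumption rather than the exact estimator used in the study, and checks it on synthetic first-order autoregressive tracks sampled every 20 minutes.

```python
import random


def lagged_correlation(tracks, lag):
    """Pearson correlation between scores at time t and t + lag, pooled over all cells."""
    x, y = [], []
    for track in tracks:
        for t in range(len(track) - lag):
            x.append(track[t])
            y.append(track[t + lag])
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mixing_time(tracks, frame_hours, threshold=0.5):
    """First lag (in hours) at which the autocorrelation drops to `threshold` or below."""
    max_lag = min(len(track) for track in tracks) - 1
    for lag in range(1, max_lag + 1):
        if lagged_correlation(tracks, lag) <= threshold:
            return lag * frame_hours
    return None  # autocorrelation never decayed within the recording


# Synthetic check: AR(1) scores with per-frame correlation phi, so the true
# autocorrelation at lag k is phi**k.
rng = random.Random(1)
phi, frames, frame_hours = 0.9, 37, 20 / 60
tracks = []
for _ in range(500):
    value = rng.gauss(0, 1)
    track = [value]
    for _ in range(frames - 1):
        value = phi * value + (1 - phi ** 2) ** 0.5 * rng.gauss(0, 1)
        track.append(value)
    tracks.append(track)
print(mixing_time(tracks, frame_hours))
```

With a per-frame correlation of 0.9, the true autocorrelation first drops below 0.5 at a lag of 7 frames (about 2.3 hours), so the estimate should land near that value.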
To test the effect of the cell cycle on infection outcome experimentally, we used the double thymidine block protocol, which synchronizes cells at the G1/S boundary (Bootsma et al., 1964). We infected cells either 15 minutes after release from the block (Fig. 3G) or 8 hours after release, when the majority of cells are in the G2/M stages (Fig. 3H). Cells infected during the G1 and early S phases were 2-3 fold more susceptible to HSV1 infection than cells infected in the G2 and M phases, at two multiplicities of infection (Fig. 3I,J).

We conclude that HSV1 infection kinetics is affected by the cell-cycle stage of the host cell at the time of viral adsorption.

HSV1 infection causes a sharp decline in SUMO2 and RPAP3 concentrations
Having considered the effect of the cellular state on infection outcome, we next turned to the effect of viral infection on the host cell. To our surprise, the majority of the ~400 host proteins studied did not show significant differences in concentration between infected and non-infected cells (Fig. 3A). However, two proteins, SUMO2 and RPAP3, showed reduced concentrations following adsorption in cells that eventually became infected (Fig. 3A). SUMO2 is a ubiquitin homolog that can be covalently attached to cellular proteins. Indeed, a decrease in SUMO2 levels upon HSV1 infection has been previously reported (Sahin et [...]

HSV1 infection causes re-distribution of SLTM and YTHDC1
One unique feature of live-cell microscopy is the ability to observe changes in the localization of tagged proteins. We studied these changes by looking at the nucleus/cytoplasm ratio of each protein and at its coefficient of variation (CV) in the nucleus and in the cytoplasm, which indicates how dispersed the protein is within each compartment. We did not observe any nucleus/cytoplasm trafficking, nor any changes in the CV of cytoplasmic proteins. However, the CV of two nuclear proteins (SLTM and YTHDC1) increased specifically in successfully infected cells [...]

The re-distribution of YTHDC1 and SLTM could be a result of their recruitment to viral replication centers. To test this, we fixed infected cells six hours after infection and stained them for either ICP4 (an immediate-early protein required for HSV1 gene expression) or ICP8 (an early protein required for HSV1 genome replication), and found that the nuclear foci of SLTM and YTHDC1 do not co-localize with either protein (Fig. 6F). In fact, consistent with the fast appearance of these foci, we occasionally observed cells that contained such foci but were negative for ICP8, suggesting that the re-distribution happens before viral DNA replication and is mediated by one of the immediate-early proteins of the virus.

As ICP0 is known to interact with many host proteins, we tested whether it is also involved in the re-distribution of SLTM and YTHDC1. Indeed, cells successfully infected with a mutant virus that does not express ICP0 showed no re-distribution of either protein (Fig. 6G,H and Supplementary Movies 8,9).
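The compartment CV used above is the standard deviation of pixel intensities divided by their mean. A minimal sketch, assuming a binary nuclear mask and the population standard deviation (the text does not say which estimator was used), shows why the appearance of bright nuclear foci raises the nuclear CV.

```python
from statistics import mean, pstdev


def compartment_cv(image, mask):
    """Coefficient of variation (s.d./mean) of pixel intensities inside a binary mask."""
    pixels = [v for img_row, mask_row in zip(image, mask)
              for v, inside in zip(img_row, mask_row) if inside]
    if not pixels:
        raise ValueError("mask selects no pixels")
    return pstdev(pixels) / mean(pixels)


nucleus_mask = [[1, 1], [1, 1]]
diffuse = [[10, 10], [10, 10]]      # evenly spread protein
with_focus = [[10, 10], [10, 50]]   # same nucleus with one bright focus
print(compartment_cv(diffuse, nucleus_mask))     # no spread -> 0
print(compartment_cv(with_focus, nucleus_mask))  # the focus raises the CV
```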
Our results suggest that the re-distribution of SLTM and YTHDC1 into nuclear foci is an active, virus-induced process that is facilitated by the immediate-early protein ICP0.

[...] cell's texture, morphology and cell-cycle stage at the time of adsorption enabled a supervised machine learning algorithm to predict which cells will become successfully infected during the next 9 hours. This variability in susceptibility among single cells is present in the population before it encounters the virus. The cellular state that makes cells susceptible to infection is composed of a fast, cell-cycle-dependent component and a more stable, cell-cycle-independent component. We conclude that HSV1 infection outcome is not an intrinsically stochastic event. Rather, individual cells appear to have specific prior tendencies to become successfully infected, showing that cellular heterogeneity of the host can have a profound impact on its survival.

We found that cell velocity is correlated with the probability of successful infection. Cell [...]

We find that the distribution of primary infection kinetics is well described by a Gamma distribution. The Gamma distribution is defined by two parameters: rate (β) and shape (α). The rate parameter varied by up to 22% between cell-cycle stages and when changing the effective moi, reflecting a scaling of the infection kinetics by these features. The shape parameter, in contrast, remained almost constant at α=6 (changing by less than 5%). A Gamma distribution may arise from a sequence of rate-limiting exponential processes. When this is the case, the shape parameter is the number of processes and the rate parameter is their rate [...]

Looking at the individual proteins in our screen, we find two proteins whose concentration at time zero is indicative of infection outcome.
The concentration of one of these proteins, Geminin, is well known to be cell-cycle regulated. In fact, the Geminin protein is part of the widely used [...]

The ICP0 mutant expressing mTurq2 was constructed using HSV-1 dl1403 (Stow and Stow, 1986), an HSV-1 strain 17 virus with a 2 kb deletion in both copies of the ICP0 gene (a kind gift from Roger Everett, University of Glasgow Centre for Virus Research). A viral construct originating from the mTurq2-expressing virus described above was crossed with HSV-1 dl1403, and viral progeny were purified to obtain an ICP0 mutant expressing mTurq2. The progeny virus was plaque purified and tested by phenotype and by PCR for the presence of both the mTurq2 gene and the ICP0 deletion.

Cells were allowed to grow for 24 hours. The following day, approximately one hour before infection, the medium was replaced with imaging medium (transparent RPMI without phenol red and riboflavin from Biological Industries, Israel, supplemented with penicillin, streptomycin and 5% fetal bovine serum). The medium was then aspirated and 300 µl of imaging medium containing HSV1 at an moi of 0.5 was added. Virus was allowed to adsorb to the cells for 30 minutes at 37 °C. During this time, the imaging set-up was performed: calibrating the microscopes, choosing four fields of view for each well and setting the acquisition times for the fluorescent channels. After 30 minutes the virus-containing medium was aspirated and 2 ml of imaging medium was added to each well. Plates were placed in temperature-, CO2- and humidity-controlled chambers in the microscopes, focus was adjusted and imaging started. Imaging was done using two inverted epifluorescence Leica microscopes (DMIRE2 and DMI6000b), controlled by macro scripts developed in house.

[...] (normalizing for all the cells in a specific clone). We also calculated the change in these features between two consecutive frames.

Image and data analysis
The CFP concentration was calculated as the median CFP value in the cell nucleus. A threshold was calculated for each clone, based on the median CFP level in all cells of that clone in the first five frames of the movie (less than two hours post HSV1 adsorption).

To ensure correct tracking of the cells, we applied several filters, eliminating the trajectories of cells that did not meet certain criteria. Such criteria included, for example, more than one mitosis event in 12 hours or a rapid, non-physiological change in mCherry levels. Overall we eliminated ~2/3 of the data, retaining ~52,000 reliably tracked cells out of ~190,000 cells imaged in total.
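The onset detection and the infection-outcome rule can be sketched as follows. The text only states that each clone's threshold is based on its median nuclear CFP in the first five frames; the 2-fold multiplier, the strict "greater than" comparison and all function names are hypothetical. Onset times are counted from the first frame, and `start_offset_h` is a placeholder for any delay between adsorption and the first frame.

```python
from statistics import median

FRAME_HOURS = 20 / 60  # one frame every 20 minutes


def clone_threshold(baseline_cfp, fold=2.0):
    """Per-clone CFP threshold from early-frame nuclear CFP values.

    `fold` is a hypothetical multiplier; the text only says the threshold is
    based on the clone's median CFP in the first five frames.
    """
    return fold * median(baseline_cfp)


def onset_time_h(cfp_track, threshold, start_offset_h=0.0):
    """Time of the first frame whose nuclear CFP exceeds the threshold, or None."""
    for frame, value in enumerate(cfp_track):
        if value > threshold:
            return start_offset_h + frame * FRAME_HOURS
    return None


def infection_class(onset_h, cutoff_h=9.0):
    """Primary-infection rule: onset < 9 h infected, never positive non-infected."""
    if onset_h is None:
        return "non-infected"  # CFP negative for the whole 12-hour movie
    if onset_h < cutoff_h:
        return "infected"      # primary infection
    return "excluded"          # 9-12 h onset, likely a secondary infection


threshold = clone_threshold([100, 110, 90, 105, 95])  # median 100 -> 200
early = [100] * 15 + [250] * 22  # turns CFP positive at frame 15 (5 h)
late = [100] * 30 + [250] * 7    # turns CFP positive at frame 30 (10 h)
never = [100] * 37
for track in (early, late, never):
    onset = onset_time_h(track, threshold)
    print(onset, infection_class(onset))
```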

Supervised machine learning for predicting infection outcome
We divided our dataset of ~52,000 cells into two groups: infected (CFP positive by 9 hours post HSV1 adsorption) and non-infected (CFP negative for the entire 12 hours). Next, we divided the data into training and test sets. To avoid biases due to differences between clones, we made sure that each clone was similarly represented in the infected and non-infected groups, and that no particular clone was overrepresented in the dataset. 75% of the data was used for training the classifier and 25% (from clones not used in the training step) for testing its performance. The training set included 23,780 cells and the test set 8,108 cells.

We used Matlab version R2015b for all supervised machine learning procedures. We used Matlab's fitensemble function to construct an ensemble of classification decision trees using the RobustBoost algorithm. We performed feature selection by identifying the 20 features with the highest predictive power using the predictorImportance function. The final classifier included 2,000 decision trees based on the top 20 features.
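The classifier itself was built in Matlab (fitensemble with RobustBoost), which is not reproduced here. The Python sketch below only illustrates the evaluation logic: holding out whole clones so that test cells come from clones unseen during training, scoring accuracy at the 0.5 threshold, and computing the AUC as the probability that an infected cell receives a higher score than a non-infected one. It does not implement the per-clone balancing of infected and non-infected cells, and the toy data are invented.

```python
import random


def split_by_clone(cells, test_fraction=0.25, seed=0):
    """Hold out whole clones so no clone contributes to both training and test sets."""
    clones = sorted({cell["clone"] for cell in cells})
    random.Random(seed).shuffle(clones)
    n_test = max(1, round(len(clones) * test_fraction))
    test_clones = set(clones[:n_test])
    train = [c for c in cells if c["clone"] not in test_clones]
    test = [c for c in cells if c["clone"] in test_clones]
    return train, test


def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Fraction of cells whose thresholded classifier score matches the true outcome."""
    return sum((s >= threshold) == bool(y) for s, y in zip(scores, labels)) / len(labels)


def roc_auc(scores, labels):
    """AUC as P(infected cell outscores non-infected cell), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


cells = [{"clone": f"clone{i % 8}", "id": i} for i in range(40)]
train, test = split_by_clone(cells)
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]  # toy classifier scores
labels = [1, 1, 1, 0, 0, 0]              # 1 = became infected
print(len(train), len(test))
print(accuracy_at_threshold(scores, labels), roc_auc(scores, labels))
```

The pairwise AUC loop is quadratic in the number of cells; for thousands of test cells a rank-based (Mann-Whitney) formulation gives the same value faster.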

Extracting cell-cycle data from still images
We employed a supervised machine learning approach, similar to that used by others (Kafri et al., 2013; Gut et al., 2015; Blasi et al., 2016), which infers the cell-cycle position of a cell from a still image using a random forest regression predictor. The performance of this predictor is shown in Supplementary Fig. 3. We trained and tested the predictor using independent datasets of non-infected cells that divided during the movies, so that we could determine the time after mitosis for each cell in each frame. We aligned the cells' trajectories to an imaginary cell-cycle length of 24 hours. This gave the best results, but using other cell-cycle lengths did not significantly alter our findings. We selected the top 30 features to use in the predictor.
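The text does not spell out how trajectories were aligned to a 24-hour cycle. One reading, sketched here as an assumption, is that for a cell observed to divide once, frames after division are labeled with the measured time since mitosis, while frames before division are placed at 24 hours minus the time remaining until division. Such labels would then serve as regression targets for the random forest; the function name is hypothetical.

```python
def time_after_mitosis(frame_times_h, mitosis_time_h, cycle_h=24.0):
    """Label each frame of a dividing cell with its time since the last mitosis.

    Hypothetical alignment to a nominal cycle: frames after the observed division
    get the measured time since division; frames before it are placed at
    cycle_h minus the time remaining until division.
    """
    labels = []
    for t in frame_times_h:
        if t >= mitosis_time_h:
            labels.append(t - mitosis_time_h)
        else:
            # Clip frames lying more than one nominal cycle before division.
            labels.append(max(0.0, cycle_h - (mitosis_time_h - t)))
    return labels


# A cell imaged every 2 h that divides at t = 4 h.
print(time_after_mitosis([0, 2, 4, 6, 8], mitosis_time_h=4))
```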

Infection kinetics modeling
We fitted the distribution of infection lag times with a three-parameter mixture model of primary and secondary infections (Supplementary Fig. 2). The model assumes that the lag time between adsorption and infection can be captured by a two-parameter distribution, and that the same distribution also captures the lag between primary and secondary infections. Specifically, the number of secondary infections at a given time point depends on the number of infections at all previous time points, with appropriate delays given by the two-parameter distribution. The relative number of secondary infections is fitted as a third parameter. Overall we fitted three parameters: two for the distribution ( , ) and one for the relative number of secondary infections ( ).
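The mixture model can be written as a discrete recursion on the 20-minute frame grid. In the hypothetical sketch below, new infections at time t are primary infections drawn from a Gamma onset density plus theta times all earlier infections convolved with the same delay density, renormalized over the 12-hour window. The parameter name `theta`, the exact recursion and the normalization are assumptions standing in for the parameters whose symbols are missing above; a fit would scan the three parameters and keep the set with the highest log-likelihood.

```python
import math


def gamma_pdf(t, shape, rate):
    """Gamma density with the given shape and rate (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    return rate ** shape * t ** (shape - 1) * math.exp(-rate * t) / math.gamma(shape)


def mixture_lag_density(shape, rate, theta, t_max=12.0, dt=20 / 60):
    """Discretized onset density of primary plus secondary infections on [0, t_max].

    Hypothetical form: primary onsets follow gamma_pdf; secondary onsets are
    theta times all earlier onsets convolved with the same delay density.
    """
    times = [k * dt for k in range(int(round(t_max / dt)) + 1)]
    total = []
    for k, t in enumerate(times):
        secondary = theta * sum(
            total[j] * gamma_pdf(t - times[j], shape, rate) * dt for j in range(k)
        )
        total.append(gamma_pdf(t, shape, rate) + secondary)
    norm = sum(total) * dt  # renormalize over the observation window
    return times, [v / norm for v in total]


def log_likelihood(lags_h, shape, rate, theta, dt=20 / 60):
    """Log-likelihood of observed onset times under the discretized mixture."""
    times, density = mixture_lag_density(shape, rate, theta, dt=dt)
    ll = 0.0
    for lag in lags_h:
        k = min(int(round(lag / dt)), len(times) - 1)
        ll += math.log(max(density[k] * dt, 1e-300))  # guard against log(0)
    return ll


times, primary_only = mixture_lag_density(6, 1.25, theta=0.0)
_, with_secondary = mixture_lag_density(6, 1.25, theta=0.5)
step = times[1] - times[0]
mean_primary = sum(t * p for t, p in zip(times, primary_only)) * step
mean_mixed = sum(t * p for t, p in zip(times, with_secondary)) * step
print(round(mean_primary, 2), round(mean_mixed, 2))
```

Adding secondary infections shifts probability mass toward later onset times, which is why the mixed mean exceeds the primary-only mean.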
We fitted the following two-parameter distributions: Normal, Log-Normal, Weibull and Gamma. For each distribution we scanned each parameter with a resolution of 0.05. Each parameter was scanned in the range of ±1 of the best-fit value, and [...] was scanned in the range of 0-2. For each set of parameters we generated a distribution from the mixture model and computed the log-likelihood of the data given this distribution.

To assess statistically which distribution fits the data best, we performed a bootstrapping procedure. We generated a distribution of log-likelihoods for each fit by resampling our data 1,000 times with replacement. We then performed a one-sided t-test to compute the significance of the difference between these distributions. The Gamma distribution fitted our data significantly better than the other three (maximal p-value < 10^-15).

Confidence intervals for each parameter were computed by fixing the other parameters and fitting a third-order polynomial to the log-likelihood values around the fitted parameter. We then computed the 95% confidence interval using the second derivative of this polynomial at the fitted parameter. For all parameters assessed, the confidence intervals were at least one order of magnitude smaller than the estimated parameter.
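The curvature-based confidence interval can be sketched with NumPy's `polyfit` and `poly1d`. This assumes the usual normal approximation, in which the standard error of a parameter is 1/sqrt(-l''), with l'' the second derivative of the log-likelihood at the best-fit value; the exact formula used in the study is not stated. The example recovers a known interval from a synthetic quadratic profile.

```python
import numpy as np


def curvature_ci(param_grid, log_likelihoods, best, z=1.96):
    """Approximate 95% CI from the curvature of a cubic fitted to a log-likelihood profile.

    Assumes the quadratic (normal) approximation around the optimum:
    standard error = 1 / sqrt(-d^2 logL / d theta^2).
    """
    cubic = np.poly1d(np.polyfit(param_grid, log_likelihoods, 3))
    curvature = cubic.deriv(2)(best)
    if curvature >= 0:
        raise ValueError("log-likelihood is not concave at the best-fit value")
    se = 1.0 / np.sqrt(-curvature)
    return best - z * se, best + z * se


# Synthetic profile: a quadratic log-likelihood peaked at 1.25 with s.e. 0.01,
# sampled on a 0.05-resolution grid like the parameter scan described above.
grid = np.arange(1.0, 1.5001, 0.05)
profile = -0.5 * ((grid - 1.25) / 0.01) ** 2
low, high = curvature_ci(grid, profile, best=1.25)
print(round(low, 4), round(high, 4))
```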

Cell-cycle synchronization
5×10^4 cells were plated in 12-well glass-bottom plates as described above. At 5 pm the medium was replaced with full medium containing 2 mM thymidine (Sigma-Aldrich, Israel). At 8 am the next morning, cells were washed twice with PBS and normal growth medium was added. At 5 pm the same day, the medium was again replaced with thymidine-containing medium. At 8 am the next morning, half of the wells were released from the block (washed twice and given normal growth medium) and half were maintained in the blocking medium. At 4 pm, eight hours later, the remaining blocked cells were released. Cells were washed, infected with HSV1 at an moi of 0.25 or 0.5 and imaged as described above.

Figure 1. Dynamic proteomics to study virus-host interactions in single cells over time. A. Schematic representation of the screen. A CFP-expressing HSV1 was allowed to adsorb for 30 minutes to clones seeded in 12-well plates, then washed out, and cells were subsequently imaged every 20 minutes for 12 hours. Overall, more than 50,000 single cells were followed, from ~400 different YFP-expressing clones. B. Model for de-mixing primary and secondary infections. Shown are the measured lag times between virus adsorption and CFP expression (gray bars), the fitted model [...]

[...] The full list of features with their relative explanatory power is listed in Supplementary Table 1. We trained a supervised machine learning classifier to best discriminate cells that will become successfully infected from those that will not, and tested its performance on a separate test set. B. The top 20 ranking image-analysis features that were used for predicting infection outcome.

[...] Shown are the YFP channel, CFP channel and a merged image including the phase channel at 0-9 hours post wild-type HSV1 adsorption. D. Cells infected by wild-type HSV1 were fixed and stained for ICP4 or ICP8 at six hours post adsorption and imaged using a 100× objective. Shown are representative images of nuclear foci formed by SLTM (top two rows) or YTHDC1 (bottom two rows), which do not co-localize with ICP4 or ICP8. Cellular DNA was stained with DAPI (blue). E,F. SLTM (E) and YTHDC1 (F) nuclear CV (mean±s.e.m.) in non-infected (red) and infected (blue) cells following wild-type HSV1 adsorption. G,H. Nuclear CV (mean±s.e.m.) of SLTM (G) and YTHDC1 (H) in non-infected (red) and infected (blue) cells following infection by a mutant HSV1 that does not express ICP0.

To assess the cell-cycle stage of the cells at the time of infection, we trained a random forest predictor using a dataset of non-infected cells that divided during the time-lapse movies. We used the top 30 predictive features and trained an ensemble of 500 decision trees.
The figure shows the performance of the predictor on an independent test set. The predictor estimates the time since the last mitosis with an RMSE of 3.85; the Pearson correlation coefficient was 0.83.