Determining the Specificity of Cascade Binding, Interference, and Priming in vivo

ABSTRACT


SIGNIFICANCE
Many bacterial and archaeal species encode CRISPR-Cas immunity systems that protect against invasion by foreign DNA. In the Escherichia coli CRISPR-Cas system, a protein complex, Cascade, binds 61 nt CRISPR RNAs (crRNAs). The Cascade-crRNA complex is directed to invading DNA molecules through base-pairing between the crRNA and target DNA. This leads to recruitment of the Cas3 nuclease, which destroys the invading DNA molecule and promotes acquisition of new immunity elements. We show that Cascade-crRNA binding to DNA is highly promiscuous in vivo. Consequently, endogenous E. coli crRNAs direct Cascade binding to >100 chromosomal locations. In contrast, target degradation and acquisition of new immunity elements require highly specific association of Cascade-crRNA with DNA, limiting CRISPR-Cas function to the intended targets.

bioRxiv preprint doi: https://doi.org/10.1101/169011; this version posted July 27, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

INTRODUCTION
cells constitutively expressing all other cas genes, and each of two crRNAs that target either the lacZ promoter or the araB promoter (both targets are chromosomal; Figure S2A-B). ChIP-seq data for Cse1 and Cas5 were highly correlated (R² values of 0.93-0.99 for lacZ-targeting cells, and 0.99 for araB-targeting cells), consistent with Cse1 and Cas5 always binding DNA together in the context of Cascade. We detected association of Cascade with many genomic loci for each of the two spacers tested (Figure 1A-B; Table S1). In all cases, the genomic region with the strongest Cascade association was the on-target site at lacZ or araB. Off-target binding events occurred with <20% of the ChIP signal of on-target binding. To determine the sequence requirements for off-target Cascade binding with each of the two crRNAs used, we searched for enriched sequence motifs in the Cascade-bound regions, excluding the on-target site (Table S2). For both the lacZ and araB spacers, the most enriched sequence motif we identified was a close match to an AAG PAM, followed by 5 nt of sequence complementarity at the start of the seed region (Figure 1C-D; cf. Figure S2A-B). In some cases, we observed Cascade binding events associated with non-AAG PAMs; however, these sites were more weakly bound, and/or had matches in the seed region beyond position 5. We conclude that as few as 5 bp in the seed region, together with an AAG PAM, are sufficient for Cascade binding, with additional base-pairing in the seed region increasing binding and/or overcoming the need for an AAG PAM.

Extensive off-target Cascade binding driven by endogenous spacers
We identified several sites of Cascade binding that were shared between cells targeting lacZ and cells targeting araB. These bound regions were not associated with sequences matching the seed regions of either crRNA. We reasoned that these off-target binding events may be due to Cascade association with the endogenous E. coli crRNAs. To test this hypothesis, we performed ChIP-seq of Cse1-FLAG3, as described above, for cells expressing only the endogenous CRISPR RNAs from their native loci. Thus, we identified 188 binding sites for Cascade (Figure 2A; Table S1). These sites were associated with four enriched sequence motifs, with each motif corresponding to an AAG PAM and 5-10 nt matching the seed region of a crRNA from the CRISPR-I array (spacers #1, #3, #4, and #8; Figure 2B; Figure S2C; Table S2). The strongest binding events were associated with spacer #8 of CRISPR-I (Figure 2B; Figure S2C). To confirm that Cascade binding events were due to association with endogenous crRNAs, we repeated the ChIP-seq experiment in cells lacking the CRISPR-I array and cells lacking the CRISPR-II array. Deletion of CRISPR-II had little effect on the profile of Cascade binding (Figure 2C; Table S1). In contrast, deletion of CRISPR-I resulted in loss of Cascade binding to almost all sites bound in wild-type cells (Figure 2D; Table S1). Instead, low-level binding of Cascade was observed at a small number of sites that were associated with a weakly enriched sequence motif corresponding to a perfect PAM and 8 nt matching the seed region of spacer #2 of CRISPR-II (Figure S2D + S3; Table S2).
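
The minimal binding requirement described above (an AAG PAM followed by a short seed match) can be illustrated with a simple sequence scan. The following Python sketch is not the motif analysis used in this study; the function name, the toy sequences, and the strand convention (both sequences written 5'→3', with the PAM immediately preceding protospacer position 1, exact matches only, one strand only) are assumptions made for illustration.

```python
def find_pam_seed_sites(genome, protospacer, pam="AAG", seed_len=5):
    """Return 0-based start indices where `pam` is immediately followed by
    an exact match to the first `seed_len` nt of `protospacer`.

    Simplifying assumptions (not taken from the paper): only one strand is
    scanned, no mismatches are tolerated, and position 6 is treated like
    any other position.
    """
    query = pam + protospacer[:seed_len]
    hits = []
    start = genome.find(query)
    while start != -1:
        hits.append(start)
        # Step forward by one so that overlapping sites are also reported.
        start = genome.find(query, start + 1)
    return hits


# Hypothetical toy sequences, invented for illustration only.
toy_genome = "TTAAGACGTGCCAAGACGTGTTTAAGACGAA"
toy_protospacer = "ACGTGAATTCCGGATTACAGGCTTAACCGGTT"

print(find_pam_seed_sites(toy_genome, toy_protospacer))              # 5 nt seed
print(find_pam_seed_sites(toy_genome, toy_protospacer, seed_len=3))  # 3 nt seed
```

Shortening the required seed match increases the number of candidate sites, which is in line with the expectation that a short PAM-plus-seed requirement yields many potential off-target sites in a bacterial genome.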
expressing endogenous crRNAs.
Our data suggested that the majority of Cascade binding associated with endogenous crRNAs is due to CRISPR-I, and that the dominant spacer from CRISPR-I is spacer #8 ("sp8"). To confirm this, we measured Cascade binding by ChIP-seq in cells lacking CRISPR-I but expressing a plasmid-encoded sp8 crRNA. Most of the Cascade binding sites we observed were identical to those seen in cells expressing both CRISPR arrays, or cells expressing only CRISPR-I (Figure 3A; Table S1), and corresponded to regions containing strong matches to sp8 (orange dots in Figure 3A correspond to regions containing a match to the sp8 motif shown in Figure 2B). As expected, and unlike for cells expressing CRISPR-I, we detected only a single strongly enriched sequence motif (Figure S4A; Table S2). This motif corresponds to an AAG PAM and 9 nt matching the seed region of sp8 (Figure S2C). We also detected a weakly enriched sequence motif (Figure S4B; Table S2) that corresponds to an AAG PAM and the 11 nt immediately downstream of the second repeat on the plasmid encoding the sp8 crRNA. This is likely due to formation of a non-canonical crRNA that consists of the sequence between the second repeat and the transcription terminator (Figure S2E). A transcription terminator hairpin has previously been shown to function analogously to a repeat sequence in the E. coli crRNAs (37).

The most enriched Cascade target region in cells with CRISPR-I, and cells expressing sp8 crRNA, was inside the yggX gene. We identified a sequence in this region with an AAG PAM and matches to positions 1-5 and 7-10 of sp8 (Figure 3B). We used targeted ChIP-qPCR to measure Cascade binding to this site in cells lacking CRISPR-I but expressing plasmid-encoded sp8. We compared binding of Cascade to yggX in wild-type cells, and cells where the putative protospacer was mutated in the region predicted to bind the sp8 crRNA seed. As expected, we observed greatly reduced Cascade binding at the mutated site relative to the wild-type site. Similarly, we observed greatly reduced Cascade binding at the wild-type site when we expressed a mutant sp8 with changes in the seed region (Figure 3C). However, when we combined the mutant spacer with the mutant protospacer, base-pairing potential was restored, and we observed wild-type levels of Cascade binding (Figure 3C). We conclude that sp8 is the major determinant for off-target Cascade

Off-target Cascade binding is not associated with interference
Previous studies have suggested that extensive mismatches at the 3′ end of the spacer/protospacer prevent interference (12, 18). To determine whether off-target Cascade binding events lead to interference, we constructed a ΔyggX Δcas3 strain expressing all other cas genes, with both CRISPR arrays intact. We introduced a plasmid with the off-target protospacer from yggX that is an imperfect match to sp8, or an equivalent plasmid with a protospacer that is a perfect match to sp8. We transformed each of these strains with a plasmid expressing cas3, or an equivalent empty vector, simultaneously selecting for retention of the protospacer-containing plasmid. We reasoned that the number of viable transformants with the cas3-containing plasmid would be low for cells where interference caused loss of the protospacer-containing plasmid, since these cells would be killed by the antibiotic selection. In contrast, the number of viable transformants with the empty vector should be high in all cases. Thus, we measured the relative level of interference for each of the two protospacers. As expected, the protospacer that perfectly matches sp8 resulted in highly efficient interference, whereas the protospacer with the native yggX sequence (i.e. imperfect match to sp8) resulted in no detectable interference (Figure 4A). We conclude that off-target Cascade binding events do not cause interference.
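
The logic of this assay reduces to a ratio of colony counts: transformants obtained with the cas3 plasmid divided by transformants obtained with the empty vector, for each protospacer plasmid. A minimal Python sketch follows; the function name and all colony counts are hypothetical and are not data from this study.

```python
def relative_transformation_efficiency(cas3_colonies, empty_vector_colonies):
    """Ratio of cas3-plasmid transformants to empty-vector transformants.

    A ratio near 1 suggests no detectable interference; a ratio near 0
    suggests efficient interference (loss of the protospacer plasmid
    under antibiotic selection).
    """
    if empty_vector_colonies <= 0:
        raise ValueError("empty-vector control must yield at least one colony")
    return cas3_colonies / empty_vector_colonies


# Hypothetical colony counts (cas3 plasmid, empty vector), invented for illustration.
toy_counts = {
    "perfect match to sp8": (3, 1500),
    "native yggX (imperfect match)": (1400, 1450),
}

for protospacer, (cas3_n, empty_n) in toy_counts.items():
    ratio = relative_transformation_efficiency(cas3_n, empty_n)
    print(f"{protospacer}: {ratio:.3f}")
```

Normalizing to the empty-vector control separates interference from differences in general transformation competence between the two strains.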

Off-target Cascade binding is not associated with priming
The molecular determinants for priming have not been well studied. However, protospacers with multiple mismatches to a crRNA can still result in priming (24), and a recent study suggested that binding of Cascade to a protospacer with extensive mismatches, including in the seed, is sufficient to cause priming (12). To test whether off-target Cascade binding is sufficient for priming, we used the strains described above that contained a plasmid with a protospacer that is either an imperfect or a perfect match to sp8. We then introduced a plasmid with an inducible copy of cas3, under non-inducing conditions, to avoid interference. Following induction of cas3 expression, we harvested cells and PCR-amplified the 5′ end of the CRISPR-II array to determine whether new spacers had been acquired because of priming. We observed robust primed spacer acquisition for the protospacer with a perfect match to sp8, but no detectable spacer acquisition for the off-target protospacer with an imperfect match to sp8 (Figure 4B). We conclude that off-target Cascade binding events do not cause priming.

association than did the optimal protospacer (variant i) (Figure 5A). We presume that the level of ChIP signal for the protospacer with the CCG PAM (variant ii) represents the background of this experiment. The protospacer with a sub-optimal, ATT PAM (iii), showed reduced Cascade binding relative to the optimal protospacer (variant i), but was well above the experimental background (Figure 5A). Similarly, mismatches in the seed region (variants iv-viii) resulted in partial or complete loss of Cascade association, depending on the specific sequence mismatch (Figure

Near-complete crRNA-protospacer base-pairing is required for priming and interference
We next determined which of the protospacer variants lead to interference. Using a modification of a previously described assay (see Methods) (24, 40), we measured the level of interference with a plasmid target for each of the 13 protospacers, using Δcas1 cells that cannot acquire new spacers; primed spacer acquisition cannot contribute to the level of interference in these cells. As expected, the optimal protospacer (i) was associated with robust levels of interference, whereas protospacer variants that do not bind Cascade (variants ii, iv, v, xi, xii, and xiii; Figure 5A) were not associated with detectable interference (Figure 5B). Protospacers with PAM and seed variants that showed reduced but not abolished Cascade binding (variants iii, vi, vii, and viii; Figure 5A) were associated with a range of interference levels that correlate well with the level of Cascade binding. However, the ability of protospacers to cause interference did not always correlate with the level of Cascade association. Specifically, we detected no interference for either of the protospacer variants with mismatches only at the 3′ end (variants ix and x; Figure 5B), even though these protospacers bind Cascade at least as well as the optimal protospacer (Figure 5A).

Previous studies have proposed that some protospacers with sub-optimal PAMs or mismatches in the seed region are not subject to detectable interference, but are subject to priming (12, 22, 24, 40). We determined whether the 13 protospacer variants caused priming in a plasmid context. Specifically, we introduced an inducible copy of cas3 into cells containing each of the protospacers on a high-copy plasmid. We then induced expression of cas3, and PCR-amplified the CRISPR-II array to determine whether new spacers had been added. We observed robust primed spacer acquisition for all protospacers associated with interference (variants i, iii, vii, and viii; Figure 5C). By contrast, we observed no spacer acquisition for protospacers that do not bind Cascade (variants ii, iv, v, xi, xii, and xiii; Figure 5C). Strikingly, we observed primed spacer acquisition for two protospacers that were not associated with detectable interference (Figure 5C). One of these protospacers (variant vi) has the seed mismatch with the lowest level of Cascade binding that is above the experimental background (Figure 5A). The other protospacer has mismatches across positions 25-32 (variant ix). Thus, for these protospacers, we detected robust Cascade binding and priming but we were unable to detect interference. For the protospacer with mismatches across positions 19-32 (variant x), we detected no priming. Thus, for this protospacer, we detected robust Cascade binding, but no priming or interference.
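
The outcomes reported for the 13 protospacer variants can be summarized compactly. The Python sketch below encodes only the variant groupings stated in the text (Figure 5A-C); the set names and the `classify` helper are hypothetical, introduced here for illustration, and the sets record detectable versus undetectable outcomes rather than quantitative levels.

```python
ALL_VARIANTS = ["i", "ii", "iii", "iv", "v", "vi", "vii",
                "viii", "ix", "x", "xi", "xii", "xiii"]

# Groupings as stated in the text; each set lists variants with a detectable signal.
BINDS_CASCADE = {"i", "iii", "vi", "vii", "viii", "ix", "x"}
INTERFERENCE = {"i", "iii", "vii", "viii"}
PRIMING = {"i", "iii", "vi", "vii", "viii", "ix"}


def classify(variant):
    """Return a short label describing the detected outcomes for one variant."""
    if variant not in BINDS_CASCADE:
        return "no binding"
    if variant in INTERFERENCE:
        return "binding + priming + interference"
    if variant in PRIMING:
        return "binding + priming, no detectable interference"
    return "binding only"


for v in ALL_VARIANTS:
    print(f"{v:>5}: {classify(v)}")

# In this data set the outcomes are nested: every variant with detectable
# interference also primes, and every priming variant binds Cascade.
print(INTERFERENCE <= PRIMING <= BINDS_CASCADE)
```

The nesting of the three sets makes explicit that binding is necessary but not sufficient for priming, and that priming was detected wherever interference was detected.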
Two previous studies proposed that AAG, GAG, TAG, AGG, and ATG are optimal PAMs in E. coli (24, 44), while another study suggested that AAG, ATG and GAG PAMs were associated with moderately higher affinity Cascade binding than an AGG PAM (35). Our data clearly indicate that AAG is the optimal PAM for off-target sites, with most off-target Cascade binding events being associated with an AAG PAM. Specifically, 65% of Cascade binding sites associated with a detectable motif have an AAG PAM for the crRNAs targeting lacZ and araB, and the plasmid-encoded sp8 crRNA. Moreover, off-target Cascade binding events with higher enrichment scores, suggestive of higher Cascade affinity, were more likely to be associated with an AAG PAM than Cascade binding events with lower enrichment scores (76% vs 61% for the top 20% and bottom 80% of bound regions, respectively, after sorting by Cse1 enrichment level). We hypothesize that the dependence on the PAM for Cascade binding is increased in situations where base-pairing only occurs in the seed region. According to this model, complete or near-complete base-pairing between the crRNA and protospacer would weaken the requirement for an optimal PAM, obscuring differences in PAM affinity. This would explain why previous studies suggested that there are at least three optimal PAMs (24, 35, 44).
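
The comparison of PAM usage between strongly and weakly enriched sites amounts to sorting bound regions by enrichment, splitting them at the top 20%, and computing the fraction with an AAG PAM in each group. A minimal Python sketch of that calculation follows; the function name, the rounding rule for the split, and the toy data are assumptions for illustration and do not reproduce the values reported above.

```python
def aag_fraction_by_enrichment(sites, top_fraction=0.2, pam="AAG"):
    """Split (enrichment, pam) records at `top_fraction` after sorting by
    enrichment (highest first) and return the fraction of sites carrying
    `pam` in the top group and in the remainder.
    """
    if not sites:
        raise ValueError("no sites provided")
    ranked = sorted(sites, key=lambda site: site[0], reverse=True)
    # Assumption: the group boundary is rounded to the nearest whole site.
    n_top = max(1, round(len(ranked) * top_fraction))
    top, rest = ranked[:n_top], ranked[n_top:]

    def fraction(group):
        if not group:
            return float("nan")
        return sum(1 for _, site_pam in group if site_pam == pam) / len(group)

    return fraction(top), fraction(rest)


# Hypothetical (enrichment score, PAM) records, invented for illustration.
toy_sites = [
    (50, "AAG"), (40, "AAG"), (30, "ATG"), (25, "AAG"), (20, "AAG"),
    (15, "GAG"), (12, "AAG"), (10, "AGG"), (8, "AAG"), (5, "ATT"),
]

top_frac, rest_frac = aag_fraction_by_enrichment(toy_sites)
print(f"top 20%: {top_frac:.0%} AAG; bottom 80%: {rest_frac:.0%} AAG")
```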

Defining the crRNA seed
The seed region of a crRNA has been previously defined as positions 1-5 and 7-8, with position 1 being immediately adjacent to the PAM (28). However, our data suggest that the length of the seed varies between crRNAs, since we observed off-target binding with some crRNAs that requires base-pairing in positions 1-5, whereas off-target binding for other crRNAs requires base-pairing up to position 9 (Figures 1-2, S3-S4). We propose that the crRNA sequence determines the length of the seed, and that this reflects the initial binding mode, prior to extended base-pair formation. Every 6th position of the crRNA is flipped out in the Cascade-crRNA complex, and hence does not contribute to base-pairing (16, 45, 46). Consistent with this, the importance of position 6 for off-target binding is substantially less than that of positions 1-5 (Figures 1-2, S3-S4). Nonetheless, off-target protospacers had a sequence match to the crRNA at position 6 far more frequently than expected by chance (45% for the crRNAs targeting lacZ and araB, and the plasmid-encoded sp8 crRNA; Binomial Test p-value = 2.4e-10). We hypothesize that the initial binding of Cascade to a protospacer includes base-pairing interactions at position 6, but that the complex rapidly transitions to a conformation in which the 6th position is flipped out of the helix. Our data are consistent with an in vitro study of another Type I-E system, where

Evidence that interference and priming are obligately coupled processes
Priming was initially proposed to be an alternative pathway to interference, with optimal PAM/seed sequences leading to interference, and sub-optimal sequences leading to priming (12, 17, 22, 24, 40, 49). However, primed spacer acquisition has been observed in situations where interference occurs, suggesting that priming and interference can be coupled processes (Figure 5, variants i, iii, vii, and viii) (23, 31-33). While these data show that priming and interference can occur at the same time at a population level, they do not necessarily indicate that individual priming and interference events are coupled. Moreover, while it has been proposed that interference and priming are obligately coupled (50), this has not been tested, and there are many examples where primed spacer acquisition has been observed in the absence of detectable interference (12, 22, 24, 31, 40, 49). Our data show that protospacers with seed sequence mismatches can cause detectable priming but not detectable interference when the protospacer is present on a multi-copy plasmid (Figure 5). Strikingly, for protospacers with seed mismatches, the levels of interference and priming correlate well with the level of Cascade binding (Figure 5). We detected primed spacer acquisition but not interference for the weakest-bound seed variant that has above-background levels of Cascade binding (Figure 5, variant vi). This is consistent with the expectation that primed spacer acquisition is a more sensitive readout of Cascade/Cas3 function since (i) it is an irreversible process, and (ii) it does not require destruction of all copies of the plasmid. Our data are consistent with a model in which low levels of interference are undetectable when plasmid replication outpaces plasmid degradation (50). We also observed primed spacer acquisition in the absence of detectable interference for a protospacer with mismatches across positions 25-32 (Figure 5, variant ix). We propose that this degree of mismatch at the 3′ end of the crRNA greatly reduces, but does not abolish, the isomerization of Cascade into the "active" state that recruits/activates Cas3.

Extensive, inert, off-target binding of Cascade
Cascade has many off-target binding sites due to its ability to bind DNA with low sequence-specificity. Consequently, the endogenous crRNAs transcribed from the bacterial genome result in extensive off-target binding, even in the absence of an on-target site. Since off-target binding does not involve complete R-loop formation, it has no deleterious effects on genome integrity. We also observed no impact on transcription associated with any of the off-target binding events, despite the fact that targeted Cascade binding is known to repress transcription by occluding promoters or acting as a roadblock for elongating RNA polymerase (38, 39). Transcription repression by Cascade is considerably weaker when targeting within a transcribed region (i.e. acting as a roadblock) (38). Given that the location of off-target Cascade binding sites is essentially random with respect to genome organization, and that genes make up ~90% of the E.

important to note that self-targeting by Type I CRISPR systems has been described previously, but these would be considered "on-target" events, likely caused by acquisition of spacers from the chromosome. As expected for spacers with perfect sequence complementarity, these self-targeting crRNAs are typically functional in gene regulation and interference (52-54).

Not all crRNAs are created equal
The E. coli genome encodes at least 19 crRNAs, yet our data suggest that only four crRNAs contribute to off-target binding of Cascade. All four of these crRNAs are encoded in the CRISPR-I array, and the majority of off-target binding is driven by just one, sp8. The lack of off-target binding driven by CRISPR-II crRNAs is likely due to weak transcription of this array, which is repressed by H-NS (55). In contrast, the CRISPR-I array is likely co-transcribed with the upstream cas genes, which are strongly transcribed in the strain used in this study. The preference for specific spacers within CRISPR-I cannot be explained by differences in expression levels, since the crRNAs are transcribed as a single RNA. Rather, biases in spacer usage are more likely due to differential assembly of specific crRNAs into Cascade. Consistent with this, a previous study that surveyed crRNAs associated with Cascade found that spacers #2, #4 and #8 represented 68% of the Cascade-associated crRNAs (9). The cause of this bias is unclear, but may in part be due to differences in RNA secondary structure between spacers, which could impact the efficiency of RNA processing by Cas6e. Consistent with this, RNA secondary structure of repeat sequences, and associated processing by Cas6, has been shown to be impacted by spacer sequences in the Type I-D system of Synechocystis sp. PCC 6803 (56). Nonetheless, it is likely that other factors influence the level of off-target binding, since the relative association of crRNAs for spacers #2, #4 and #8 with Cascade is likely to be similar (9), but sp8 drives a disproportionately high level of off-target binding.

Strains and plasmids
All strains, plasmids, oligonucleotides and purchased, chemically synthesized dsDNA fragments are listed in Table S3. All strains are derivatives of MG1655 (57). CB386 has been previously described (38). CB36 contains a chloramphenicol resistance cassette in place of cas3. We removed this cassette using Flp recombinase, expressed from plasmid pCP20 (58), to generate strain AMD536. Epitope-tagged strains AMD543 and AMD554 (Cse1-FLAG3 and FLAG3-Cas5, respectively) were generated using the previously described FRUIT method of recombineering (59). Cse1 was C-terminally tagged in AMD543 by inserting a FLAG3 tag immediately upstream of codon 495 using oligonucleotides JW6364 and JW6365. Tagging of Cse1 resulted in an 8 amino acid C-terminal truncation. We predicted, based on phylogenetic comparisons and on structural data (46), that this truncation would not impact the function of Cse1. Cas5 was N-terminally tagged in AMD554 by inserting FLAG3 using oligonucleotides JW6272 and JW6273. LC060 was generated using (i) FRUIT (59) with oligonucleotides JW7537-JW7540 to delete the CRISPR-II locus, (ii) P1 transduction of the CB386 (Δcas3 Pcse1)::(cat::PJ23199) region, (iii) FRUIT (59) to C-terminally tag Cse1 with FLAG3 (as described above for AMD543), and (iv) pCP20-expressed Flp recombinase (58) to remove the cat cassette. LC074 is a derivative of AMD536 in which the CRISPR-I array was deleted using FRUIT (59) with oligonucleotides JW7529 and JW7530 and a synthesized dsDNA fragment (gBlock 14148263; Integrated DNA Technologies). LC077 is a derivative of LC074 in which Cse1 was C-terminally tagged with FLAG3 (as described above for AMD543). AMD566 is a derivative of AMD536 in which Cse1 was C-terminally tagged with FLAG3 (as described above for AMD543). LC099 is a derivative of AMD566 in which the off-target binding site for Cascade in yggX was mutated using FRUIT (59) with oligonucleotides JW7635-8. LC103 is a derivative of AMD536 in which the yggX gene was replaced with a kanamycin resistance cassette using P1 transduction from the Keio Collection ΔyggX::kanR strain (60). LC106 is a derivative of LC103 with an unmarked, scar-free deletion of cas1 made using FRUIT with oligonucleotides JW7898-JW7901.

oligonucleotides (JW7913 and JW7914 for pLC021, and JW7924 and JW7925 for pLC022), and cloning the resultant DNA fragments into the EcoRV and SphI sites of pBAD24. pAMD191 is a derivative of pBAD33 (61) that expresses cas3 under arabinose control. To construct pAMD191, cas3 was amplified by colony PCR using oligonucleotides JW7736 and JW7738. The PCR product was cloned into the SacI and HindIII sites of pBAD33 using In-Fusion (Clontech). All protospacers described in Figure 5 are cloned into plasmid pLC020, the "pre-protospacer plasmid", which is a derivative of pBAD24 (61). pLC020 was generated by cloning the ~500 bp

EcoRI site of the pBAD24 derivative using In-Fusion (Clontech) to generate plasmids pLC023-pLC035 (see Table S3 for details).

ChIP-qPCR
For all ChIP-qPCR and ChIP-seq experiments, cells were grown overnight in LB, subcultured in LB supplemented with 0.2% arabinose and 100 μg/mL ampicillin (for experiments where a crRNA was expressed from a plasmid), and grown at 37 °C with aeration to an OD600 of ~0.6. AMD566 and LC099 with either pLC008 or pLC010 were used for ChIP-qPCR. ChIP-qPCR was performed as described previously (62), except that 2 μL anti-FLAG M2 monoclonal antibody (Sigma) and 1 μL anti-σ54 monoclonal antibody (NeoClone) were included simultaneously in the immunoprecipitation step. qPCR was performed using oligonucleotides JW7490-1 (amplifies the off-target site in yggX) and JW7922-3 (amplifies the region upstream of hypA). Since

were calculated by comparing read coverage at peak centers for all peaks identified for the analyzed datasets. Read coverage at peak centers was determined using a custom Python script. Sequence motifs were identified using MEME (version 4.12.0) (66) with default parameters.
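
The text above is truncated, so the quantity that was calculated from read coverage at peak centers is not stated here. Assuming it is a correlation of the kind reported in the Results (R² values comparing ChIP-seq datasets), the calculation could be sketched as follows. This is not the custom script used in the study; the function names, the per-position coverage lists, and the peak centers are all hypothetical.

```python
def coverage_at_centers(coverage, centers):
    """Look up read coverage at each peak-center position (0-based index)."""
    return [coverage[c] for c in centers]


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-position coverage for two datasets, invented for illustration.
toy_coverage_a = [0, 1, 5, 1, 0, 2, 10, 2, 0, 0, 3, 7, 3]
toy_coverage_b = [0, 2, 9, 2, 0, 4, 21, 4, 0, 1, 5, 13, 6]
toy_peak_centers = [2, 6, 11]  # union of peak centers called in either dataset

a = coverage_at_centers(toy_coverage_a, toy_peak_centers)
b = coverage_at_centers(toy_coverage_b, toy_peak_centers)
print(f"R^2 = {r_squared(a, b):.3f}")
```

Using the union of peak centers from all compared datasets, as the surviving text indicates, avoids biasing the comparison toward peaks called in only one dataset.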
The copyright holder for this preprint (which was not this version posted July 27, 2017. ; https://doi.org/10.1101/169011 doi: bioRxiv preprint RNA-seq 535 RNA-seq was performed in duplicate with strains AMD536 and LC074. Cells were grown 536 overnight in LB, subcultured in LB supplemented with 0.2% arabinose at 37 °C with aeration to 537 an OD 600 of ~0.6. RNA was purified using a modified hot phenol method, as previously 538 described (67). Purified RNA was treated with 2 μL DNase (TURBO DNA-free kit; Life 539 Technologies) for 45 minutes at 37 °C, followed by phenol extraction and ethanol precipitation. 540 The RiboZero kit (Epicure) was used to remove rRNA, and strand-specific cDNA libraries were 541 created using the ScriptSeq 2.0 kit (Epicure). Sequencing was performed using an Illumina Next-542 Seq Instrument (Wadsworth Center Applied Genomic Technologies Core). Differential RNA 543 expression analysis was performed using Rockhopper (version 2.03) using default parameters 544 (68). Differences in RNA levels were considered statistically significant for genes with q-values 545 ≤ 0.01. 546 547 Plasmid transformation efficiency assay 548 LC103 was transformed with either pLC021 or pLC022. These strains were then transformed 549 with either empty pBAD33 or pAMD191 (expresses cas3), and cells were plates on M9 medium 550 supplemented with 0.2% glycerol, 0.2% arabinose, 100 μg/mL ampicillin and 30 μg/mL 551 chloramphenicol at 37 °C. After overnight growth, colonies were counted, and the ratio of 552 pAMD191-transformed cells to pBAD33-transformed cells was calculated for each of the two 553 strains. 554 555 PCR to assess primed spacer acquisition 556 certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Primed spacer acquisition was assessed for AMD536 with pAMD191 and either pLC021 or pLC022 (Figure 4B), LC103 with pAMD191 and each of pLC023-pLC035 (Figure 5C), and AMD543/AMD544 with pAMD191 and pAMD189 (expresses a self-targeting crRNA; Figure S1). Cells were grown overnight in LB supplemented with 100 μg/mL ampicillin and 30 μg/mL chloramphenicol at 37 °C with aeration, and subcultured the next day in LB supplemented with chloramphenicol and 0.2% arabinose at 37 °C with aeration for six hours. Cells were pelleted from 1 mL of culture by centrifugation, and cell pellets were frozen at -20 °C. PCRs were then performed on the cell pellets, amplifying the CRISPR-II array using oligonucleotides JW7818 and JW7819. PCR products were visualized on acrylamide gels.

Sequence analysis of protospacers from a pooled ChIP library
LC099 containing each of the 13 protospacer variant plasmids (pLC023-pLC035) was grown overnight in LB supplemented with 100 μg/mL ampicillin. 10 mL subcultures were grown in LB supplemented with 100 μg/mL ampicillin and 0.2% arabinose at 37 °C with aeration to an OD600 of ~0.6. 3 mL from each culture was combined. ChIP was performed on the mixed cultures using 2 μL M2 anti-FLAG monoclonal antibody (Sigma), as previously described (62). A Zymo PCR Clean and Concentrate kit was used to purify ChIP and input DNA. A 50 μL FailSafe (Epicentre) PCR reaction using FailSafe PCR 2X PreMix "C" and 5.48 ng of ChIP DNA was performed following the manufacturer's instructions, using oligonucleotide JW8567 and each of oligonucleotides JW8537, JW8556, JW8557, JW8558, JW8559, JW8561, JW8562, JW8563, JW8564, and JW8565 (these incorporate different Illumina indices).
PCR products were purified and concentrated using 0.8X Ampure Beads (Beckman Coulter Life Sciences) and sequenced on an Illumina MiSeq instrument (Wadsworth Center Applied Genomic Technologies Core).
Sequence reads were mapped to each of the 13 protospacer variants using a custom Python script. Relative protospacer abundance in the input and ChIP samples was normalized to the total sequence reads for each protospacer. Values for normalized protospacer abundance were further normalized to values from the input sample. Protospacer abundance values are reported relative to those for the optimal protospacer (variant i in Figure 5).
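The normalization steps described above can be sketched as follows. This is a minimal illustration, not the custom script itself: the exact-substring read assignment, the function names, and the toy variant sequences and counts are all assumptions, and only the variant labels (i, ii, ...) and the order of the normalization steps are taken from the text.

```python
def count_variants(reads, variants):
    """Count reads that contain each protospacer variant sequence.

    Assumption: a read is assigned to the first variant whose sequence
    it contains exactly; the mapping criteria actually used are not
    described in the text.
    """
    counts = {name: 0 for name in variants}
    for read in reads:
        for name, seq in variants.items():
            if seq in read:
                counts[name] += 1
                break
    return counts


def relative_chip_abundance(chip_counts, input_counts, reference="i"):
    """ChIP abundance per variant, relative to the reference variant.

    Step 1: divide each count by the total reads in its sample.
    Step 2: divide the ChIP fraction by the input fraction.
    Step 3: divide by the value obtained for the reference variant.
    Assumes every variant has a non-zero input count.
    """
    chip_total = sum(chip_counts.values())
    input_total = sum(input_counts.values())
    enrichment = {}
    for name in chip_counts:
        chip_fraction = chip_counts[name] / chip_total
        input_fraction = input_counts[name] / input_total
        enrichment[name] = chip_fraction / input_fraction
    ref_value = enrichment[reference]
    return {name: value / ref_value for name, value in enrichment.items()}


# Toy example with invented sequences and counts.
variants = {"i": "AAGTTT", "ii": "CCGTTT"}
reads = ["GGAAGTTTCC", "AAGTTT", "CCGTTTAA", "NNNN"]
print(count_variants(reads, variants))

chip = {"i": 600, "ii": 300, "iii": 100}
inp = {"i": 400, "ii": 400, "iii": 200}
print(relative_chip_abundance(chip, inp))
```

Dividing by the input fraction corrects for unequal representation of the variant plasmids in the pooled culture, so that the reported values reflect Cascade association rather than starting abundance.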

Measuring interference for a pooled protospacer library
Overnight cultures of LC106 strains with each of the 13 protospacer plasmids (pLC023-pLC035) were grown in LB with 100 μg/mL ampicillin and 30 μg/mL kanamycin. All 13 cultures were combined to make a single subculture; 7.7 μL of each strain was added to a 10 mL culture. Electrocompetent cells were made and transformed with either empty pBAD33 or pAMD191 (pBAD33-cas3). Transformants were plated onto M9 agar supplemented with 0.2% glycerol, 0.2% arabinose, and 30 μg/mL chloramphenicol, and grown overnight at 37 °C. Cells were scraped off plates and washed in LB, and protospacers were PCR-amplified from cell pellets with oligonucleotide JW8567 and each of oligonucleotides JW8537, JW8558, JW8559, JW8562, JW8563, and JW8566 (these incorporate different Illumina indices). PCR products were purified and concentrated with 0.8X Ampure Beads (Beckman Coulter Life Sciences), and sequenced using an Illumina MiSeq instrument (Wadsworth Center Applied Genomic Technologies Core). Sequence reads were mapped to each of the 13 protospacer variants using a custom Python script. Individual protospacer abundances were compared between Cas3-expressing cells and cells containing empty pBAD33. Protospacer abundances were normalized to those for the protospacer with a CCG PAM (variant ii in Figure 5).
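One way to express this comparison is sketched below. The order of operations (normalizing to variant ii within each sample, then taking the ratio between samples), the function names, and the toy read counts are assumptions made for illustration; the text specifies only that abundances were compared between the two conditions and normalized to the CCG PAM variant.

```python
def normalize_to_reference(counts, reference="ii"):
    """Express each variant's read count relative to the reference variant.

    Because both numerator and denominator come from the same sample,
    differences in total sequencing depth between samples cancel out.
    """
    ref_count = counts[reference]
    return {name: count / ref_count for name, count in counts.items()}


def relative_survival(cas3_counts, empty_counts, reference="ii"):
    """Ratio of normalized abundance with Cas3 to that with empty vector.

    Values below 1 indicate depletion of a protospacer variant in
    Cas3-expressing cells relative to the reference variant.
    """
    with_cas3 = normalize_to_reference(cas3_counts, reference)
    without_cas3 = normalize_to_reference(empty_counts, reference)
    return {name: with_cas3[name] / without_cas3[name] for name in with_cas3}


# Toy read counts (invented) for three of the variants.
cas3 = {"i": 10, "ii": 500, "iii": 250}
empty = {"i": 400, "ii": 400, "iii": 400}
print(relative_survival(cas3, empty))
```

By construction the reference variant has a value of 1, so every other variant is read as more or less depleted than the CCG PAM protospacer.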
bioRxiv preprint doi: https://doi.org/10.1101/169011; this version posted July 27, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.