The Nucleoid Binding Protein H-NS Biases Genome-Wide Transposon Insertion Landscapes

ABSTRACT Transposon insertion sequencing (TIS; also known as TnSeq) is a potent approach commonly used to comprehensively define the genetic loci that contribute to bacterial fitness in diverse environments. A key presumption underlying analyses of TIS datasets is that loci with a low frequency of transposon insertions contribute to fitness. However, it is not known whether factors such as nucleoid binding proteins can alter the frequency of transposon insertion and thus whether TIS output may systematically reflect factors that are independent of the role of the loci in fitness. Here, we investigated whether the histone-like nucleoid structuring (H-NS) protein, which preferentially associates with AT-rich sequences, modulates the frequency of Mariner transposon insertion in the Vibrio cholerae genome, using comparative analysis of TIS results from wild-type (wt) and Δhns V. cholerae strains. These analyses were overlaid on gene classification based on GC content as well as on extant genome-wide identification of H-NS binding loci. Our analyses revealed a significant dearth of insertions within AT-rich loci in wt V. cholerae that was not apparent in the Δhns insertion library. Additionally, we observed a striking correlation between genetic loci that are overrepresented in the Δhns insertion library relative to their insertion frequency in wt V. cholerae and loci previously found to physically interact with H-NS. Collectively, our findings reveal that factors other than genetic fitness can systematically modulate the frequency of transposon insertions in TIS studies and add a cautionary note to interpretation of TIS data, particularly for AT-rich sequences.

Transposon insertion sequencing (TIS) is a powerful tool purported to enable the unbiased and comprehensive identification of genetic loci required for bacterial fitness (1,2). TIS employs deep sequencing of transposon insertion sites within a complex population of insertion mutants in order to determine the frequency with which genetic loci are disrupted. Subsequent genome-wide statistical analysis of relative insertion frequency enables identification of loci underrepresented for transposon insertion (3). Transposon insertion typically confers a loss-of-function phenotype; consequently, loci with a low frequency of transposon insertion are often presumed to contribute to bacterial fitness under the conditions assayed. TIS has enabled efficient identification of genes variously termed "essential," "domain essential," or "underrepresented" that are thought to promote bacterial growth in a variety of bacterial species (1,2). Reliable gene classification has been bolstered by extensive optimization of TIS analysis methodologies, which have all but eliminated the technical artifacts that bias the detection of insertion frequency (3). However, biological factors that may alter the frequency of insertion, irrespective of the contributions of the loci to bacterial fitness, remain largely unexplored.
Many TIS studies have employed Mariner-based transposons, which insert exclusively at TA dinucleotides without additional sequence site restrictions (3,4). For example, several such screens were performed to identify loci that promote in vitro growth of Vibrio cholerae, the cause of the diarrheal disease cholera, as well as to identify loci required by the pathogen for intestinal colonization of infant rabbits, a model host (5–8). Approximately 16% of the 3,751 nonredundant protein-coding V. cholerae genes were reported to be required for optimal in vitro growth (7). However, these genes included several well-characterized virulence loci that are dispensable for in vitro fitness, including the cholera toxin-encoding genes ctxAB and the tcp operon that enables production of a colonization-linked pilus. A similar phenomenon was observed in Mariner-based TIS analysis of V. parahaemolyticus, wherein components of a virulence-linked type III secretion system (T3SS2) dispensable for in vitro growth were underrepresented for transposon insertion (9). These findings suggest that the frequency of insertion at a locus may reflect factors in addition to the fitness of corresponding insertion mutants; however, the nature of such factors has not been defined. Notably, the underrepresented virulence-associated loci were acquired via horizontal gene transfer, and, like many horizontally acquired sequences, they have GC content markedly lower than that of the ancestral chromosomes into which they have integrated (10,11).
In many bacterial species, horizontally acquired sequences are bound by the histone-like nucleoid structuring (H-NS) protein (12), which preferentially associates with AT-rich sequences through recognition of structural features of the minor groove of the DNA helix (13). H-NS can oligomerize along, or form crossbridges between, AT-rich regions, thereby producing filamentous nucleoprotein complexes and stabilizing DNA hairpins (14). H-NS binding is associated with transcriptional silencing, due both to reduced access of transcription machinery to H-NS-bound promoters and to Rho-dependent transcriptional termination resulting from increased transcriptional pausing in bound regions (13). H-NS-mediated repression is overcome by the activity of transcription factors and other DNA binding proteins that compete with H-NS for target sites (12). For example, in V. cholerae, the virulence regulators ToxR and ToxT antagonize H-NS binding at virulence-related loci, including tcpA and ctxA (15–18). H-NS contributes to the regulation of diverse processes, and mutants lacking H-NS typically display attenuated growth, which in some cases has been specifically linked to overexpression of horizontally acquired sequences (19). H-NS's recognition and silencing of horizontally acquired sequences are thought to offset the fitness cost of acquiring genes that are not governed by endogenous regulatory networks.
The known association between H-NS and many horizontally acquired sequences prompted us to consider the possibility that H-NS binding might account for the dearth of insertion mutants recovered for these loci. Previous studies have linked H-NS to transposition; for example, H-NS promotes the activity of Tn10 through interactions with transpososome proteins (20). Additionally, H-NS is thought to modify IS903 target site selection, although this conclusion is based on an analysis of relatively few insertion sites conducted prior to the availability of high-throughput sequencing (21). Despite these precedents and the fact that Mariner family transposons are widely used in TIS studies, the potential impacts of H-NS or similar nucleoid binding proteins on Mariner insertion have not been explored.
Here, we performed Mariner-based transposon mutagenesis of a V. cholerae H-NS mutant and implemented multiple statistical analyses in order to identify H-NS-dependent changes in transposon insertion profiles relative to a wild-type V. cholerae transposon library. We observed a striking correlation between genetic loci that display an H-NS-dependent reduction in transposon insertion and those previously found to interact with H-NS. H-NS-bound sequences with an H-NS-dependent low frequency of insertions are largely AT-rich loci, and they include numerous horizontally acquired elements and virulence-linked genes. Our findings suggest that H-NS binding biases the frequency of transposon insertion at certain loci and thus add a cautionary note to interpretation of TIS data.
A significant association between H-NS binding and low recovery of insertion mutants is likely to manifest as a relatively low frequency of insertion within AT-rich regions of the genome. To explore this possibility, we used the EL-ARTIST pipeline (5) to perform hidden Markov model (HMM)-based analyses of a V. cholerae TIS data set characterized in an earlier study (22) and classified genes based on transposon insertion frequency as well as GC content. HMM-based gene classification identified genetic loci underrepresented for transposon insertion. Notably, these genes, which correspond well to the "growth-promoting" loci identified in a similar analysis by Chao et al. in 2013 (7), are disproportionately AT rich (defined as GC content of <40%) (Fig. 1A; Fisher's exact test, P = 3.9 × 10^-14). Similarly, analysis of a V. parahaemolyticus insertion library (9) revealed that genes with a relatively low frequency of insertion are overrepresented among AT-rich loci (see Fig. S1 in the supplemental material; Fisher's exact test, P = 3.6 × 10^-11). Collectively, these associations between AT-rich DNA and a low frequency of transposon insertion are consistent with a possible linkage between H-NS binding and a bias in transposon insertion.
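The kind of test described above, asking whether underrepresented genes are enriched among AT-rich genes, can be illustrated with a minimal Python sketch. It is not the EL-ARTIST pipeline and does not reproduce the reported P values: the function names are hypothetical, the 2x2 counts in the usage line are toy numbers invented for illustration, and only the 40% GC cutoff is taken from the text.

```python
from math import comb


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_at_rich(seq: str, threshold: float = 0.40) -> bool:
    """AT rich as defined in the text: GC content below 40%."""
    return gc_content(seq) < threshold


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows are AT-rich / not AT-rich genes,
    columns are underrepresented / not underrepresented genes.
    The P value is the summed hypergeometric probability of every table
    with the same margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Toy counts only, not data from this study.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))  # 0.035 for these toy counts
```

With genome-scale counts the same calculation yields very small P values when the enrichment is strong; an established statistics library would normally be used in practice.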
To investigate if H-NS contributes to the low frequency of insertion in AT-rich genetic loci, we performed TIS on a high-density transposon insertion library (70% of TA sites disrupted) generated in a Δhns derivative of V. cholerae. As described above, the EL-ARTIST pipeline (5) was used to analyze the sequence data and identify genetic loci underrepresented for transposon insertion (relative to the genome overall) in the Δhns library; gene classification as a function of AT content was also assessed. In striking contrast to the wild-type (wt) library, the Δhns library did not exhibit a correlation between low frequency of insertion and AT content (Fig. 1B; chi-square test, P = 0.30). In the Δhns library, genes underrepresented for transposon insertion were proportionately distributed between AT-rich genes and those with higher GC content. The absence of an AT content-correlated skew in the distribution of insertions in the Δhns library is consistent with the idea that H-NS accounts for the bias against insertion in low-GC-content genes observed in the distribution of transposon insertions in the wt strain.
To compare the insertion profiles of the wt and Δhns libraries at the level of individual genes, we used the Con-ARTIST pipeline (5), which controls for stochastic factors in order to quantify differences between the transposon insertion profiles of two TIS data sets (Fig. 1D; see also Table S1 in the supplemental material). This analysis identified 84 genes that were overrepresented for transposon insertion in the Δhns library (this study) relative to the wild-type C6706 library (22) (fold change, >2; P < 0.001; see Table S2) as well as 20 genes that were underrepresented among Δhns insertion mutants (fold change, <0.5; P < 0.001; see Table S3). In contrast to the composition of the V. cholerae genome overall, in which <10% of genes are AT rich, overrepresented genes had a bimodal distribution of GC content, with 38 genes (45%) that were AT rich (Fig. 1C). These data reflect the fact that overrepresented genes in the Δhns library are far more prevalent among AT-rich genes than in the remainder of the genome (12% versus 1%; Fig. 2A) (Fisher's exact test, P = 5.4 × 10^-25). Thus, direct comparison of the two transposon insertion libraries is consistent with the idea that the frequency of insertion mutations within AT-rich loci is modulated by the presence of H-NS. However, H-NS did not markedly affect mutation frequency within the portion of the genome with GC content of ≥40%.
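The thresholds quoted above (fold change of >2 or <0.5 with P < 0.001) amount to a simple per-gene decision rule, sketched below in Python. This is only an illustration of the cutoffs, not the Con-ARTIST pipeline, which also handles normalization and the statistical testing itself. The class and function names, the gene labels, and the numeric values are hypothetical, and the fold change is assumed to be expressed as Δhns relative to wt.

```python
from dataclasses import dataclass


@dataclass
class GeneComparison:
    gene: str           # hypothetical gene label
    fold_change: float  # assumed: insertion frequency in Δhns divided by wt
    p_value: float      # P value from the upstream comparison (not computed here)


def classify(result: GeneComparison,
             fc_cutoff: float = 2.0,
             p_cutoff: float = 0.001) -> str:
    """Apply the fold-change and P-value cutoffs quoted in the text."""
    if result.p_value < p_cutoff:
        if result.fold_change > fc_cutoff:
            return "overrepresented"
        if result.fold_change < 1.0 / fc_cutoff:
            return "underrepresented"
    return "unchanged"


# Made-up values for illustration; none are taken from Tables S1 to S3.
examples = [
    GeneComparison("geneA", 4.2, 0.0001),
    GeneComparison("geneB", 0.3, 0.0002),
    GeneComparison("geneC", 3.0, 0.05),
]
for item in examples:
    print(item.gene, classify(item))
```

Requiring both criteria means that a large fold change with weak statistical support (geneC here) is left unclassified rather than called overrepresented.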
To further explore the relationship between overrepresented genes and H-NS binding, we overlaid our TIS data on data from H-NS binding sites in the V. cholerae genome previously identified via chromatin immunoprecipitation (ChIP) sequencing (23). All 38 AT-rich overrepresented loci had detectable H-NS binding (see Table S2 in the supplemental material). In contrast, none of the 46 overrepresented genes with neutral GC content were bound by H-NS (see Table S2). H-NS binding was also not detected for the 20 underrepresented genes, all of which have GC content (44% to 53%) comparable to that of the V. cholerae genome (47.5%). We propose that H-NS-dependent changes in transposon insertion at the regions with neutral GC content reveal synthetic fitness phenotypes, consistent with the traditional interpretation of TIS data whereby differential insertion frequencies indicate altered fitness. Thus, not all H-NS-dependent changes in transposon insertion are the result of H-NS binding; however, the negative correlation between H-NS binding and transposon insertion (see Table S2) strongly suggests that interactions between H-NS and its targets can markedly reduce the frequency with which insertional disruption of these sites is observed.
The genes overrepresented in the Δhns library and directly bound by H-NS include a variety of horizontally acquired virulence loci. For example, they include ctxA and ctxB, which encode the A and B subunits of cholera toxin and are 2 of 3 genes in the horizontally acquired CTX prophage with GC content below 40% (38.5% and 32.5%, respectively) (Fig. 2B). Also, 17 of 24 AT-rich genes encoded within the horizontally acquired vibrio pathogenicity island (VPI) were overrepresented in the Δhns library, including 9 associated with biosynthesis of toxin-coregulated pilus (TCP), a pilus critical for V. cholerae intestinal colonization (Fig. 2B). Previous studies have demonstrated that disruption of these loci, which are not expressed by V. cholerae under typical laboratory conditions at least in part due to H-NS-mediated repression (18), does not affect in vitro growth. Overrepresented loci also included components of the rfb operon, which enables O-antigen synthesis and was likely acquired through lateral gene transfer (Fig. 2B). As observed for the previously described virulence loci, rfb genes are not required for optimal growth of V. cholerae in vitro, but, unlike other virulence-linked loci, rfb genes display robust expression in vitro both in wt cells (where H-NS is bound) and in a Δhns mutant (24). Overall, these observations provide further support for the idea that H-NS can modulate the output of TIS-based studies and may particularly influence results for virulence genes, since these loci are often bound by H-NS.
There are several mechanisms by which H-NS may limit isolation of insertion mutants for the genetic loci to which it binds. H-NS could structurally occlude access of the Himar1 transpososome to its TA dinucleotide targets, either by polymerizing along the surface of AT-rich DNA and creating a protein barrier or by inducing formation of folded DNA structures that restrict transposition. There is precedent for the idea that other DNA binding proteins, such as ParB, can antagonize transposition of other transposons, such as Mu (25). Alternatively, H-NS might interfere with expression of the transposon-integral selectable marker when it has been inserted into transcriptionally repressed regions. The low frequency of insertion within the rfb operon, despite the constitutive and H-NS-independent expression of this region (24), suggests that transcriptional silencing is unlikely to account for the H-NS-dependent bias in insertion at this locus. It is unclear whether observations regarding the rfb operon are generalizable, particularly since H-NS is not typically bound to actively transcribed genes (19). Ultimately, it is difficult to exclude the possibility that elevated transcription of loci ordinarily silenced by H-NS contributes to the observed increases in insertions in such loci in the absence of H-NS. However, since there is not a positive correlation between gene expression levels and the frequency of transposition (see Fig. S2A in the supplemental material), relief of the silencing of H-NS-bound genes is unlikely to account for the changes in Mariner insertion profiles that we observed in the absence of this nucleoid binding protein.
It is noteworthy that H-NS reportedly binds to many loci that do not display H-NS-dependent changes in transposon insertion. The 38 overrepresented AT-rich loci are only a small portion of the 332 loci bound by H-NS in V. cholerae; thus, H-NS binding does not inevitably impair recovery of associated insertion mutations. However, the genes overrepresented in the Δhns strain generally had a higher degree of H-NS binding than genes that did not exhibit increased Mariner insertion (see Fig. S2B in the supplemental material). Besides the density of bound protein, the effects of H-NS on TIS output may be modulated by several factors, including the structure and stability of the DNA-H-NS complex, the prevalence of proteins that compete with H-NS for target sites, and/or the proximity of H-NS binding to Mariner target sites. Further studies are warranted to precisely define the molecular factors that govern the interplay between H-NS binding and Mariner transposon insertion.
Regardless of the mechanism(s) that modulates Mariner insertion into H-NS-bound DNA, our findings reveal that H-NS binding can skew TIS-based assessment of AT-rich genes and, consequently, that caution in interpretation of TIS results for this subset of genes may be particularly warranted. It is possible that other nucleoid binding proteins, such as Fis, HU, or Rok, or potentially site-specific DNA binding proteins may also bias the distribution of transposon insertions and thus complicate interpretation of TIS data. While it might be possible to mitigate the bias in Mariner insertion caused by H-NS, e.g., by carrying out in vitro transposition in species that can be transformed at high frequency, it is unlikely that it will be possible to eliminate all biases in transposon insertion. Thus, elucidation of the biological processes that modify transposon site selection will enhance our capacity to interpret TIS experiments.
Construction of V. cholerae hns deletion mutant. The Δhns strain was constructed by homologous recombination, using a derivative of the negatively selectable suicide vector pCVD442 as previously described (22). The targeting vector contained 500 bp of DNA flanking each side of hns, which were cloned into pCVD442's SmaI site using isothermal assembly. Mutant selection was performed as previously described (5). The list of primer sequences is in Table S4 in the supplemental material.
Transposon insertion sequencing. A Himar1 transposon library in Δhns V. cholerae was created and sequenced as previously described (22), except that conjugation was extended to 10 h. Sequenced reads were mapped onto a V. cholerae reference genome (N16961), and all TA sites were tallied and assigned to annotated genes as previously described (5). The library contained ~600,000 colonies with 134,064 unique transposon insertions, representing 70% of all TA dinucleotides from 5,300,801 mapped reads. The Con-ARTIST pipeline (5) was used to compare the Δhns library with a previously characterized wild-type library (22) in order to identify overrepresented and underrepresented genes.
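The tallying of TA sites and the library saturation figure can be illustrated with a short Python sketch. It is a simplified stand-in, not the published pipeline: the function names and the toy sequence are hypothetical, and insertion positions are assumed to be 0-based coordinates of the T in each TA site. Because TA is its own reverse complement, scanning one strand is enough to find every site. Taken at face value, 134,064 unique insertions at 70% saturation implies roughly 191,520 TA sites in total; since the 70% is presumably rounded, this is only an approximation.

```python
def ta_sites(seq: str) -> list[int]:
    """Return 0-based positions of every TA dinucleotide in a sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]


def saturation(insertion_positions, sites) -> float:
    """Fraction of TA sites that carry at least one insertion."""
    site_set = set(sites)
    if not site_set:
        return 0.0
    return len(site_set & set(insertion_positions)) / len(site_set)


# Toy sequence and insertion position, invented for illustration.
toy_genome = "GTACCTAG"
sites = ta_sites(toy_genome)        # [1, 5]
print(saturation([1], sites))       # 0.5

# Back-of-envelope check using the figures reported in the text.
implied_total_ta_sites = round(134_064 / 0.70)
print(implied_total_ta_sites)       # 191520
```

Counting each site once, regardless of how many reads map to it, is what distinguishes the saturation figure from read depth.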
Accession number. The raw sequencing reads of the three hns knockout libraries were deposited in Sequence Read Archive (SRA) in NCBI under accession numbers SRP081158, SRP081163, and SRP081165.