Functional Multigenomic Screening of Human-Associated Bacteria for NF-κB-Inducing Bioactive Effectors

Human-associated bacteria are thought to encode bioactive small molecules and proteins that play an intimate role in human health and disease. Here, we report on the creation and functional screening of a multigenomic library constructed using genomic DNA from 116 bacteria found at diverse sites across the human body. Individual clones were screened for genes capable of conferring NF-κB-inducing activity to Escherichia coli. NF-κB is a useful reporter for a range of cellular processes related to immunity, pathogenesis, and inflammation. Compared to the screening of metagenomic libraries, the ability to normalize input DNA ratios when constructing a multigenomic library should facilitate the more efficient examination of commensal bacteria for diverse bioactivities. Multigenomic screening takes advantage of the growing available resources in culturing and sequencing the human microbiota and generates starting points for more in-depth studies on the mechanisms by which commensal bacteria interact with their human host.

A key advantage of this approach is that all hits are directly associated with a cloned fragment of DNA, thus permitting the identification of not only bioactive metabolites and proteins but also the genes that encode these molecules. When used to study the human microbiome, metagenomics is limited by the fact that, among potential body sites of interest, only stool generally contains sufficient bacterial biomass to permit the facile extraction and cloning of metagenomic DNA into large-insert libraries. Another potential limitation of constructing a library directly from human-derived samples is that bacteria are not evenly distributed in nature, and therefore effectors from organisms that make up only a small fraction of the microbiome can be difficult to sample using this approach (6). In contrast to results in many other environments, extensive efforts to culture bacteria from the human microbiome have been quite successful. In fact, recent analyses suggest that as much as 60% to 70% of human-associated bacteria can now be cultured, providing a means to readily access the genetic material from the majority of the microbiota (7–11). Here, we bring forward the concept of using a genomic library created from a defined collection of cultured bacteria (i.e., a multigenomic library) as a way of identifying microbiota-encoded effectors. This approach allows both the normalization of input DNA and the use of DNA from bacteria found in body sites from which sufficient biomass is difficult to obtain. The resulting library is expected to contain a more balanced representation of source bacteria and therefore to be a more manageable resource for use in identifying effector genes.
In this study, we use a 13,300-member multigenomic cosmid library to identify effectors encoded by a collection of 116 human-associated bacterial strains found at diverse sites across the human body. Cosmid clones created using a normalized mixture of genomic DNA isolated from these strains and hosted in Escherichia coli were screened for the production of effectors that activate nuclear factor-κB (NF-κB) signaling in human cells. We selected NF-κB activation as a target phenotype because of the key role this pathway plays in diverse biological systems that are expected to be associated with the human microbiome, for example, immune homeostasis, pathogenesis, and inflammation (2,4). Downstream effects of NF-κB signaling range from proinflammatory responses to protective functions (12), making NF-κB a potentially useful reporter for a broad range of human-associated bacterial effectors. The 21 NF-κB-active cosmid clones identified represent 17 unique genomic regions from 16 different input organisms. The proteins encoded by specific effector genes identified from both commensal and pathogenic species include domains of unknown function, membrane transporters, cell wall hydrolases, and lipopolysaccharide (LPS) core biosynthetic genes. A multigenomic approach provides a simplified function-first method for exploring human microbiome genetic information, one that takes advantage of growing available resources in culturing and sequencing members of the human microbiome, which should facilitate the discovery of novel bacterial effectors from human-associated bacteria.

RESULTS
Construction and analysis of a multigenomic library from 116 human-associated bacteria. A multigenomic cosmid library was constructed from genomic DNA of 116 bacterial strains found in the Human Microbiome Project (HMP) catalog (Fig. 1A and Table S1 in the supplemental material). In total, the strains used for library construction represented 87 unique species across the four main phyla commonly observed in the human microbiome. This collection included isolates that appear at different body sites and with diverse relative abundances in the microbiota of healthy individuals (13). We did not deliberately bias our selection of strains based on any previous reports of biological activity. Based on an estimated average insert size of 30 kb per cosmid, the library comprises ~400 Mb of multigenomic DNA, which corresponds to approximately 1-fold coverage of the 116 input genomes.
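The library size estimate above can be reproduced with a short calculation. The following is a minimal sketch, not part of the original analysis; the function names are hypothetical, and the implied mean genome size is an inference from the stated 1-fold coverage rather than a value reported in the text.

```python
# Back-of-the-envelope estimate of multigenomic library size.
# Inputs are taken from the text; function names are illustrative only.

N_CLONES = 13_300       # clones in the library
AVG_INSERT_KB = 30      # estimated average insert size per cosmid
N_GENOMES = 116         # input strains


def library_size_mb(n_clones: int, avg_insert_kb: float) -> float:
    """Total cloned DNA in megabases."""
    return n_clones * avg_insert_kb / 1000.0


def fold_coverage(library_mb: float, total_genome_mb: float) -> float:
    """Library size divided by the summed size of the input genomes."""
    return library_mb / total_genome_mb


size_mb = library_size_mb(N_CLONES, AVG_INSERT_KB)

# ASSUMPTION: if coverage is exactly 1-fold, the summed genome size equals
# the library size, which implies this mean genome size per strain.
implied_mean_genome_mb = size_mb / N_GENOMES

print(f"library size: {size_mb:.0f} Mb")
print(f"implied mean genome size at 1-fold coverage: {implied_mean_genome_mb:.2f} Mb")
```

With real genome sizes in hand, `fold_coverage` could be called with their sum to replace the inferred value.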
To determine the abundance of each input genome in the library, all library clones were individually cultured in LB medium, and cosmid DNA was isolated from an equal-volume pool of these cultures. The collective pool of cosmids was sequenced using Illumina MiSeq 300-bp paired-end read technology, and individual sequencing reads were aligned to the genomes of the strains used to create the multigenomic library. Percent abundance based on the number of mapped reads was calculated for each strain. As expected for a normalized library containing 116 different strains, the relative abundances of the input genomes center just below 1%, with the abundance of most genomes falling within 1 order of magnitude above or below this expected value (Fig. 1B). The relative abundances of 46 of the strains included in our multigenomic library have been mapped across multiple body sites in hundreds of metagenomic samples from healthy humans (13). In that analysis, these same strains appear at relative abundances spanning almost 8 orders of magnitude (Fig. 1C; red circles represent strains from which bacterial effectors were identified in our screening), making it difficult to access the genomes of many of these strains in a metagenomic library constructed from any of the hundreds of samples sequenced in that study (13).
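The percent-abundance calculation and the comparison to the expected value for a perfectly normalized library (100/116, or about 0.86%) can be sketched as follows. This is an illustrative sketch only; the function names and the toy read counts are hypothetical and do not come from the study.

```python
import math


def percent_abundance(mapped_reads: dict[str, int]) -> dict[str, float]:
    """Percent of all mapped reads assigned to each strain."""
    total = sum(mapped_reads.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {strain: 100.0 * n / total for strain, n in mapped_reads.items()}


def log10_deviation(pct: float, n_strains: int) -> float:
    """Orders of magnitude above (+) or below (-) the abundance expected
    if every strain contributed equally (100 / n_strains percent)."""
    if pct == 0:
        return float("-inf")  # strain not detected
    expected = 100.0 / n_strains
    return math.log10(pct / expected)


# Toy counts for three hypothetical strains (not data from the study).
toy = percent_abundance({"strain_A": 900, "strain_B": 90, "strain_C": 10})
for strain, pct in toy.items():
    print(strain, f"{pct:.1f}%", f"{log10_deviation(pct, 3):+.2f} log10 units")
```

A strain whose `log10_deviation` lies between -1 and +1 falls within the one-order-of-magnitude window described above.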
Screening of the multigenomic library for NF-κB-inducing activity. A schematic of the high-throughput protocol we used to screen the multigenomic library for NF-κB-inducing activities is depicted in Fig. 2A. Briefly, filter-sterilized culture broth from individually grown clones was transferred onto HEK293 cells transformed with green fluorescent protein (GFP) under the control of the NF-κB consensus transcriptional response element (HEK293:NF-κB:GFP). CFU counts showed that Escherichia coli cultures remained viable throughout the duration of the assay, indicating that extensive host cell lysis was not likely to interfere with our analysis of NF-κB activity (Fig. S1A). The normalized percentage of live GFP-positive cells in each well was assessed by fluorescence microscopy (Fig. 2B). Primary hits (GFP-positive cell Z-score of >3) were reassayed to remove false positives, resulting in 21 validated hits. Hits induced GFP expression in 20% to 90% of cells in a well (Fig. 2C). The average baseline activation produced by E. coli supernatant was 14%, while tumor necrosis factor alpha (TNF-α), a potent known inducer of NF-κB activation (14), induced GFP expression in 80% of cells. Assessment of propidium iodide (PI)-stained cells in all screened wells showed that less than 1% of all clones increased the ratio of PI-stained cells to total cells by more than 3 standard deviations (Z-score of >3) compared to levels in the empty vector control wells (Fig. S1B), indicating that most clones did not cause a general increase in reporter cell death. Moreover, no validated hit caused a dramatic change in the ratio of PI-stained (dead) cells to total cells (Fig. S1C) or in the cellular and nuclear morphology of treated HEK293:NF-κB:GFP cells (Fig. S1D).
As an initial step in the characterization of the validated active clones, we sought to determine whether the bioactive mediators they produced comprised high-molecular-weight components (e.g., proteins) or small molecules. For this experiment, sterile spent culture broth generated from each hit was passed through a 10-kDa molecular-weight-cutoff (MWCO) membrane, and the flowthrough in each case was assayed for NF-κB-inducing activity. In all cases, the 10-kDa MWCO flowthrough continued to induce GFP expression, suggesting that low-molecular-weight products (<10 kDa) were responsible for the observed activities (Fig. S1E). No correlation was observed between the level of extracellular protein in sterile spent culture and NF-κB activity (Fig. S1F), suggesting that the observed NF-κB activities were not artifacts resulting from different levels of heterologous protein expression by some clones.
Sequencing of NF-κB-inducing clones and identification of bacterial effector genes. Cosmid DNA was extracted from each hit and Sanger sequenced using primers that flank the pJWC1 cloning site. The resulting sequences were then aligned to the reference genomes of the 116 strains comprising the multigenomic library to identify the genomic fragment in each cosmid clone. Of the 21 hits, we identified 17 unique genomic regions from 16 input strains (Fig. S1G and Table S2). This corresponds to a hit rate of one unique effector for every 782 clones screened (Fig. 2C), which represents an approximately 4-fold improvement compared to rates with previous metagenomic library screening efforts (2). The genomic regions encoding molecules that activated the NF-κB reporter arise from a distribution of major taxa that closely resembles that used in the construction of the multigenomic library (Fig. 2D). In fact, genomic fragments identified as hits arise from representatives of all major phyla included in the normalized library. As seen in other studies (2, 15), E. coli was able to express genes from a taxonomically diverse group of bacteria, making it an appealing host for functional metagenomic screening studies. Effectors arise from bacteria found at most body sites represented in the library. Bacteria from the skin comprised the only group that failed to generate any effector genes. The small number of skin species examined in this initial study will need to be expanded before we can determine if this is a stochastic sampling effect or if the microbial community found in the human skin does indeed differ from communities of other body sites in regard to the production of NF-κB-inducing effectors.
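The hit-rate figure quoted above follows directly from the library size and the number of unique genomic regions. A minimal sketch of the arithmetic is given here; the function names are hypothetical, and the inputs are the counts stated in the text.

```python
def clones_per_unique_hit(n_clones: int, n_unique_regions: int) -> float:
    """Average number of clones screened per unique genomic region found."""
    return n_clones / n_unique_regions


def redundant_hits(n_validated_hits: int, n_unique_regions: int) -> int:
    """Validated hits that re-sample an already identified genomic region."""
    return n_validated_hits - n_unique_regions


rate = clones_per_unique_hit(13_300, 17)
print(f"one unique effector region per {rate:.0f} clones screened")
print(f"redundant hits: {redundant_hits(21, 17)}")
```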
Seven unique clones with diverse levels of potency were subjected to in vitro transposon mutagenesis to identify the specific genes responsible for conferring on E. coli the ability to induce the NF-κB:GFP reporter. Cosmids randomly mutagenized by in vitro Tn5 transposon mutagenesis were transformed into E. coli, and the resulting transformants were assayed for NF-κB-inducing activity. Cosmids recovered from E. coli strains that no longer induced GFP expression were sequenced. Genes found to contain transposons were identified, and their wild-type versions were subcloned into an inducible expression vector and tested for the ability to confer NF-κB-inducing activity on E. coli. Genes or sets of genes that when subcloned were found to confer on E. coli the ability to induce NF-κB were termed "microbiome bacteria effector genes," or here, Mbegs (Table S3). The Mbegs we identified are predicted to encode proteins that fall into four functional categories: (i) proteins of unknown function (1 clone, 1 unique genomic region); (ii) cell wall hydrolases (4 clones, 3 unique genomic regions); (iii) membrane transporters (2 clones, 2 unique genomic regions); and (iv) LPS core biosynthesis (3 clones, 2 unique genomic regions).
The protein of unknown function contains a conserved domain (DUF2974, or PF11187) belonging to the alpha/beta hydrolase superfamily (16) and is encoded by an effector gene (Mbeg1) identified in a clone from Gemella sanguinis M325 (Fig. 3). G. sanguinis is a rare but emerging opportunistic pathogen that was originally isolated from blood (17) and is associated with rare cases of endocarditis (18,19). The species has also been identified in healthy proximal intestine samples from humans (20).
In three clones, the identified effector genes encode cell wall hydrolases (Mbeg2 from Gemella morbillorum M424, Mbeg3 from Enterococcus faecium TX0133a04, and Mbeg4 from Mobiluncus mulieris 28-1) (Fig. 3). Mbeg2 contains two conserved hydrolase domains: one lysozyme-like domain and one NlpC/P60 family domain. The NlpC/P60 peptidase domain is also present in the protein encoded by Mbeg3, whereas Mbeg4 contains a peptidoglycan recognition domain with an amidase catalytic site, followed by three repeats of a cell wall binding motif. Related hydrolases were identified in a previous NF-κB reporter screening from a human stool metagenomic library (2).
Two different transporter systems were identified as effectors responsible for NF-κB-inducing activity (Fig. 3). Mbeg5 from Neisseria mucosa C102 was identified as a four-gene cluster corresponding to the conserved capsule transport operon (ctr), which encodes the machinery for exporting capsule polysaccharide (CPS) in encapsulated meningococcal groups. CPS serves several functions in Gram-negative bacteria, including immune evasion by human pathogens (21). Mbeg6, a single open reading frame (ORF) from Enterococcus faecium TX1330, is predicted to encode a major facilitator superfamily 1 (MFS-1) protein. MFS-1 transporters are known to function in bacterial adaptation to the natural host environment, exporting antimicrobial agents and virulence factors involved in colonization and infection (22). The specific molecules that are exported as a result of heterologous expression of these systems in E. coli remain to be characterized.
LPS operon characterization. Two clones containing overlapping genomic regions from Citrobacter portucalensis 30_2 were among the most potent hits we observed. An operon of six genes predicted to encode lipopolysaccharide core biosynthetic proteins was identified by transposon mutagenesis as being associated with the NF-κB-inducing activity (Fig. 4A). Interestingly, a homologous operon was present in a third clone containing a genomic fragment from C. portucalensis 4_7_47_CFAA. This clone was associated with the highest level of activity we observed among our hits (Fig. 2C, clone 21). In addition to the LPS biosynthesis cluster, this clone contains a SpoT/RelA homolog predicted to encode a (p)ppGpp synthetase. The (p)ppGpp alarmone is a global regulator of gene expression in bacteria (23). SpoT/RelA homologs from Bacteroides were the most common NF-κB-inducing effector genes identified from human stool metagenomes (2). The presence of a second predicted effector on the same clone may explain the increased potency of the clone from C. portucalensis 4_7_47_CFAA.
The potent bioactivity presented by all three clones containing the LPS core biosynthetic operon prompted a more detailed investigation of this operon. We initially subcloned individual genes as well as subsets of genes from the six-gene LPS operon identified by transposon mutagenesis and found that three genes were necessary and sufficient for conferring the NF-κB-inducing phenotype. The collection of these three genes was termed Mbeg7 (Fig. 4A, open reading frames d, e, and f). The activity produced by Mbeg7 passed through a 3-kDa MWCO filter and remained active after heat treatment (15 min at 95°C), suggesting that it was the result of a heat-stable small molecule (Fig. 4B). The three genes that make up this Mbeg are predicted to encode two LPS heptosyltransferase domain-containing proteins and one ligase, which is predicted to attach the O-antigen chain to an LPS core. While bacterial LPS is well documented as an important pathogen-associated molecular pattern and prominent virulence factor of many Gram-negative pathogens (24), finding an LPS-related effector in our functional screening was unexpected, given that HEK293 cells are deficient in Toll-like receptors (TLRs), the main mediators of host responses to bacterial LPS (2,25). Consistent with this reasoning, we tested the activity of LPS extracted from E. coli and confirmed that the reporter is unresponsive to this molecule (Fig. S2).
To further characterize the phenotype induced by E. coli transformed with Mbeg7 in human cells in vitro, we compared transcriptomes from HEK293:NF-κB:GFP cells treated with the bioactivity produced by E. coli transformed with Mbeg7 to those treated with E. coli containing an empty vector. To reduce background noise from components in the culture medium, the bioactivity produced by cells transformed with Mbeg7 was enriched from culture broth using XAD resin extraction and reversed-phase chromatography (Fig. S3A). The empty vector culture broth control was treated in the same manner. In cells treated with the Mbeg7-associated bioactive product, we detected 36 genes that were upregulated by 5-fold or greater (P < 0.05) (Fig. 4C). These genes included multiple mediators of inflammation, including the following: cytokines, chemokines, and ligands with chemotactic activity; interleukin (IL) receptors and regulators (CCL20, CXCL8, TNF, CXCL10, IL-32, CCL2, CXCL1, CXCL2, and CXCL3); tumor necrosis factor (TNF) receptors (TNFRSF9 and TNFRSF12A); TNF-induced regulators such as PTX3, CALCB, and TNFAIP3 (involved in the termination of TNF- or LPS-induced NF-κB activation); adhesion molecules implicated in inflammation (VCAM1, ICAM1, TNC, SERPINB8, SH2D3C, and CD44); elements of NF-κB signaling (RELB and NFKBIA); and regulators of inflammation (ATF3, IER3, ELF3, BIRC3, PLA2G4C, HSPB8, and DUSP8). Accordingly, the gene ontology (GO) enrichment analysis identified an enrichment in a collection of GO terms related to inflammation and cellular responses to molecules of bacterial origin (Fig. S3B). Taken together, these data indicate that Mbeg7 confers on E. coli the production of a metabolite that induces an NF-κB-driven inflammation response in an in vitro model otherwise unresponsive to canonical LPS signaling.
Mbeg7 and LPS biosynthetic diversity in the human microbiome. A blastn search of bacterial reference genomes in NCBI using the three genes that make up Mbeg7 identified 18 genomes from nine different Enterobacteriaceae species that contain this three-gene cassette ( Fig. 5A and Table S4). Not all sequenced strains from these species contain the Mbeg7 genes; however, Mbeg7-containing strains from six of these species have been isolated from human samples. To better understand how Mbeg7 fits into LPS biosynthetic diversity in the human microbiome, we performed a comparative analysis of the Enterobacteriaceae LPS core biosynthesis locus (rfa, or waa) across the 18 genomes (9 species) harboring Mbeg7-like sequences, as well as 98 Enterobacteriaceae genomes available from the Human Microbiome Project database. The rfa (waa) loci from these genomes show a diversity in gene content and organization (Fig. 5B). The most conserved feature of these loci is the presence of three genes predicted to encode heptosyltransferases that are involved in the biosynthesis of the LPS inner core (rfaC, rfaF, and rfaQ encoding HepI, HepII, and HepIII, respectively) (Fig. 5C). In a phylogenetic tree of rfa (waa) locus heptosyltransferases, these gene families fall into distinct monophyletic clades (Fig. 5D). The additional heptosyltransferases encoded by Mbeg7 are phylogenetically distinct from the common heptosyltransferases (Fig. 5D). A similar result is found in a multiple-sequence alignment of O-antigen ligases encoded by Mbeg7 (Fig. S4). These phylogenetic differences suggest that enzymes encoded by Mbeg7 have different substrate specificities than other LPS biosynthesis enzymes and are therefore likely to be involved in the production of a unique LPS core structure.

DISCUSSION
Functional screening of the multigenomic library that we generated from 116 distinct human-associated bacteria identified 17 unique genomic regions that confer on E. coli the ability to induce NF-κB signaling. Based on the redundancy rate we observed (four redundant genomic regions in a total of 21 hits), we expect that future studies would benefit from using libraries that exceed 1-fold coverage of the input DNA. Ultimately, the optimal size for a multigenomic library will depend on a number of features including, among others, the number of input genomes, the size of the input genomes, and the accuracy in normalizing the input DNA.
The specific effector genes (i.e., Mbegs) identified from the seven bioactive clones we analyzed in detail fell into diverse functional categories. These Mbegs represent starting points for more in-depth studies on the mechanisms by which human-associated bacteria interact with the host. Although it is possible that, in some cases, NF-κB induction may arise from stress responses generated by E. coli, previous studies suggest that this is not generally the case for this assay (2). For example, the cell wall hydrolases, found in three hits from this study as well as those identified in previous metagenomic screening efforts (2), likely represent a common strategy by which bacteria generate metabolites that are recognized by the human host. Peptidoglycan is known to interact with host signaling in diverse ways. For example, human cells can sense bacterial cell wall components and their breakdown products via pattern-recognition receptors (e.g., NOD1 and NOD2) that lead to activation of NF-κB-mediated transcription (26,27). Modification of peptidoglycans is also a common mechanism employed by bacteria to interfere with host signaling (28). A bacterial peptidoglycan hydrolase of the NlpC/P60 family has also been shown to play a role in commensally induced protection against enteric infections by promoting host epithelial barrier function (29,30). While cell wall hydrolases have been previously associated with host-microbiota interactions, the DUF2974 domain-containing protein of unknown function that is encoded by Mbeg1 is the first example of this protein family being associated with a potential effector function in the human microbiome.
While the specific LPS-related structure encoded by bacteria containing the Mbeg7 effector has not yet been determined, it is notable that, in the context of human-associated bacteria, specific modifications of LPS structure can have major effects in the microbially induced modulation of host processes such as autoimmunity (31), proliferation and differentiation of colonic epithelial cells (32), and host tolerance of gut microbes (33). Structural variation in the oligosaccharide core region of LPS has also been implicated in pathogenicity (34,35). Studies on LPS signaling and its role in pathogenesis have largely focused on either the lipid A fragment or the highly variable O-antigen chain. The role of LPS core oligosaccharide variability has been less extensively studied (36,37). The consequences of additional heptosyltransferase genes in the LPS core biosynthetic locus are not known in the context of bacterial pathogenicity and host-microbe interactions; however, there is mounting evidence that bacterial heptose derivatives and biosynthetic intermediates act as important signaling molecules in microbial recognition and host immune responses, particularly via TLR-independent pathways (38–42). Although it remains to be seen how exactly LPS biosynthesis differs in strains harboring Mbeg7, we suspect that modified LPS-related structures may induce a strong proinflammatory response in human cells and potentially represent a distinct mechanism by which this select group of organisms interacts with the human host.
Within the multigenomic library, the genomes that yielded the 21 validated hits appeared in a relatively narrow range of relative abundances (Fig. 1C, red circles). In metagenomic samples from across diverse body sites of healthy humans, these same genomes appear at relative abundances spanning 7 orders of magnitude. The large fraction of low-frequency species that is seen in the human microbiome could only be sampled from a natural metagenome using an impractically large library construction and screening campaign. For instance, Enterococcus faecium TX1330 makes up on average 0.00033% of the bacteria in healthy human stool (13). The strain was present in our library at a relative abundance of 0.33%. In order to have the same chance of identifying the E. faecium-derived bioactivity we found in the multigenomic library (Mbeg6), a stool metagenomic library would need to be 1,000-fold larger (i.e., 13 million clones). The compact nature of a multigenomic library should afford the more facile exploration of diverse heterologous hosts and reporter assays, and we expect that these factors will increase the rate at which novel bacterial effectors are discovered from the human microbiome.
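The library-size comparison above can be written out as a short calculation. The sketch below assumes, as the argument in the text implicitly does, that the chance of recovering a clone from a given genome scales linearly with that genome's relative abundance in the input DNA; the function name is hypothetical.

```python
def equivalent_library_size(
    n_clones: int,
    abundance_in_library_pct: float,
    abundance_in_sample_pct: float,
) -> tuple[float, float]:
    """Return (fold difference in abundance, number of clones a library
    built from the natural sample would need to sample the strain equally
    well). ASSUMPTION: sampling probability is proportional to abundance."""
    fold = abundance_in_library_pct / abundance_in_sample_pct
    return fold, n_clones * fold


# Values for Enterococcus faecium TX1330 as stated in the text.
fold, clones_needed = equivalent_library_size(13_300, 0.33, 0.00033)
print(f"{fold:.0f}-fold enrichment; about {clones_needed / 1e6:.1f} million clones needed")
```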

MATERIALS AND METHODS
Genomic DNA, library construction, and sequencing. Purified genomic DNA from 116 human-associated bacterial strains was purchased from the Biodefense and Emerging Infections Research Resources Repository (BEI Resources) (see Table S1 in the supplemental material). All samples were analyzed by electrophoresis (0.7% agarose) and were verified to contain chromosomal DNA fragments of ≥20 kb. A pool was generated by combining 0.25 μg of each genomic DNA sample, according to the concentration provided by the manufacturer. DNA from the pooled sample was precipitated using isopropanol and resuspended in 40 μl of nuclease-free water (Millipore Sigma, Burlington, MA) at a final DNA concentration of 125 μg/ml. The multigenomic DNA pool was blunt ended (End-It DNA End-Repair kit; Lucigen, Middleton, WI) and ligated (Fast-Link DNA Ligation kit; Lucigen) into ScaI-digested pJWC1 cosmid vector (43). Cosmids were packaged into lambda phage (MaxPlax Lambda Packaging Extracts; Lucigen) and transfected into Escherichia coli EC100 cells in the presence of 10 mM MgSO4. Transformants were selected by tetracycline resistance and SacB sucrose sensitivity on LB agar plates containing tetracycline (15 μg/ml) and sucrose (10%). A total of 13,300 individual clones were robotically arrayed into 50 384-well plates and stored at -80°C as glycerol stocks. Cultures of individual clones grown overnight at 37°C in 160 μl of LB medium with 15 μg/ml tetracycline were combined, and cosmids were purified from the cell pellet (~1 g total wet weight) using a NucleoBond Xtra Midi kit (Macherey-Nagel, Düren, Germany) according to the manufacturer's instructions. The sample was sequenced using an Illumina MiSeq instrument, and the 300-bp paired-end reads were quality trimmed with seqtk trimfq using the default settings (https://github.com/lh3/seqtk). Quality-trimmed reads that aligned to the pJWC1 sequence or to E. coli K-12 substrain MG1655 were removed from analysis to account for vector-derived sequences and host genomic DNA contamination, respectively. The remaining reads were aligned to the 116 genomes comprising the multigenomic library using Bowtie2 (44). Because of the high similarity between genomes at the strain level, in cases when multiple strains of a given species were included in the library, reads were averaged among all input strains of each species. For nine of the strains, the genome was not published at the time of this analysis, and therefore the representative genome of the species was used for mapping.
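The species-level averaging described above can be sketched as follows. This is one plausible reading of the procedure, in which each strain is assigned the mean mapped-read count of all library strains belonging to the same species; the function name, strain labels, and counts are hypothetical.

```python
from collections import defaultdict


def average_reads_by_species(
    strain_reads: dict[str, int],
    strain_species: dict[str, str],
) -> dict[str, float]:
    """Assign every strain the mean mapped-read count of its species."""
    by_species: dict[str, list[int]] = defaultdict(list)
    for strain, n_reads in strain_reads.items():
        by_species[strain_species[strain]].append(n_reads)
    species_mean = {sp: sum(v) / len(v) for sp, v in by_species.items()}
    return {s: species_mean[strain_species[s]] for s in strain_reads}


# Toy example: two strains of one species, one strain of another.
reads = {"sp1_strainA": 30, "sp1_strainB": 10, "sp2_strainA": 7}
species = {"sp1_strainA": "sp1", "sp1_strainB": "sp1", "sp2_strainA": "sp2"}
print(average_reads_by_species(reads, species))
```

Averaging in this way sidesteps the ambiguity of assigning a read to one of several near-identical strain genomes, at the cost of losing any real strain-level differences in abundance.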
Reporter cell line and culture conditions. The NF-κB reporter cell line (2) consisted of HEK293-TN cells stably transfected with the pGreenFire lentiviral NF-κB GFP-luciferase plasmid (Systems Biosciences, Palo Alto, CA), with a minimal cytomegalovirus (CMV) promoter and four copies of the NF-κB transcriptional response element controlling GFP expression. HEK293 cells do not express Toll-like receptor 2 (TLR2) and TLR4, thereby reducing baseline activation by E. coli host membrane components. Cells were routinely cultured in 75-cm² flasks at 37°C and 5% CO2 in Dulbecco's modified Eagle's medium (DMEM) supplemented with L-glutamine (200 mM), penicillin (100 U/ml), streptomycin (100 U/ml), 10% fetal bovine serum (FBS), and phenol red (15.9 mg/liter). Within 18 h prior to each assay, cells were trypsinized, suspended in phenol red-free DMEM supplemented as above, and counted using trypan blue staining and a Countess automated cell counter (Invitrogen). Cells were seeded on cell culture-grade clear-bottom, black 384-well plates (Corning, New York, NY) at 2,500 cells/well in 25 μl of phenol red-free DMEM.
NF-κB bioactivity screening, hit validation, and initial characterization. Individual multigenomic library clones arrayed into 384-well plate wells were cultivated for 4 days at 30°C in 160 μl LB medium with 15 μg/ml tetracycline. E. coli cell viability under the growth conditions we used for screening was determined based on counts of CFU that appeared on LB agar plates containing 15 μg/ml tetracycline. The cultures were pelleted by centrifugation at 4,000 × g for 30 min at 4°C, and the supernatant was filtered through a 0.2-μm-pore-size membrane (Pall, Port Washington, NY) to generate sterile spent culture broth. Using an automated liquid-handling system (Tecan EVO, Tecan, Männedorf, Switzerland), 20 μl was transferred onto cells seeded as described above. After 24 h of incubation, 10 μl of a phosphate-buffered saline (PBS) solution containing a nuclear stain (2 μg/ml Hoechst 33342) and a dead cell marker (6 μg/ml propidium iodide) was added to each well using a Multidrop instrument (Thermo Fisher Scientific, Waltham, MA). Images were taken with a 10× objective using an ImageXpress XLS Widefield High Content Microscope (Molecular Devices, San Jose, CA) with the fluorescent filters Texas red (excitation, 562 nm [562ex]; emission, 624 nm [624em]) for propidium iodide (PI), fluorescein isothiocyanate (FITC; 482ex and 536em) for GFP, and 4′,6′-diamidino-2-phenylindole (DAPI; 377ex and 447em) for Hoechst 33342 and analyzed using an automated custom module in MetaXpress software (Molecular Devices). Cell death was assessed as the ratio of PI-stained cells to total Hoechst-stained nuclei in each well. NF-κB activation was measured as the ratio of live GFP-expressing cells to total live cells in each well. Results were normalized to a set of negative-control wells in each plate (E. coli host transformed with empty pJWC1 vector) and expressed as Z-scores. Clones with a Z-score greater than 3 were assayed a second time in eight replicates under the same protocol. Hits were considered validated when they showed a Z-score greater than 2.5 in at least six out of the eight wells. Representative images of cells treated with supernatant from validated hits were taken with a 100× objective to assess reporter cell morphology. Supernatants were also analyzed for extracellular protein content using a Qubit fluorescent protein assay kit (Thermo Fisher Scientific) according to the manufacturer's instructions. For molecular-weight-cutoff experiments, validated hits were inoculated in 1 ml of LB medium with 15 μg/ml tetracycline in deep-well 96-well plates, and sterile spent culture broth was generated as described above. Samples were processed through a 10-kDa MWCO Amicon centrifugal filter unit (Millipore Sigma) according to the manufacturer's instructions, and the flowthrough was assayed for NF-κB activity as described above.
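The hit-calling criteria described above (primary hits at a Z-score greater than 3, validation at a Z-score greater than 2.5 in at least six of eight replicate wells) can be sketched as follows. The text does not specify how the Z-score was computed beyond normalization to the per-plate negative controls, so the use of the control mean and sample standard deviation here is an assumption, and all function names and example values are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values: list[float], control_values: list[float]) -> list[float]:
    """Z-score of each well relative to the plate's negative-control wells.
    ASSUMPTION: control mean and sample standard deviation are used."""
    mu = mean(control_values)
    sd = stdev(control_values)
    if sd == 0:
        raise ValueError("control wells have zero variance")
    return [(v - mu) / sd for v in values]


def is_primary_hit(z: float, threshold: float = 3.0) -> bool:
    """Primary screen: a single well exceeding the Z-score threshold."""
    return z > threshold


def is_validated(replicate_z: list[float], threshold: float = 2.5,
                 min_wells: int = 6) -> bool:
    """Validation: enough replicate wells exceed the relaxed threshold."""
    return sum(z > threshold for z in replicate_z) >= min_wells


# Toy percentages of GFP-positive live cells (not data from the study).
controls = [10, 12, 14, 16, 18]
z = z_scores([14, 30], controls)
print([round(x, 2) for x in z], [is_primary_hit(x) for x in z])
```

Requiring six of eight wells rather than all eight tolerates an occasional failed well while still rejecting clones whose primary signal was a one-off.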
Sequence annotation of inserts from bioactive clones and identification of effector genes by transposon mutagenesis. Cosmid DNA was obtained from each bioactive clone using a Monarch Plasmid Miniprep kit (NEB, Ipswich, MA) and sequenced by Sanger sequencing using primers targeting the regions flanking the ScaI site on the pJWC1 vector (Table S5). Each pair of forward and reverse end sequences was submitted to a blastn search against all 116 genomes comprising the library. Insert sequences were retrieved from the corresponding source genome by determining the start and end positions from the end-sequence alignments and extracting the GenBank and fasta sequences between these coordinates. ORFs were predicted in the extracted sequences using MetaGeneMark (45) and searched against the NCBI nonredundant (nr) database using blastx. Cosmid DNA from selected bioactive clones was mutagenized using the EZ-Tn5 <KAN-2> insertion kit (Lucigen) according to the manufacturer's instructions, desalted using agarose gel tubes, and transformed into electrocompetent E. coli EC100 cells. Mutants were selected on LB agar using kanamycin (50 μg/ml) and tetracycline (15 μg/ml), and single colonies were inoculated and assayed in quadruplicate in the NF-κB reporter system as described above, with the unmutagenized clone as a positive control. Knockout mutants were identified, and the locations of transposon insertions were determined by Sanger sequencing using transposon-specific sequencing primers (Lucigen).
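The insert retrieval step can be illustrated with the following minimal sketch. It assumes that each end sequence yields one best blastn hit with 1-based subject coordinates (as in BLAST tabular output), that both hits fall on the same contig, and that the insert does not span the origin of a circular replicon. The names `EndHit`, `insert_coordinates`, and `extract_insert` are hypothetical and are not taken from the study's scripts.

```python
from typing import NamedTuple


class EndHit(NamedTuple):
    """Best blastn hit for one cosmid end sequence (hypothetical container)."""
    contig: str
    sstart: int  # 1-based subject start, as in BLAST tabular output
    send: int    # 1-based subject end; smaller than sstart on the minus strand


def insert_coordinates(fwd: EndHit, rev: EndHit) -> tuple[str, int, int]:
    """Return (contig, start, end) spanning both end-sequence alignments."""
    if fwd.contig != rev.contig:
        raise ValueError("end sequences map to different contigs")
    positions = (fwd.sstart, fwd.send, rev.sstart, rev.send)
    return fwd.contig, min(positions), max(positions)


def extract_insert(genome: dict[str, str], fwd: EndHit, rev: EndHit) -> str:
    """Slice the insert out of a genome given as {contig name: sequence}."""
    contig, start, end = insert_coordinates(fwd, rev)
    return genome[contig][start - 1:end]  # convert 1-based inclusive to a slice
```

Taking the minimum and maximum over all four coordinates makes the result independent of insert orientation in the vector.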
Subcloning and inducible expression of effector genes. Primers with overhanging NdeI and XhoI sequences (Table S5) were designed to PCR amplify the nucleotide sequence of effector genes of interest from the pJWC1 insert template cosmid with Q5 High-Fidelity DNA polymerase (NEB) according to the manufacturer's protocol. The annealing temperature was calculated for each primer pair using the online NEB Tm Calculator (version 1.9.13), and extension time was set according to the expected amplicon length (30 s/kb). PCR products were gel purified and digested with NdeI and XhoI (NEB), ligated into NdeI- and XhoI-digested and dephosphorylated pET28c vector using T4 DNA ligase (NEB), and transformed into electrocompetent T7 Express E. coli. Single colonies were cultured in LB medium with 50 μg/ml kanamycin, and plasmid DNA was extracted using a Monarch Plasmid Miniprep kit (NEB). Identity of the cosmid inserts was confirmed by Sanger sequencing using T7 and T7-term specific primers (Table S5). The subclones were cultured in LB medium in the presence of kanamycin (50 μg/ml) and, upon reaching an optical density at 580 nm (OD580) of 0.6, induced with isopropyl-β-D-thiogalactopyranoside (IPTG; 500 μM) for 20 h at 18°C. Sterile spent broth collected after this period was tested for activity using the NF-κB assay as described above.
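As a worked example of the extension-time rule (30 s/kb), the following sketch converts an expected amplicon length into a cycling time. The function name is hypothetical, and rounding up to the next whole second is an assumption made here for convenience.

```python
import math


def extension_time_s(amplicon_bp, rate_s_per_kb=30):
    """Extension time in seconds for a given amplicon length at 30 s/kb.

    Assumption: fractional seconds are rounded up to the next whole second.
    """
    return math.ceil(amplicon_bp * rate_s_per_kb / 1000)
```

For a 1.5-kb amplicon, `extension_time_s(1500)` gives 45 s.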
Extraction and bioactivity-guided fractionation. A glycerol freezer stock of E. coli T7 Express cells with pET28c-Mbeg7 was inoculated into 50 ml of LB medium with 50 μg/ml kanamycin. The culture was incubated overnight at 37°C and 200 rpm and diluted 1:100 into 1 liter of LB medium with kanamycin. The subculture was incubated at 37°C and 200 rpm until reaching an OD600 of ≈0.6 and induced with 0.5 mM IPTG from a 0.5 M IPTG stock solution. The culture was incubated at 18°C and 200 rpm for 20 h. The cells were pelleted by centrifugation (4,200 × g for 30 min at 4°C). The culture supernatant was filtered through 0.2-μm-pore-size bottle-top filters. Amberlite XAD16N resin (Millipore Sigma) (20 g) was added to 1 liter of supernatant in medium bottles and shaken at 80 rpm for 30 min. The resin was filtered using a bottle-top filter and washed with 1 liter of double-distilled H2O (ddH2O), and 150 ml of 50% MeOH was used to elute the extracted molecules. The eluent was dried in vacuo, resuspended in a small volume of 50% MeOH, and loaded onto 1 g of C18 silica gel (Sorbent Technologies, Norcross, GA). Flash chromatography using a Teledyne ISCO CombiFlash Rf 200 instrument equipped with a 100-g high-performance (HP) C18 RediSep Rf Gold column was employed for crude separation. Solvent A was water, and solvent B was acetonitrile. The method was a 0 to 60% B gradient over 30 min with a flow rate of 60 ml/min. Fractions were collected and assayed for bioactivity, and the fractions containing the highest activity were pooled and dried for further purification (Fig. S3A). Next, an isocratic high-performance liquid chromatography (HPLC) method using 20% acetonitrile in water on a Waters XBridge BEH phenyl column was designed to enrich bioactivity. The flow rate was 2.5 ml/min. Fractions were collected every 1 min, dried by evaporation, dissolved in PBS, and assayed for bioactivity. The fraction with the highest activity was dried by lyophilization. As a control, a culture of E. coli T7 Express cells transformed with pET28c empty vector was processed in parallel using the same extraction and fractionation protocol.
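Two quantities in this protocol follow from simple arithmetic and can be checked with a short sketch: the volume of IPTG stock needed for induction (C1V1 = C2V2) and the solvent B percentage at a given time in the flash chromatography gradient. The function names are hypothetical. The dilution calculation neglects the volume added by the stock, and the gradient is assumed to be strictly linear.

```python
def stock_volume_ml(final_conc_mM, final_volume_ml, stock_conc_mM):
    """Volume of stock to add, from C1 * V1 = C2 * V2.

    Assumption: the added stock volume is negligible relative to the culture.
    """
    return final_conc_mM * final_volume_ml / stock_conc_mM


def percent_b(t_min, start=0.0, end=60.0, duration_min=30.0):
    """Solvent B percentage at time t for a linear gradient (assumed linear)."""
    t = min(max(t_min, 0.0), duration_min)  # clamp to the gradient window
    return start + (end - start) * t / duration_min
```

Inducing a 1-liter culture at 0.5 mM from a 0.5 M (500 mM) stock therefore takes `stock_volume_ml(0.5, 1000, 500)`, i.e., 1 ml of stock, and the 0 to 60% B gradient rises by 2% B per minute.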
Total mRNA sequencing and transcription analysis. HEK293:NF-κB:GFP cells were grown at 37°C in 5% CO2 in DMEM supplemented with L-glutamine (200 mM), penicillin (100 U/ml), streptomycin (100 U/ml), 10% fetal bovine serum (FBS), and phenol red (15.9 mg/liter). Upon reaching ~80% confluence in 75-cm² flasks, cells were trypsinized, stained with trypan blue, and counted using a Countess automated cell counter (Invitrogen, Carlsbad, CA); cells were then seeded on six-well clear tissue culture-grade plates (Corning) at 3 × 10⁵ cells/well in 4 ml of culture medium and grown for 18 h. Cells were treated with the bioactive fraction from cultures of E. coli transformed with Mbeg7 (fraction obtained as described above) or the equivalent fraction from an E. coli empty vector control (40 μl of solution in PBS, for a final concentration of 10 μg/ml) and incubated for 6 h. Each condition was assayed in three separate wells in two independent experiments. Total RNA was extracted from cells in each well using a Quick-RNA kit (Zymo Research, Irvine, CA) according to the manufacturer's protocol. High-throughput RNA sequencing (RNA-seq) libraries were generated from 100 ng of total RNA using an Illumina TruSeq Stranded mRNA LT kit. Libraries prepared with unique barcodes were pooled at equal molar ratios. The pool was denatured and sequenced on an Illumina NextSeq 500 sequencer using high-output V2 reagents and NextSeq Control Software, version 1.4, to generate 75-bp single reads, according to the manufacturer's protocol. Genes differentially expressed between the two groups were identified with the DESeq2 package (46), using as a cutoff an adjusted P value of less than 0.05 and a fold change higher than 5. Differentially expressed genes (DEGs) underwent gene set enrichment tests with the topGO package (47).
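The DEG cutoff can be expressed as a simple filter over a DESeq2 results table exported to Python. In this sketch, each gene is a dictionary with the standard DESeq2 column names `padj` and `log2FoldChange`; the function name is hypothetical. Whether the fold-change threshold was applied in both directions is not stated in the text, so the two-sided default below is an assumption that can be switched off.

```python
import math


def is_deg(row, padj_cutoff=0.05, fold_cutoff=5.0, two_sided=True):
    """Apply the adjusted P value and fold-change cutoffs to one gene.

    Assumption: `two_sided=True` treats a 5-fold decrease like a 5-fold
    increase; the source text does not specify the direction.
    """
    padj = row["padj"]
    lfc = row["log2FoldChange"]
    if padj is None or lfc is None:  # DESeq2 reports NA for filtered genes
        return False
    if padj >= padj_cutoff:
        return False
    threshold = math.log2(fold_cutoff)  # fold change of 5 is about 2.32 in log2
    return abs(lfc) > threshold if two_sided else lfc > threshold
```

A list of DEGs is then `[r for r in rows if is_deg(r)]`.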
Bioinformatic analyses of effector genes. Translated amino acid sequences of each effector gene were submitted to a conserved domain (CD) search using the CD Search tool on NCBI. Additionally, a hidden Markov model (HMM) scan (48) was performed to identify Pfam domains present in each effector. The sequences of LPS core biosynthesis-related genes of interest were submitted to a blastn search against the NCBI RefSeq database for Bacteria. For family-wide analysis of the LPS core biosynthesis locus (rfa or waa) in human-associated Enterobacteriaceae, the genomes of the 18 strains found to harbor Mbeg7 homologs (Table S4), plus 98 annotated reference genomes from Enterobacteriaceae strains in the HMP database (www.hmpdacc.org/hmp/catalog), were surveyed. The library host, E. coli strain K-12 substrain MG1655, was also included in this analysis. The complete list of accession numbers for these strains is presented in Table S6. The nucleotide sequences found between the kbl and rpmBG genes (49) were extracted as fasta files and submitted to a custom annotation pipeline that used MetaGeneMark to identify individual ORFs and an HMM scan to annotate genes according to the presence of Pfam domains corresponding to heptosyltransferases (PF01075.15), glucosyltransferases (PF00535.24, PF01501.18, PF00534.18, and PF13439.4), epimerase (PF01370.19), O-antigen ligase (PF04932.13), Kdo kinase (PF06293.12), HepII kinase (PF06176.9), and Kdo transferase (PF04413.14). After the genomes were grouped according to the genetic organization of the rfa locus, a subset of 47 genomes comprising one representative genome from each unique species in each of the groups was used for phylogenetic analysis of the heptosyltransferase sequences (total of 197 sequences) (Table S6). The analysis was performed on the Phylogeny.fr platform (50), starting with multiple sequence alignment using MUSCLE (v3.8.31) with default settings for highest accuracy (51). The alignment was curated to remove ambiguous regions using Gblocks (version 0.91b) (52) with the following parameters: minimum length of a block after gap cleaning, 10; no gap positions allowed in the final alignment; rejection of all segments with contiguous nonconserved positions bigger than 8; minimum number of sequences for a flank position, 85%. A maximum likelihood phylogenetic tree was reconstructed in the PhyML program (version 3.1/3.0, approximate likelihood-ratio test [aLRT]) (53). The Whelan and Goldman (WAG) substitution model was selected assuming an estimated proportion of invariant sites (of 0.018) and four gamma-distributed rate categories to account for rate heterogeneity across sites. The gamma shape parameter was estimated directly from the data (gamma = 1.262). Reliability of the internal branches was assessed using the aLRT (Shimodaira-Hasegawa [SH]-like) (54). Graphical representation and editing of the tree were performed using the online tool interactive Tree of Life (iTOL) (55).
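The Pfam-based annotation of rfa locus ORFs amounts to a lookup from domain accession to functional class, as in the minimal sketch below. The accession-to-function table is taken from the list above; the function name and the choice to match on the base accession (ignoring the version suffix) are assumptions made for illustration and do not describe the custom pipeline itself.

```python
# Pfam accessions and functional classes as listed in the text.
PFAM_TO_FUNCTION = {
    "PF01075.15": "heptosyltransferase",
    "PF00535.24": "glucosyltransferase",
    "PF01501.18": "glucosyltransferase",
    "PF00534.18": "glucosyltransferase",
    "PF13439.4": "glucosyltransferase",
    "PF01370.19": "epimerase",
    "PF04932.13": "O-antigen ligase",
    "PF06293.12": "Kdo kinase",
    "PF06176.9": "HepII kinase",
    "PF04413.14": "Kdo transferase",
}

# Assumption: match on the base accession so other Pfam versions still map.
_BASE_TO_FUNCTION = {
    acc.split(".")[0]: func for acc, func in PFAM_TO_FUNCTION.items()
}


def annotate_orf(pfam_hits):
    """Return the sorted functional classes implied by an ORF's Pfam hits."""
    functions = {
        _BASE_TO_FUNCTION[hit.split(".")[0]]
        for hit in pfam_hits
        if hit.split(".")[0] in _BASE_TO_FUNCTION
    }
    return sorted(functions)
```

An ORF with no matching domain returns an empty list and would be left unannotated in this scheme.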