Horizontal Gene Transfer to a Defensive Symbiont with a Reduced Genome in a Multipartite Beetle Microbiome.

Associations between microorganisms and an animal, plant, or fungal host can result in increased dependence over time. This process is due partly to the bacterium not needing to produce nutrients that the host provides, leading to loss of genes that it would need to live independently and to a consequent reduction in genome size. It is often thought that genome reduction is aided by genetic isolation—bacteria that live in monocultures in special host organs, or inside host cells, have less access to other bacterial species from which they can obtain genes. Here, we describe exposure of a genome-reduced beetle symbiont to a community of related bacteria with nonreduced genomes. We show that the symbiont has acquired genes from other bacteria despite going through genome reduction, suggesting that isolation has not yet played a major role in this case of genome reduction, with horizontal gene gains still offering a potential route for adaptation.

period of exclusive association with the host, and the symbionts that become obligate tend to carry out highly important functions for the host (9)(10)(11). For example, mitochondria and chloroplasts, organelles that are required for energy production and carbon fixation in eukaryotic and plant cells, originated from endosymbiotic capture of alphaproteobacteria and cyanobacteria ϳ1.2 billion years ago (Bya) and ϳ900 million years ago (Mya), respectively (12). The acquisition of these organelles allowed the diversification of eukaryotic species (13). More recently, aphids evolved to feed on plant sap depleted of several essential amino acids only through capture of an endosymbiont that can synthesize these nutrients, namely, Buchnera aphidicola, ϳ160 to 280 Mya (14,15). In these cases, the microbial symbiont has lost the ability to live independently of the host, and the hosts are also dependent on their symbionts.
The mechanism by which symbionts become obligate is through loss of genes required for independent (but not host-associated) life, leading to an overall reduction in genome size (9)(10)(11). This gene loss is the result of relaxed selection of genes for functions provided by the host and also of increased genetic drift as a consequence of the presence of small effective populations in strict vertical transmission (16,17). A general mutational bias toward deletion in bacteria (18) combined with many successive population bottlenecks that allow the fixation of slightly deleterious mutations (10) mediates general gene degradation and genome reduction in symbionts. Also, a low level of recombination between different strains is often thought to contribute to this process. While these mechanisms are largely thought to be nonadaptive, there are also examples of free-living bacteria with reduced genomes, where reduction is thought to be the result of strong selection for reduced metabolic expenditure (19)(20)(21). In symbionts, there is also evidence that increases in AT content and reductions in genome size could be selected for to reduce the metabolic burden on the host (22,23). The early stages of this process are manifested by a proliferation of nonfunctional pseudogenes and a decrease in coding density in the genome (11) before the intergenic sequences are lost to eventually give tiny (Ͻ1-Mbp) genomes (9,10). While there is a robust model for the evolutionary forces that drive this process once a symbiont becomes host restricted, it is unknown how free-living bacteria first transition to an endosymbiotic lifestyle and start on the road to genome reduction (10).
The beetle subfamily Lagriinae offers an opportunity to examine this issue. Various Lagriinae species have evolved special symbiont-bearing structures that serve to facilitate the vertical transmission of bacteria (24). Symbionts are stored extracellularly in two accessory glands associated with the female reproductive system, allowing contamination of the egg surface with symbionts (24). These symbionts later colonize three compartments forming from invaginations of the dorsal cuticle in the larva (24). Beetles are typically coinfected with multiple symbiont strains related to the plant pathogen Burkholderia gladioli that are secreted onto the surface of eggs as they are laid (24). In the species Lagria villosa, a South American soybean pest, at least one symbiotic B. gladioli strain (Lv-StA) has been cultured and is still capable of infecting plants (24). The same strain produces antibacterial and antifungal compounds that can protect the beetle's eggs from infection (24,25). This is consistent with the hypothesis that the B. gladioli symbionts evolved from plant pathogens to become beetle mutualists. However, in field collections of L. villosa, Lv-StA is found only sporadically and is never highly abundant (26). Instead, the most abundant strain identified is often the uncultured Lv-StB strain (26), which we previously found harbors a biosynthetic gene cluster (BGC) predicted to produce lagriamide, a defensive compound found in field egg collections (6). We previously found through metagenomic sequencing that the genome of Lv-StB was much smaller than that of Lv-StA, suggesting that it has undergone genome reduction (6). It would seem that, while L. villosa has multiple options for symbionts that produce potential chemical defenses, only a subset have specialized as obligate mutualists. The presence of multiple related strains in this system, with selective genome reduction of a single strain, could potentially shed light on why the genomes of some symbionts become reduced.
Here, we show that in the L. villosa microbiome, Lv-StB is uniquely undergoing be missing a significant number of core genes that are required even in symbionts with highly reduced genomes. We examined the presence of core genes in all metagenomic bins, as well as in B. gladioli Lv-StA for comparison (Table 1; see also Data Set S1D). Nine bins were close in size to the genome of their respective closest relatives, while maintaining most core genes, and are classified as "nonreduced," and 10 bins were small but also lacked a significant fraction of core genes and are classified as "incomplete." Only the Lv-StB genome can be classified as reduced, on the basis of its reduced size compared to its close relative (2.07 Mbp, 23.5%) and maintenance of most core genes (85.7%). It is worth noting here that the "core" genes missing from the Lv-StB genome include genes essential for translation (infB, trpS, the tRNA-Met gene, and tRNA-Gln gene; Data Set S1D), which is probably explained by incomplete assembly of the draft-quality genome. The high level of maintenance of core genes nevertheless suggests that the assembly is missing a relatively small amount of the genome and that this genome is reduced in size. The Lv-StB bin exhibited additional features of genomes undergoing reduction, namely, reduced GC% content compared to the B. gladioli reference genome (58.7% versus 67.9%) and a proliferation of pseudogenes accounting for 45.29% of the annotated open reading frames (ORFs) ( Fig. 1; see also Data Set S1C and E). Because of this, the Lv-StB genome exhibits low coding density (59.04%), and it also possesses a large number (n ϭ 159) of ORFs containing transposases. Both of these characteristics are hallmarks of symbionts in the early stages of genome reduction, whose genomes contain high numbers of pseudogenes and genome rearrangements (9). The Lv-StB genome also contains a low number of genes compared with its free-living relative, with 744 ORFs that are not pseudogenes, transposase genes, or hypothetical genes versus 4,778 such genes in B. gladioli Lv-StA. Diversity of biosynthetic gene clusters in the L. villosa microbiome. Because we had previously isolated the nonreduced B. gladioli Lv-StA strain from L. villosa (6) and had found it to produce protective compounds despite having sporadic distribution and low abundance in field-collected beetles, we asked whether BGCs were a unique feature of Lv-StB in the metagenome or whether other community members have the biosynthetic machinery for potential chemical defenses. AntiSMASH (34) searches  Fig. 2). This distribution suggests that although Burkholderiaceae appear to be an important reservoir of BGCs in the L. villosa egg microbiome, other groups have significant biosynthetic potential. Among the 126 BGCs detected in the metagenome and in the B. gladioli Lv-StA genome, there were 17 that were Ͼ50 kbp in length and were predicted to produce complex nonribosomal peptides or polyketides (Fig. 3). Two of these have been putatively assigned to production of the antibiotic lagriene in Lv-StA (24) and of the antifungal lagriamide in Lv-StB (6), whereas five other small molecules known to be produced by Lv-StA have been assigned to shorter BGCs (24,25). We compared the 126 identified BGCs using BIG-SCAPE (35) and found only 7 examples of BGCs occurring in multiple strains, indicating that the biosynthetic potential in the metagenome and B. gladioli Lv-StA is largely nonredundant. Taken together, the data suggest that there is a large amount of undefined biosynthetic potential for production of potentially defensive small molecules in L. villosa symbionts, beyond B. gladioli Lv-StA and Burkholderia sp. Lv-StB.
Divergence between Lv-StB and the closest free-living relative. We constructed a phylogenetic tree of metagenomic bins assigned to the genus Burkholderia as well as B. gladioli Lv-StA, on the basis of 120 marker genes (Fig. 4A). This showed that Burkholderia sp. Lv-StB is most highly related to the B. gladioli clade but is divergent from it. We calculated genome-wide ANI values for pairs of Burkholderia genomes and found that B. gladioli strains shared between 97% and 100% ANI whereas Burkholderia sp. Lv-StB shared at most 85.79% ANI with B. gladioli A1 (Data Set S1F). During genome reduction, symbionts are known to undergo rapid evolution due to the loss of DNA repair pathways (as found in the Lv-StB genome; see below) and the relaxation of selection (9), and so the divergence of Lv-StB from B. gladioli may have been accelerated relative to free-living lineages. Genome-reduced symbionts have often been vertically transmitted for evolutionary timescales and across host speciation events, and therefore it is possible to calculate evolution rates where related symbionts occur in hosts with known divergence times inferred from fossil records. Such estimates in insect symbionts range widely over 3 orders of magnitude, but more recent ant and  sharp-shooter symbiont lineages (established Ͻ50 Mya for "Candidatus Baumannia cicadellinicola," Blochmannia obliquus, Bl. pennsylvanicus, and Bl. floridanus) show high rates of divergence per synonymous site per year (dS/t) of between 1.1 ϫ 10 Ϫ8 and 8.9 ϫ 10 Ϫ8 (36) (the divergence rates used here can be found in Table S1 in the supplemental material). Because of the large number of pseudogenes in the Lv-StB genome, we reasoned that it is likely to be a recent symbiont and therefore used these rates to estimate times of divergence between Lv-StB and B. gladioli A1. We found a dS rate of 0.5486 per site between these genomes and calculated divergence times of 6.15, 8.55, 6.93, and 49.76 My rates for Bl. floridanus, Bl. pennsylvanicus, Bl. obliquus, and "Candidatus Baumannia cicadellinicola" divergence rates, respectively (Table S1) (36). We should note here that these time estimates are very approximate and are possibly overestimates as symbiont evolution rates are likely not constant, with particularly rapid evolution occurring during lifestyle transitions (11). The range of these estimated divergence times suggests that the common ancestor of Burkholderia sp. Lv-StB and B. gladioli existed after the evolution of symbiont-bearing structures in Lagriinae beetles (see below).
We then sought to quantify the conservation of genes in Lv-StB compared to 13 related B. gladioli strains by identifying homologous gene groups in the entire set of nonpseudogenes in these strains with OMA (37). This pipeline aims to identify ortholo- Gene cluster distribution (n = 16,799) Horizontal Transfer to a Genome-Reduced Symbiont ® gous groups (OGs) while discounting paralogs among the genes in a given set of genomes (38). Of the 1,388 genes in Lv-StB that are not pseudogenes (Data Set S1B), 497 were not included in any OMA orthologous group (see below). A crude analysis of the OMA groups in Lv-StB, B. gladioli Lv-StA, and B. gladioli A1 revealed that Lv-StB retains a small subset of the groups found in both Lv-StA and A1 and has few unique groups (Fig. 4B), suggesting that Lv-StB has lost many of the genes conserved in B. gladioli. Consistent with this notion, we visualized the pangenome of B. gladioli and Lv-StB with Roary (39) (Fig. 4C) and found a large number of gene clusters that are conserved in B. gladioli but not Lv-StB. The gene clusters that are more variable among B. gladioli are also generally not found in Lv-StB. Conversely, there were gene clusters found in Lv-StB that are not present in B. gladioli strains, and these clusters may have been obtained by horizontal transfer after the divergence of Lv-StB or, alternatively, were lost in B. gladioli.

Shared homologs
Remarkably, among the 1,149 pseudogenes detected in the Lv-StB genome, 976 were hypothetical and 129 were transposases, leaving only 44 that were recognizable (Data Set S1G). Pseudogenes were detected by identifying those that appeared truncated more than 20% compared to the closest BLAST hit (see Materials and Methods). While this methodology does not attempt to compare lengths to a consensus gene length (for example, to account for start site annotation variance) and while a firm cutoff for pseudogene length truncation has not been defined, the low number of recognizable pseudogenes suggests that we are not overannotating pseudogenes. The recognizable set of pseudogenes included a number of important genes in the categories noted to be depleted as described below. For instance, the DNA polymerase I (Pol I) gene (polA) appears to have been disrupted by a transposase, which is now flanked by two DNA Pol I fragments (E5299_1120 and E5299_01122). Likewise, the uvrC gene (E5299_00503), a component of the nucleotide excision repair system (40), is also present as a truncated gene adjacent to a transposase gene. There were also pseudogenes involved in the Entner-Doudoroff and glycolysis energy-producing pathways (corresponding to phosphogluconate dehydratase [41] and glucokinase [42]), as well as purine biosynthesis (corresponding to phosphoribosylglycinamide formyltransferase [43] and phosphoribosylformylglycinamidine synthase [44]). Interestingly, we found two pseudogenes that negatively regulate cell division and translation. Septum protein Maf is a nucleotide pyrophosphatase that has been shown to arrest cell division, especially after transformation or DNA damage (45). Deletion of the Escherichia coli gene for homolog YhdE increased the growth rate, while overexpression decreased the growth rate (46). Therefore, the loss of Maf in Lv-StB would be expected to increase the rate of cell division and reduce the conversion of nucleotides, which it probably obtains from the host, to the monophosphates. The gene for the energy-dependent translational throttle protein EttA was also found to be truncated. This protein slows translation through interacting with the ribosome in both the ATP-and ADP-bound forms (47,48). Under energy-depleted conditions (i.e., high ADP levels), EttA was found to stabilize ribosomes and prevent commitment of metabolic resources; thus, the deletion mutant displayed reduced fitness during extended stationary phase (47). However, under circumstances where the host supplies ample nucleotides, the loss of ettA would be expected to increase translation rates.
Degradation of primary metabolic pathways in Burkholderia sp. Lv-StB. While the Lv-StB genome is of draft quality and thus may be missing some genes due to poor assembly, we found pervasive metabolic gaps across functional categories (Data Set S1A). Lv-StB is deficient in many metabolic pathways that are complete in related B. gladioli strains (Fig. 5), including the glyoxylate shunt (49) and various carbon degradation pathways as well as sulfur and nitrogen metabolism. Lv-StB appears incapable of making any of the following compounds due to the absence of several biosynthesis genes: thiamine, riboflavin, nicotinate, pantothenate, vitamin B12, and biotin. Likewise, there were deficiencies in amino acid biosynthesis (see Fig. S1 in the supplemental material). We predict that Lv-StB would be able to make chorismate, isoleucine, leucine, ornithine, proline, and threonine but likely lacks the ability to make aromatic amino acids, serine, methionine, lysine, histidine, cysteine, glutamine, and arginine due to the absence of several key genes in these pathways. The genome of Lv-StB also lacks genes involved in chemotaxis and flagella, suggesting that after the symbiont mixture is spread on eggs, the colonization of the dorsal cuticular structures in the embryo (24) does not require symbiont motility. Interestingly, the Lv-StB genome includes a trimeric autotransporter adhesin (TAA) related to SadA (50), which is involved in the pathogenicity of Salmonella enterica serovar Typhimurium by aiding cell aggregation, biofilm  Horizontal Transfer to a Genome-Reduced Symbiont ® formation, and adhesion to human intestinal epithelial cells. TAAs are found in Proteobacteria and consist of anchor, stalk, and head domains, among which the head forms the adhesive component (51). The bacterial honey bee symbiont Snodgrassella alvi is hypothesized to utilize TAAs in combination with other extracellular components during colonization of the host gut (52), and similar genes were identified in S. alvi symbionts in bumble bees and are predicted to perform a similar role (53). Therefore, this gene may play a role in the adhesion of Lv-StB cells to L. villosa eggs. The Lv-StB genome is also missing several enzymes involved in glycolysis, most notably phosphoglycerate kinase and phosphoenolpyruvate carboxykinase (among others), which would suggest that Lv-StB has lost the ability to perform glycolysis. The loss of glycolysis is often compensated for by an alternative pathway, such as the pentose phosphate or Entner-Doudoroff pathway (54). This is not the case for Lv-StB, where both the glucose-6-phosphate dehydrogenase and 6-phosphogluconolactonase genes appear to be missing from the genome. The citrate cycle is largely complete, except that it is missing pyruvate carboxylase, the enzyme that converts pyruvate to oxaloacetate. However, phosphoenolpyruvate carboxylase is present in Lv-StB (E5299_00983) and may alternatively be used for the production of oxaloacetate from exogenous phosphoenolpyruvate (55), as alternative pathways for the supply of oxaloacetate are also incomplete. Lv-StB is also missing all genes required to form cytochrome c oxidase and cytochrome b-c complexes. However, all genes required to encode NADH:quinone oxidoreductase (complex I), succinate dehydrogenase (complex II), and cytochrome o ubiquinol oxidase are present, along with all genes required for the F-type ATPase. The lack of complex III would likely result in a decreased rate of ATP production in Lv-StB as observed in fungi with alternative oxidases that bypass complexes III and IV (56). Lv-StB may be similar to the psyllid endosymbiont "Candidatus Liberibacter asiaticus," which has lost both key glycolysis genes and key glyoxalase genes and instead relies on the scavenging of ATP from the host (57).
We also found many deficiencies in both the de novo and salvage nucleotide pathways (Fig. S2). In the pyrimidine biosynthetic pathway, genes for dihydrooratase (pyrC) (58) and orotate phosphoribosyl transferase (pyrE) (59) were missing, suggesting that Lv-StB cannot produce orotate from N-carbamoylaspartate and cannot create nucleotides from free pyrimidine bases. Ribonucleotide reductase (nrdAB) (60) and thymidylate synthase (thyA) (61) are present, suggesting that deoxypyrimidine nucleotides can be made from CTP. The deficiencies in purine synthesis were more profound. The majority of the genes involved in the de novo pathway (62) were missing (purCDEFHLMNT, IMP dehydrogenase/guaB), except for those encoding adenylosuccinate lyase (purB), adenylosuccinate synthase (purA), and GMP synthase (guaA). Lv-StB should therefore be able to make AMP from IMP and GMP from XMP (plus their deoxy analogs through ribonucleotide reductase) but cannot make purines de novo. We were also not able to find adenine phosphoribosyltransferase or hypoxanthine-guanine phosphoribosyltransferase (62), meaning that purine bases cannot be salvaged to make nucleotides.
The loss of so many seemingly vital pathways would suggest that several key nutrients are acquired from either the host or other microbes. There are a total of 56 intact genes associated with transport in the Lv-StB genome (Table S2), of which 12 are annotated as efflux/export proteins. A total of 32 genes were found to be associated with ABC-type transporters and are predicted to be involved in the transport of carbohydrates, lipids, lipoproteins, cations, sugars, and methionine. Other genes were also found that encoded transporters that were predicted to transport biopolymers, potassium, osmoprotectants, and fructose. Finally, only broad identification could be achieved for the remaining transporters, including those within the major facilitator superfamily (MFS), the resistance-nodulation-cell division superfamily (RND), the drug/ metabolite transporter superfamily, and the HlyC/CorC family transporter. It is therefore possible that many nutrients that cannot be created de novo within Lv-StB, such as the amino acid methionine, may be acquired from external sources, such as is observed between Buchnera bacteria and their aphid hosts (63).
Degradation of DNA repair pathways in Burkholderia sp. Lv-StB. The genome of Lv-StB is missing many genes involved in DNA repair (Fig. 5B), similarly to other examples of genome-reduced symbionts (9). Compared to closely related B. gladioli strains, Lv-StB lacks genes in every repair pathway. In particular, the DNA polymerase I gene (polA), involved in homologous recombination, nucleotide excision repair, and base excision repair, is present only as two truncated pseudogenes (see above). Even though polA is involved in many different DNA repair pathways, it has been found to be nonessential in E. coli (64, 65), B. pseudomallei (66), and B. cenocepacia (67). In the homologous recombination pathway, Lv-StB lacks recA, polA, ruvA, ruvB, ruvC, and recG, all of which have been found to be essential for homologous recombination in E. coli (68). Likewise, Lv-StB is also missing ku and ligD, the two components of the nonhomologous end-joining pathway (69), suggesting that it cannot recover from doublestrand breaks. Lv-StB lacks several DNA glycosylases in the base excision repair pathway which are responsible for removing chemically modified bases from double-stranded DNA (70). Some of these losses in Lv-StB simply reduce redundancy, but it has also lost the nonredundant glycosylase genes mutM and mug. The former recognizes 2,6diamino-4-hydroxy-5-N-methylformamidopyrimidine (Fapy) and 8-hydroxyguanine (71), while the latter recognizes G:U and G:T mismatches (72) as well as epsilonC (73), 8-HM-epsilonC (74), 1,N(2)-epsilonG (75), and 5-formyluracil (76). Finally, in the mismatch repair system, Lv-StB is missing mutS, which is required for the recognition of mismatches in methyl-directed repair (77). In summary, Lv-StB is likely to be completely incapable of nonhomologous end-joining, homologous recombination, and mismatch repair, while being impaired in nucleotide excision repair and base excision repair due to the loss of DNA polymerase I and several DNA glycosylases.
Timing of horizontal acquisition of defensive and other genes in the Lv-StB genome. We then attempted to identify genes in the Burkholderia sp. Lv-StB genome that are likely to have been acquired by horizontal gene transfer (HGT). A total of 497 intact genes were identified as unique to Lv-StB among 13 reference B. gladioli genomes (see Materials and Methods). Of these 497 genes, genes with no homologs in the NR database or genes that had homologs in B. gladioli genomes not included in this study were removed. A total of 148 genes appeared to be more closely related to species other than B. gladioli and may have been acquired through horizontal transfer (Fig. 6). A total of 79 and 57 genes appear to have been obtained from gammaproteobacteria and alphaproteobacteria, respectively, with one gene from a firmicute, one gene from a cyanobacterium, and three genes from phages (see Data Set S1G). The distribution is consistent with the notion that horizontal transfer occurs most frequently between closely related species (78). In particular, Burkholderiaceae species were the most frequent apparent donors of gammaproteobacterial proteins. Interestingly, the alphaproteobacterial genus Ochrobactrum (family Brucellaceae in the NCBI taxonomy and Rhizobiaceae in GTDB) was a major putative gene donor. This genus includes several symbionts of termites (79), army worms (80), weevils (81), and leeches (82,83). In previous 16S amplicon investigations of Lagria beetles, Ochrobactrum spp. were often found (24, 26) (Fig. S3), suggesting that the donors of these genes could have also been associates of L. villosa. Ochrobactrum operational taxonomic units (OTUs) account for 5% to 20% of 16S rRNA gene reads, but this genus was not observed in the shotgun metagenome. However, disparities between 16S and shotgun metagenome abundances are not uncommon due to variable 16S copy numbers and primer and sequencing biases (84,85). On the basis of the evidence for putatively horizontally transferred genes, we asked whether these could have contributed to the dominance of Lv-StB in L. villosa and set out to estimate the timings of horizontal transfer events.
Horizontally transferred genes are often detected on the basis of nucleotide composition differing from that of other genes in the genome (86). Such genes initially exhibit nucleotide composition consistent with the donor genome, which eventually normalizes to the composition of the recipient genome (87). The rate of this "amelioration" process (ΔGC HT ) has been modeled by Lawrence and Ochman (87)  We identified groups of consecutive genes in the putative HGT set that could have been acquired together and used the method described above to estimate their introgression time (Data Set S1H). Among the 18 identified gene groups, 7 were found to have atypical GC content (defined by Lawrence and Ochman [87]), with GC% content that was either Ͼ10% lower or Ͼ8% higher at the first codon position or the third codon position, respectively, than that of the genome as a whole. The method described above was used to estimate the time of introgression for each gene group, using the Bl. floridanus, Bl. pennsylvanicus, Bl. obliquus, and "Ca. Baumannia cicadellinicola" divergence rates (see above and Fig. 7; see also Data Set S1H). The oldest horizontal transfer (1.79 to 14.36 Mya) was found to correspond to the "hypo4" group (E5299_02488 -E5299_02489), representing two phage proteins, and the next oldest corresponded to the lagriamide BGC and neighboring genes (lga; 0.8 to 6.42 Mya; E5299_00001-E5299_00003, E5299_00005-E5299_00018). The lga BGC is predicted to have come from a high-GC organism, with original GC content of 72%. The closest relatives of many of the lga genes are found in Pseudomonas strains, which typically do not have GC content this high. However, as BGCs are often thought to be horizontally transferred (89)  Introgression time (Mya) FIG 7 Estimated introgression time for putative HGT gene sets, using the "Candidatus Baumannia cicadellinicola" substitution rates (36). Note that for clarity, the gene sets with estimated introgression time of Ͻ5,000 ya are not labeled. For these data and ages estimated with other substitution rates, see Data Set S1H. Abbreviations refer to groups of genes as outlined in Data Set S1H: acon, E5299_02570 -E5299_2571; addic, E5299_02248 -E5299_02250; meth, E5299_02474 -E5299_02475; ochro, E5299_01926 -E5299_01933; par, E5299_02349 -E5299_02352; phd, E5299_02420 -E5299_ 02423; porfo, E5299_02300, E5299_02302-E5299_02303; rep, E5299_02219 -E5299_02220; rhh, E5299_01720, E5299_01726; ura, E5299_02533-E5299_02435.

Ra lsto nia sol an ace aru m Va rio vo ra x sp . O K 60 5 C a b a lle ro n ia a rv i C a b a lle ro n ia s c h u m a n n ia n a e D ia p h o ro b a c te r n it ro re d u c e n s P a ra b u rk h o ld e ri a _ B c a b a ll e ro n is S im p li c is p ir a m e ta m o r p h a E x te n s im o n a s v u lg a r is A c id o v o r a x _ B S e r p e n t in o m o n a s r a ic h e i C a s t e ll a n ie ll a c a e n i
Horizontal Transfer to a Genome-Reduced Symbiont ® nucleosides, trehalose, and vitamin B12. Of these, the sugar_transp and bmp groups, predicted to be involved in importation of trehalose and purine, respectively, are relatively old (0.41 to 3.31 My and 0.27-2.16 My, respectively), and the groups likely involved in B12 importation (dinj, tonb, tonb2) are estimated to be Ͻ5,000 y old. Trehalose is the most abundant component of insect hemolymph, and was found to be provided by the host to its genome-reduced symbiont in a leaf beetle system (90). We also found HGT gene groups putatively involved in heat shock response (hsp20; 0.05 to 0.  (91), a protein used in the SOS response that binds to abasic sites in single-stranded DNA. The transferred transport functions and DNA repair proteins match functional categories that are currently lacking genes due to genome reduction (see above), and therefore the transfers could have been contemporaneous with the reduction process, acting as mechanisms compensating for lost functions.
One putative HGT toxin that is unique to Lv-StB in the metagenome and among B. gladioli strains is zonular occludens toxin (zot [E5299_02544 -E5299_02545]; estimated to have been acquired Ͻ5,000 ya). The zot gene is responsible for the production of the zonula occludens toxin, a virulence factor which was initially identified in Vibrio cholerae and whose presence was found to lead to the disassembly of intracellular tight junctions and consequently to increased permeability of mammalian epithelium (92). Colocalized with the predicted zot gene was a gene encoding DUF2523, which we found often accompanied zot in our searches of the STRING database (93). Zot proteins have been identified in several strains of Campylobacter and have been shown to elicit an inflammatory response in intestinal epithelial cells (94,95). Furthermore, a significant correlation was found between the presence of the Zot protein and hyperinvasive strains of Neisseria meningitidis (96). Potentially, zot may aid in the infection of the L. villosa embryonic structures through increasing permeability across the outer layers of the egg, although this remains speculative.
The evolution of Burkholderia sp. Lv-StB. In the 1920s it was observed (97) that beetles in the Lagria and Cerogria genera possessed structures now known to harbor Burkholderia symbionts in L. villosa and L. hirta (26). Other genera in the Lagriinae subfamily, such as Adynata and Arthromacra, do not have these structures. According to the tree found at timetree.org (98), Lagria and Cerogria diverged 55 Mya, and the common ancestor of Lagria, Cerogria, and Adynata existed 82 Mya (this region of the tree utilizes data from Kergoat et al. [99]). On the basis of these estimates, the symbiont-bearing structures in Lagria and Cerogria likely evolved between 82 and 55 Mya. Our analysis suggests that the divergence of Burkholderia sp. Lv-StB from B. gladioli occurred after that point (6.15 to 49.76 Mya). During that time, the genome of Burkholderia sp. Lv-StB became reduced, and it is likely dependent on the host due to deficiencies in energy metabolism and nucleotide biosynthesis. Notably, the profound metabolic insufficiencies and incomplete DNA repair pathways in Lv-StB are typical of symbionts with smaller genomes, such as "Ca. Endolissoclinum faulkneri," an intracellular tunicate symbiont with a 1.48-Mbp genome and a similar number of genes (783) (100), estimated to have been a symbiont for at least 6 to 31 My (101). While the presence of Burkholderia sp. Lv-StB and of its defensive compound lagriamide has been shown to decrease the rate of fungal egg infection (6), the symbiont is not essential for beetle reproduction (24). Therefore, the relationship is facultative from the perspective of the host, while Lv-StB is in the process of becoming dependent on L. villosa. A central issue that we aimed to address in this work was that of how the genome of Lv-StB became reduced, when L. villosa appears to maintain multiple other nonreduced Burkholderia spp. and other symbionts.
It is clear from previous work that the Lagria symbionts related to B. gladioli evolved from plant-associated strains (26), likely transmitted to the insects from the plant environment. Intriguingly, there are closely related Burkholderia (now formally Caballeronia) symbionts of several plant species within the Rubiaceae and Primulaceae families which reside within the host's leaf nodules (102) and have genomes that are similar in size to that of Burkholderia sp. Lv-StB. In "Ca. Burkholderia crenata," hosted by the plant Ardisia crenata, there are a number of interesting parallels with Burkholderia sp. Lv-StB, such as the maintenance of secondary metabolite BGCs and loss of essential genes and DNA repair functions (103). Interestingly, lateral gene transfers of the secondary metabolite BGCs between different symbionts have occurred, likely after transition to a host-restricted lifestyle (104). It is possible that the evolution of the L. villosa symbiont resembles that of the leaf nodulating Burkholderia symbionts given the similarities in function, genome characteristics, and degrees of host dependence, although "Ca. Burkholderia crenata" has the smallest genome of leaf nodule symbionts and, in contrast to Lv-StB, maintains the capability of synthesizing most cofactors and vitamins.
For the Lagria host, the probable advantage of the consistent presence of Burkholderia or other bacteria is that of protection of eggs from infection, through the activity of small molecules made by its microbiome. The strains characterized here, as well as the previously isolated Lv-StA strain, were found to contain ample biosynthetic potential, and both Lv-StA and Lv-StB produce antifungal compounds that were found to protect eggs from fungal infection in laboratory experiments (26). And yet, Lv-StA is found only sporadically as a minor component of the microbiome in the field (6). It is probably advantageous for Lagria beetles to maintain a pool of facultative symbionts with different biosynthetic capabilities to allow fast adaptation to different environmental infection pressures (105). However, there may be less selection pressure on a facultative symbiont to stay associated with its host if it can also survive in the environment and infect plants.
The foundational event in the establishment of the symbiosis between Lv-StB and L. villosa was likely the acquisition of the lga pathway, which putatively produces lagriamide, in a nonreduced ancestor of Lv-StB (6). We place this as the first event for four reasons. First, the lga BGC is almost the oldest detectable horizontal transfer that survives in the reduced genome of Lv-StB. Second, we found little evidence that Lv-StB is capable of making metabolites of use to the host, indicating that the symbiosis is likely not based on nutrition. L. villosa's diet of plant leaves may be nitrogen poor, with hard-to-digest plant cell wall components (106), but we did not find polysaccharidedegrading pathways or pathways corresponding to extensive biosynthesis of essential amino acids in the Lv-StB genome. Therefore, the lga BGC is the oldest remaining feature that potentially increases host fitness. Third, the reduced coding density seen in the Lv-StB genome may be indicative of a recent transitional event (11), such as strict host association or a move to vertical transmission. Fourth, even though we found genes missing from all DNA repair pathways, which is thought to be a driver for increased AT content in symbiont genomes (9), and increases in AT content may have an adaptive component that reduces the metabolic costs associated with symbionts (23), the GC content of the Lv-StB genome is not very different from that of free-living B. gladioli strains in comparison to other "transitional" symbionts. For instance, "Candidatus Pantoea carbekii" has a reduced genome and, like Lv-StB, lacks full-length DNA polymerase I (107). The GC content of this strain is almost 30% lower than that of its closest free-living relatives (11), while the GC content of the genome of Lv-StB is ϳ10% lower than that of its closest relatives. Therefore, we propose that loss of DNA repair pathways and other genome degradation events in Lv-StB occurred relatively recently, after the acquisition of lga.
It can be envisioned that lga provided a sustained survival advantage in an environment where lagriamide reduced egg fungal infections and that there was positive selection in beetles that vertically transmitted lga-bearing symbionts. It is also possible, although speculative, that production of lagriamide confers a competitive advantage among the members of the microbial community in the beetle. In L. villosa, symbionts are stored extracellularly, and they are spread onto the exterior of eggs as they are laid. According to observations in the congeneric species L. hirta, the symbionts are assumed to first enter through the egg micropyle to reach the embryonic organs where they are housed throughout larval development (97). It is thus likely that only a few of the cells are vertically transmitted as a result of colonizing these structures, potentially providing the population bottlenecks that could have caused initial accumulation of deleterious mutations that started the process of genome reduction. Meanwhile, loss of certain proteins limiting growth rate (see above) may have been selected through the presence of increased Lv-StB populations and compound production. It is unknown to what extent Lv-StB is genetically isolated in the larval or adult host, but we found evidence of ongoing horizontal transfer events having occurred in the recent past, presumably through contact with a complex microbiome associated with L. villosa egg surfaces. These horizontal gene transfers likely happened concurrently with the ongoing genome reduction process and may have been compensatory for gene losses. There is some precedence for the idea of the existence of extracellular symbionts with profoundly reduced genomes (106,(108)(109)(110). For instance, the leaf beetle Cassida rubiginosa harbors "Candidatus Stammera capleta," a symbiont with the smallest genome of any extracellular organism (0.27 Mbp), which provides pectinolytic enzymes to help break down the host's leafy diet (106). In many of these cases, symbionts are stored as isolated monocultures within specialized structures in adult hosts, while vertical transmission is assisted by packaging symbiont cells in protective "caplets" attached to eggs (106) or in a "symbiont capsule" encased in chitin (108) or by their secretion in a galactan-based jelly ingested by hatched juveniles (110), although reduced-genome symbionts have also been known to be vertically transmitted by simple egg surface contamination (109), as in L. villosa. Because these examples represent advanced cases of genome reduction, it would be difficult to determine whether horizontal transfer events occurred before or during genome reduction, and none have been noted. The complexity of the L. villosa microbiome appears to represent a different case, as it afforded ample opportunity for horizontal gene transfer even while the genome of Lv-StB was actively undergoing reduction.
In addition to the leaf nodule Burkholderia symbionts (see above), horizontal acquisition of genes has been observed in two types of reduced-genome symbionts, namely, eukaryotic parasites in the genus Encephalitozoon (111) and Acetobacteraceae strains associated with the gut community of red carpenter ants (112). In both of these cases, symbiont genomes were similar in size to Lv-StB (ϳ2 Mbp) but showed far greater coding density and fewer pseudogenes. Furthermore, both the Encephalitozoon and Acetobacteraceae strains were culturable, suggesting that they are facultative symbionts in a less advanced state of genome reduction than Lv-StB. The genome of Lv-StB appears to be different from the genomes of these examples, because there is evidence of recent horizontal transfers, even though genes required for homologous recombination are currently missing. Either the loss of homologous recombination was very recent or such transfers could have occurred in a RecA-independent manner. For example, plasmids could have been transferred into Lv-StB cells, followed by the RecA-independent transposition of genes to the chromosome (113,114). Another potential mechanism is phage infection. For instance, it is known that the genomes of symbionts that switch hosts such as Wolbachia strains tend to contain a large amount of mobile DNA and phage remnants (115).
Conclusions. It is unclear whether Lv-StB will continue on the path of genome reduction to become drastically reduced with a Ͻ1-Mbp genome. Where symbionts are required for host survival and are genetically isolated within host cells or specialized structures, such a process appears to be irreversible and unstoppable (116,117). However, a number of alternate fates could be envisioned for Lv-StB. With a complex microbiome, if ongoing gene losses in Lv-StB reduce its fitness past a certain point, then it could be replaced by another strain, potentially accompanied by horizontal transfer of the lga pathway to a less reduced genomic chassis. Alternatively, horizontal transfers of genes to Lv-StB could lead to an equilibrium of gene loss and gain. Interestingly, it appears that until the present time, horizontal transfer had not occurred fast enough to prevent widespread loss of metabolism and DNA repair in the Lv-StB genome. The host could also evolve strategies to maintain increasingly genome-reduced Lv-StB, perhaps by selective extracellular partitioning and packaging for vertical transfer similarly to the examples outlined above. However, it is unclear whether such an evolutionary path would be favorable, given that the coinfection of multiple BGCbearing symbiont strains could be advantageous in environments with variable pathogen pressures.
In summary, evidence gathered here suggests that the introduction of the lagriamide BGC preceded and perhaps initiated genome erosion of Lv-StB, potentially through selection of beetles that transferred the symbiont vertically, leading to a population structure with frequent bottlenecks. Simultaneous advantageous gene acquisitions may have enabled the preferential survival of Lv-StB and its dominance in the adult host and the egg surface.

MATERIALS AND METHODS
Sequencing and assembly of Burkholderia gladioli Lv-StA genome. Genome sequencing of the isolated B. gladioli Lv-StA strain was carried out using PacBio with single-molecule, real-time (SMRT) technology. For de novo assembly (carried out by Eurofins Genomics), the HGAP pipeline was used (Heirarchical Genome Assembly Process). Briefly, a preassembly of long and accurate sequences was generated by mapping filtered subreads to so-called seed reads. Subsequently, the Celera assembler was used to generate a draft assembly using multikilobase long reads, which in this case rendered fullgenome closure. Finally, the Quiver algorithm was used to correct indel and substitution errors by considering the quality values from the bas.h5 files.
Metagenomic binning and annotation. Metagenomic assembly files were clustered into putative genomic bins using Autometa (Master branch-commit bbcea30) (28). Contigs with lengths shorter than 3,000 bp were excluded from the binning process, and a taxonomy table was produced. Contigs classified as bacterial were further binned into putative genomic bins using run_autometa.py. Unclustered contigs were recruited into clusters using ML_recruitment.py. Results were summarized using cluster_process.py. The resultant genome bins were compared to earlier versions (6) using Mash version 2.1.1 (118), which hashes genomes to patterns of k-mers (sketching), allowing rapid calculations of distances between two sketches. All bins were sketched, and distances were computed in a pairwise fashion. Pairwise distances were visualized in R as a dendrogram and enabled the determination of equivalent old and updated putative genome bins between analyses. The updated putative genomic bins were annotated using Prokka version 1.13 (119), with GenBank compliance enabled. Reference genomes downloaded from NCBI were similarly annotated with Prokka in order to maintain consistency between data sets. Amino acid sequences of open-reading frames (ORFs) were further annotated using DIAMOND blastp version 0.9.21.122 (120) against the Diamond formatted NR database. The search was limited to returning a maximum of 1 target sequence, and the maximum number of high-scoring pairs per subject sequence was set to 1. Results were summarized in BLAST tabular format with qseqid (Query gene identifier [ID]), stitle (aligned gene ID), pident (percentage of identical matches), evalue (expected value), qlen (query sequence length), and slen (aligned gene sequence length) as desired parameter output. Pseudogenes were identified by finding Lv-StB genes that were more than 20% shorter than their respective BLAST matches. This criterion has been used previously to identify pseudogenes (101,121).
Coding density was calculated as the sum of all protein-coding sequences (coding sequence) as a percentage of the sum of all contigs (total sequence). In cases where protein-coding genes were found to overlap, the length of the overlap region was counted only once. This calculation was performed for all binned genomes, both on the initial data sets (GenBank files generated during Prokka annotations) and on the edited data sets where pseudogenes had been removed. For the identification and count of genes encoding transposases and hypothetical proteins, protein-coding gene amino acid files (*.faa) containing Prokka annotations were parsed for gene descriptions containing "transposase" and "hypothetical" strings.
Taxonomic classification of genome bins. Putative genome bins clustered from the L. villosa metagenomic data set were taxonomically classified using GTDB-Tk v0.2.2 (reference database gtdbtk.r86_v2) with default parameters (29). GTDB-Tk identifies and aligns 120 bacterial marker genes per genome before calculating the optimal placement of the respective alignments in the precomputed GTDB-Tk reference tree which consists of 94,759 genomes (see Data Set S1B in the supplemental material). A species was assigned to a genome if it shared 95% or more ANI with a reference genome.
Deamelioration of putatively horizontally transferred genes. The method of Lawrence and Ochman (87) was implemented in Python and is available at https://bitbucket.org/jason_c_kwan/age _horizontal_transfers.py. The script takes as input (i) an in-frame nucleotide FASTA file containing the sequences of putatively horizontally transferred genes, (ii) an in-frame nucleotide FASTA file containing a comparison set of gene sequences from the genome, (iii) a synonymous mutation rate in substitutions per 100 sites per million years, (iv) a nonsynonymous mutation rate in substitutions per 100 sites per million years, (v) a transition/transversion ratio (), (vi) a step time in millions of years, and (vii) a maximum time to iterate to. The script outputs GC content of each codon position (plus overall GC) at each time point and reports the estimated age of the gene cluster corresponding to the iteration with the smallest sum of squared deviations from equations 2 to 4. The substitution rates used in our calculations were half of the divergence rates estimated by Silva and Santos-Garcia (36) and were as follows: for "Ca. Ba. cicadellinicola," synonymous rate of 0. 55  Microbial community analysis. 16S rRNA amplicon sequence data sets analyzed via Qiime (134) as described previously (24) were reanalyzed using Mothur v.1.40.3 (135). Reads shorter than 200 bp and reads containing ambiguous bases or homopolymeric runs longer than 7 bases were removed from the data set. Chimeric sequences were identified using VSEARCH (136) and removed from the data set. Reads were taxonomically classified against the Silva database (version 132), and all reads classified as unknown, eukaryotic, or mitochondrial or as representative of chloroplasts were removed from the data set. Reads were aligned using the Silva database (v. 132) as the reference and clustered into operational taxonomic units (OTUs) at a distance of 0.03, an approximation used for bacterial species. Counts of OTUs per sample were generated, and the top 10 most abundant OTUs were plotted (see Fig. S3 in the supplemental material). The top 50 most abundant OTUs were queried against the "nt" nucleotide database using BLAST for taxonomic classification.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.