Gene Flow between Divergent Cereal- and Grass-Specific Lineages of the Rice Blast Fungus Magnaporthe oryzae

ABSTRACT Delineating species and epidemic lineages in fungal plant pathogens is critical to our understanding of disease emergence and the structure of fungal biodiversity and also informs international regulatory decisions. Pyricularia oryzae (syn. Magnaporthe oryzae) is a multihost pathogen that infects multiple grasses and cereals, is responsible for the most damaging rice disease (rice blast), and is of growing concern due to the recent introduction of wheat blast to Bangladesh from South America. However, the genetic structure and evolutionary history of M. oryzae, including the possible existence of cryptic phylogenetic species, remain poorly defined. Here, we use whole-genome sequence information for 76 M. oryzae isolates sampled from 12 grass and cereal genera to infer the population structure of M. oryzae and to reassess the species status of wheat-infecting populations of the fungus. Species recognition based on genealogical concordance, using published data or extracting previously used loci from genome assemblies, failed to confirm a prior assignment of wheat blast isolates to a new species (Pyricularia graminis-tritici). Inference of population subdivisions revealed multiple divergent lineages within M. oryzae, each preferentially associated with one host genus, suggesting incipient speciation following host shift or host range expansion. Analyses of gene flow, taking into account the possibility of incomplete lineage sorting, revealed that genetic exchanges have contributed to the makeup of multiple lineages within M. oryzae. These findings provide greater understanding of the ecoevolutionary factors that underlie the diversification of M. oryzae and highlight the practicality of genomic data for epidemiological surveillance in this important multihost pathogen.

uted to the genetic makeup of multiple M. oryzae lineages within the same species. Plant health surveillance is therefore warranted to safeguard against disease emergence in regions where multiple lineages of the fungus are in contact with one another.
KEYWORDS cryptic species, disease emergence, diversification, fungal pathogen, gene flow, population structure, rice, speciation, species recognition I nvestigating population genetic structure in relation to life history traits such as reproductive mode, host range, or drug resistance is particularly relevant in pathogens (1,2). Knowledge of species, lineages, populations, levels of genetic variability, and reproductive mode is essential to answer questions common to all infectious diseases, such as the tempo, origin, and proximate (i.e., molecular) and ultimate (i.e., ecoevolutionary) causes of disease emergence and spread (3). Multilocus molecular typing schemes have shown that cryptic species and lineages within species are often more numerous than estimated from phenotypic data alone. Genomic approaches are emerging as a new gold standard for detecting cryptic structure or speciation with increased resolution, allowing fine-grained epidemiological surveillance and sciencebased regulatory decisions. The added benefits of whole-genome approaches include identifying the genetic basis of life history traits and better understanding of both the genomic properties that influence the process of speciation and the signatures of (potentially incomplete) speciation that are observable in patterns of genomic variability (4,5).
Many plant-pathogenic ascomycete fungi are host specific, and some of their life history traits have been shown to be conducive to the emergence of novel pathogen species adapted to new hosts (6,7). Investigating population structure within multihost ascomycetes thus offers a unique opportunity to identify the genomic features associated with recent host range expansions or host shifts. In this study, our model is Magnaporthe oryzae (synonym of Pyricularia oryzae) (8), a fungal ascomycete causing blast disease on a variety of grass hosts. Magnaporthe oryzae is well studied as the causal agent of the most important disease of rice (Oryza sativa), but it also causes blast disease on more than 50 cultivated and wild monocot plant species (9). This includes other cereal crops such as wheat (Triticum aestivum), barley (Hordeum vulgare), finger millet (Eleusine coracana), and foxtail millet (Setaria italica and Setaria viridis), as well as wild and cultivated grass hosts, including goosegrass (Eleusine indica), annual ryegrass (Lolium multiflorum), perennial ryegrass (Lolium perenne), tall fescue (Festuca arundinacea), and St. Augustine grass (Stenotaphrum secundatum) (10). Previous studies based on multilocus sequence typing showed that M. oryzae is subdivided into multiple clades, each found on only a limited number of host species, with pathogenicity testing revealing host specificity as a plausible driver of genetic divergence (11,12). More recently, comparative genomics of eight isolates infecting wheat, goosegrass, rice, foxtail millet, finger millet, and barley revealed deep subdivision of M. oryzae into three groups infecting finger millet or wheat, foxtail millet, and rice or barley (13,14). Subsequent analysis of genomic data from nine wheat-infecting isolates, two ryegrassinfecting isolates, and one weeping-lovegrass-infecting isolate subdivided lineages infecting only wheat on the one hand and wheat or ryegrass on the other hand and revealed an additional lineage associated with the weeping lovegrass strain (15). Together, these studies suggest a history of host range expansion or host shifts and limited gene flow between lineages within M. oryzae.
Magnaporthe oryzae isolates causing wheat blast represent a growing concern in terms of food security. This seed-borne pathogen can spread around the world through movement of seed or grain. Therefore, understanding the evolutionary origin and structure of populations causing wheat blast is a top priority for researchers studying disease emergence and for regulatory agencies. Wheat blast was first discovered in southern Brazil in 1985 (16), and the disease subsequently spread to the neighboring countries of Argentina, Bolivia, and Paraguay (17)(18)(19), where it represents a consider-able impediment to wheat production (20,21). Until recently, wheat blast had not been reported outside South America. In 2011, a single instance of infected wheat was discovered in the United States, but analysis of the isolate responsible revealed that it was genetically similar to a local isolate from annual ryegrass and, therefore, unlikely to be an exotic introduction from South America (22). More recently, in 2016, wheat blast was detected in Bangladesh (23). Unlike the U.S. isolate, strains from this outbreak resembled South American wheat blast isolates rather than ryegrass-derived strains (15,23), thereby confirming the spread of wheat blast from South America to Bangladesh.
It has recently been proposed that a subgroup of the wheat-infecting isolates, together with some strains pathogenic on Eleusine spp. and other Poaceae hosts, belongs to a new phylogenetic species, Pyricularia graminis-tritici, which is well separated from other wheat-and ryegrass-infecting isolates, as well as pathogens of other grasses (24). However, this proposed split was based on bootstrap support in a genealogy inferred from multilocus sequence concatenation, and genealogical concordance for phylogenetic species recognition (GCPSR [25,26]) was not applied. The observed lineage divergence appeared to be mostly driven by genetic divergence at one of 10 sequenced loci, raising questions on the phylogenetic support of this species.
The present study was designed to reassess the hypothesis that P. graminis-tritici constitutes a cryptic species within M. oryzae and, more generally, to infer population structure in relation to host of origin in this important plant pathogen. Using wholegenome sequences for 81 Magnaporthe isolates (76 M. oryzae isolates from 12 host plant genera, four Magnaporthe grisea isolates from crabgrass [Digitaria spp.], and one Magnaporthe pennisetigena isolate from Pennisetum sp.), we addressed the following questions: do M. oryzae isolates form distinct host-specific lineages and is there evidence for relatively long-term reproductive isolation between lineages (i.e., cryptic species) within M. oryzae? Our analyses of population subdivision and species identification revealed multiple divergent lineages within M. oryzae, each preferentially associated with one host plant genus, but refuted the existence of a novel cryptic phylogenetic species named P. graminis-tritici. In addition, analyses of gene flow revealed that genetic exchanges have contributed to the makeup of the multiple lineages within M. oryzae.

RESULTS
Reassessing the validity of the proposed P. graminis-tritici species by analyzing the original published data according to GCPSR. To test the previous delineation of a subgroup of wheat-infecting isolates as a new phylogenetic species, we reanalyzed the Castroagudin et al. data set (24), which mostly included sequences from Brazilian isolates. However, instead of using bootstrap support in a total-evidence genealogy inferred from concatenated sequences for species delineation, we applied the GCPSR test (25,26). This test identifies a group as an independent evolutionary lineage (i.e., phylogenetic species) if it satisfies two conditions: (i) genealogical concordance (the group is present in the majority of the single-locus genealogies) and (ii) genealogical nondiscordance (the group is well supported in at least one single-locus genealogy and is not contradicted in any other genealogy at the same level of support) (25). Visual inspection of the topologies and supports in each single-locus tree revealed that GCPSR condition 1 was not satisfied since isolates previously identified as belonging to the phylogenetic species P. graminis-tritici grouped together in only one maximum likelihood gene genealogy-the one produced using the MPG1 locus (see Fig. S1A in the supplemental material). The P. graminis-tritici separation was not supported by any of the nine other single-locus genealogies ( Fig. S1B to J).
Next, we used the multilocus data as input to the program ASTRAL with the goal of inferring a species tree that takes into account possible discrepancies among individual gene genealogies (27)(28)(29). The ASTRAL tree failed to provide strong support for the branch holding the isolates previously identified as P. graminis-tritici (Fig. S2). Thus, analysis of the Castroagudin et al. data according to GCPSR standards failed to support the existence of the newly described P. graminis-tritici species. due to small sample sizes-only lineages with n Ͼ 6 were included) ( Table 2). The null hypothesis of no recombination could be rejected in the Lolium, wheat, rice, and Setaria lineages using the pairwise homoplasy test implemented in the SplitsTree 4.13 program (31) (P value, 0.0) ( Table 2). Genome-wide nucleotide divergence was 1 order of magnitude higher between M. oryzae and its closest relatives, M. grisea and M. pennisetigena, than it was among isolates within M. oryzae. The maximum pairwise distance (number of differences per kilobase) between any two M. oryzae isolates was less than 1%, genome-wide ( Fig. S4  provides good evidence against the existence of relatively ancient cryptic species within M. oryzae (Table S1). Reassessment of P. graminis-tritici as a novel species using whole-genome data. While the 10 loci utilized in the Castroagudin et al. (24) study do not support the P. graminis-tritici split based on GCPSR criteria, our DAPC and whole-genome ML and NJ analyses supported the partitioning of wheat blast isolates into two, genetically distinct lineages: one consisting almost exclusively of wheat-infecting isolates and the other comprising largely Festuca-and Lolium-infecting isolates as well as a few wheatinfecting isolates ( Fig. 2 and 3). However, the Castroagudin et al. study did not include Festuca-and Lolium-infecting isolates, and genome sequences from this study are not available. Therefore, to test for possible correspondence between the proposed P. graminis-tritici species and the Lolium lineage (or indeed the Triticum lineage), we Divergence Population Genomics of Magnaporthe oryzae ® extended the 10-locus analysis to the M. oryzae genome sequences used in the present study. For reference, we included the multilocus data for 16 isolates from the Castroagudin et al. study (24), representing all the major clades from that study. Nine of the 10 loci were successfully recovered from 68 of our M. oryzae genome sequences. The remaining locus, CH7-BAC9, was absent from too many genome sequences and, as a result, was excluded from the analysis.
The nine concatenated loci produced a total-evidence RAxML tree in which very few branches had bootstrap support greater than 50% (Fig. 4). All of the P. graminis-tritici  isolates from the Castroagudin et al. study were contained in a clade with 80% support. Inspection of the MPG1 marker that was reported to be diagnostic for P. graminis-tritici (Castroagudin et al. [24]) revealed that all of the isolates in this clade contained the P. graminis-tritici-type allele (green dots) and should therefore be classified as P. graministritici (Fig. 4). Critically, however, a few isolates outside this clade also harbored the P. graminis-tritici-type allele. Moreover, the clade also included isolates from the present Divergence Population Genomics of Magnaporthe oryzae ® study which came from wheat, annual ryegrass, perennial ryegrass, tall fescue, finger millet, and goosegrass-isolates that did not group together in the DAPC analysis ( Fig. 1) or in the ML and NJ trees built using the orthologous genes or whole-genome SNP data ( Fig. 2 and 3). Isolates carrying the P. graminis-tritici-type allele were in fact distributed among three genetically distinct and well-supported clades ( Fig. 2 and 3). Furthermore, visual inspection of the topologies and bootstrap supports for each single-locus tree revealed that GCPSR criteria were not satisfied for the clade including all of the P. graminis-tritici isolates from the Castroagudin et al. study. Thus, isolates characterized by Castroagudin et al. (24) as P. graminis-tritici fail to constitute a phylogenetically cohesive group based on total genome evidence, and thus, the existence of the P. graminis-tritici species is not supported by our new genome-wide data and analyses. The basis for the previous designation of P. graminis-tritici as a novel species was clearly revealed when MPG1 alleles were mapped onto the ML and NJ trees. The distribution of MPG1 alleles among different M. oryzae lineages was discontinuous (Fig. S5). As an example, isolates from the Triticum lineage carried three different MPG1 alleles. Two of these (including the P. graminis-tritici type) were also present in the Lolium lineage, while the third MPG1 (ACT17T-C-6CAA140, Fig. S5) was shared by distantly related isolates from the Stenotaphrum lineage (Fig. S5). Isolates from the Eleusine lineage also carried the P. graminis-tritici-type MPG1 allele and two other variants, while isolates from the Setaria and Oryza lineages carried an MPG1 allele distinct from all the others (Fig. S5). Overall, the distribution of MPG1 alleles points to the occurrence of incomplete lineage sorting and gene flow during M. oryzae diversification. Importantly, seven markers studied by Castroagudin et al.-including MPG1showed discontinuities in their distributions among lineages defined using genomewide data and analyses (Fig. S5). The two other markers (ACT1 and CHS1) used by Castroagudin et al. showed no sequence variations among the 68 M. oryzae isolates analyzed in the present study (data not shown) and are not useful for phylogenetic classification.
Species tree inference and phylogenetic species recognition from genomewide data. The total-evidence genealogies generated using sequence data from 76 M. oryzae genomes using either distance-based (whole genomes) or maximumlikelihood (2,682 single-copy orthologs) phylogenetic methods were highly concordant in terms of lineage composition and branching order ( Fig. 2 and 3). However, concatenation methods can be positively misleading, as they assume that all gene trees are identical in topology and branch lengths and they do not explicitly model the relationship between the species tree and gene trees (32). To estimate the species tree and to reassess previous findings of cryptic species within M. oryzae, we used a combination of species inference using the multispecies coalescent method implemented in ASTRAL (27)(28)(29) and a new implementation of the GCPSR that can handle genomic data.
The ASTRAL species tree with the local q1 support values on key branches is shown in Fig. 5. The four M. grisea isolates from crabgrass (Digitaria sp.) and the M. pennisetigena isolate from fountaingrass (Pennisetum sp.) were included as outgroups, bringing the total number of isolates to 81 and reducing the data set to 2,241 single-copy orthologous genes. The branches holding the clades containing the wheat blast isolates had q1 support values of 0.49, 0.39, and 0.37, which means that, in each case, fewer than 50% of the whole set of quartet gene trees recovered from the individual gene genealogies agreed with the local topology around these branches in the species tree. The branches that separated M. grisea and M. pennisetigena from M. oryzae had respective q1 values of 1, providing strong support for relatively ancient speciation. In contrast, the highest q1 value on any of the branches leading to the host-specialized clades was 0.8 for the Setaria pathogens, indicating that approximately 20% of the quartets recovered from individual gene trees were in conflict with the species tree around this branch. Together, these results indicate high levels of incomplete lineage sorting within, and/or gene flow involving, these groups and are thus inconsistent with the presence of genetically isolated lineages (i.e., species).
As a formal test for the presence of cryptic species within M. oryzae, we applied the phylogenetic species recognition criteria to the set of 2,241 single-copy orthologous genes using an implementation of the GCPSR scalable to any number of loci. Applying the GCPSR according to the nondiscordance criterion of Dettman et al. (a clade has to be well supported by at least one single-locus genealogy and not contradicted by any other genealogy at the same level of support) (25) resulted in the recognition of no species within M. oryzae.
Historical gene flow between lineages. The existence of gene flow and/or incomplete lineage sorting was also supported by phylogenetic network analysis. We used the network approach neighbor-net implemented in SplitsTree 4.13 (25) to visualize FIG 5 ASTRAL analysis to test for incomplete lineage sorting/gene flow among 81 Magnaporthe genomes, using 2,241 single-copy orthologous sequence loci. Thicker branches represent branches that have a bootstrap support of Ͼ50 after multilocus bootstrapping. Numbers above branches represent q1 local support (i.e., the proportion of quartet trees in individual genealogies that agree with the topology recovered by the ASTRAL analysis around the branch), with q1 values shown on black background for branches holding wheat blast isolates.
Divergence Population Genomics of Magnaporthe oryzae ® evolutionary relationships, while taking into account the possibility of recombination within or between lineages. The network inferred from haplotypes identified using the 2,682 single-copy orthologs in the 76 M. oryzae strains showed extensive reticulation connecting all lineages, consistent with recombination or incomplete lineage sorting (Fig. 6).
To disentangle the role of gene flow versus incomplete lineage sorting in observed network reticulations but also to gain insight into the timing and extent of genetic exchanges, we used ABBA/BABA tests, which compare numbers of two classes of shared derived alleles (the ABBA and BABA classes). For three lineages P1, P2, and P3 and an outgroup with genealogical relationships (((P1,P2),P3),O), and under conditions of no gene flow, shared derived alleles between P2 and P3 (ABBA alleles) and shared derived alleles between P1 and P3 (BABA alleles) can be produced only by incomplete lineage sorting and should be equally infrequent (34,35). Differences in numbers of ABBA and BABA alleles are interpreted as gene flow assuming no recurrent mutation and no deep ancestral population structure within lineages. We computed D, which measures the imbalance between numbers of ABBA and BABA sites and is used to test for admixture in ((P1,P2),P3) triplets, with D Ͼ 0 implying gene flow between P2 and P3 and D Ͻ 0 implying gene flow between P1 and P3 (34,35). We also made use of the heterogeneity in divergence time between members of ((P1,P2),P3) triplets to examine gene flow across three time periods (33), according to the following principles: (i) triplets including the most recently diverged lineages as P1 and P2 (i.e., the Triticum and Lolium lineages, the two Eleusine lineages, or the Oryza and Setaria lineages) carried information about gene flow across relatively recent times, (ii) triplets including as P1 and P2 two lineages from the same main group of lineages (i.e., Eragrostis/Eleusine1/ Eleusine2/Triticum/Lolium or Brachiaria2/Setaria/Oryza, excluding (P1,P2) pairs already used in principle 1) carried information about gene flow across intermediate times, and (iii) triplets including as P1 and P2 two lineages from different main groups of lineages (i.e., Eragrostis/Eleusine1/Eleusine2/Triticum/Lolium and Brachiaria2/Setaria/Oryza) and Stenotaphrum or Brachiaria1 as P3 carried information about gene flow across a relatively long time period (Fig. S6). The D statistic measuring differences in counts of ABBA and BABA alleles was significantly different from zero (Z-score Ͼ 3) in 104 of 120 lineage triplets, consistent with a history of gene flow between lineages within M. oryzae (Table S2). Given that a (P1,P2) pair can be represented as multiple ((P1,P2),P3) triplets and that the sign of D indicates what is the pair involved in gene flow within each triplet, the 104 triplets with significant D values in fact represented 35 pairs connected by gene flow, spanning the three time scales defined by the phylogenetic affiliation of lineages (Fig. S6). Lineages were equally represented in triplets deviating from null expectations assuming no gene flow, no ancient structure, and no recurrent mutations. Consistent with historical gene flow, searches for a private allele found no gene, among the 2,241 genes surveyed, carrying mutations exclusive to a single lineage. Together, these results indicate that gene flow was widespread, across both historical times and lineages, but it cannot be excluded that much of the signal was caused by events that happened prior to lineage splitting.
Recent admixture and gene flow between lineages. We then used the program Structure (36)(37)(38) to detect possible recent admixture between lineages (Fig. S3). Structure uses Markov chain Monte Carlo (MCMC) simulations to infer the assignment of genotypes into K distinct clusters, minimizing deviations from Hardy-Weinberg and linkage disequilibria within each cluster. The patterns of clustering inferred with Structure were largely similar to those inferred with DAPC. Structure analysis provided evidence for admixture at all K values (Fig. S3), suggesting that recent admixture events have recently shaped patterns of population subdivision within M. oryzae. "Chromosome painting," a probabilistic method for reconstructing the chromosomes of each individual sample as a combination of all other homologous sequences (39), also supported the lack of strict genetic isolation between lineages (Text S1).

DISCUSSION
Population subdivision but no cryptic phylogenetic species. Using population and phylogenomic analyses of single-copy orthologous genes and whole-genome SNPs identified in M. oryzae genomes from multiple cereal and grass hosts, we provide evidence that M. oryzae is subdivided into multiple lineages preferentially associated with one host plant genus. Neither the reanalysis of previous data nor the analysis of new data using previous phylogenetic species recognition markers supports the existence of a wheat blast-associated species called P. graminis-tritici (24). Marker MPG1, which holds most of the divergence previously detected, does not stand as a diagnostic marker of the wheat-infecting lineage of M. oryzae when tested in other lineages. Previous conclusions about the existence of the cryptic species P. graminis-tritici also stem from the fact that available information on M. oryzae diversity had been insufficiently taken into account. In particular, isolates from the lineages most closely related to wheat strains (i.e., isolates from the Lolium lineage [11,12,15,22]) were not represented in previous species identification work (24). Using phylogenetic species recognition by genealogical concordance, we could not identify cryptic phylogenetic species, and thus, M. oryzae is not, strictly speaking, a species complex. As a consequence, Pyricularia graminis-tritici cannot-and should not-be considered a valid name for wheat-infecting strains, because (i) it refers to a subset of wheat-infecting strains, and quarantine on P. graminis-tritici alone would not prevent introduction of aggressive wheat blast pathogens, and (ii) it groups very aggressive wheat pathogens from South America and South Asia with Eleusine-infecting strains that are largely distributed in the world. Given the devastating potential of wheat blast disease, it is vital that accurate strain identification and species assignment can be carried out by plant health agencies in order to safeguard against importation and spread of the disease. Correct species assignment is therefore a critical consideration. Hence, although the formal rules of taxonomy would imply treating P. graminis-tritici as a synonym of Magnaporthe oryzae, we strongly recommend dismissal of P. graminis-tritici as a valid name to refer to wheat-infecting strains of M. oryzae.
Divergence Population Genomics of Magnaporthe oryzae ® Incipient speciation by ecological specialization following host shifts. Several features of the life cycle of M. oryzae are conducive to speciation by ecologic specialization following host shifts, suggesting that the observed pattern of population subdivision in M. oryzae actually corresponds to ongoing speciation. Previous experimental measurements of perithecium formation and ascospore production-two important components of reproductive success-suggested interfertility in vitro between most pairs of lineages with high levels of ascospore viability (40)(41)(42)(43). This suggests that intrinsic pre-or postmating reproductive barriers, such as assortative mating by mate choice or gametic incompatibility, and zygotic mortality, are not responsible for the relative reproductive isolation between lineages-which creates the observed pattern of population subdivision. Instead, the relative reproductive isolation between lineages could be caused by one or several pre-or postmating barriers (see Table 1 in reference 44), such as mating-system isolation or hybrid sterility (intrinsic barrier), or difference in mating times, difference in mating sites, immigrant inviability, or ecologically based hybrid inviability (extrinsic barriers).
Previous pathogenicity assays revealed extensive variability in the host range of M. oryzae isolates, in terms of both pathogenicity toward a set of host species and pathogenicity toward a set of genotypes from a given host (40,41). Indeed, extensive genetic analyses show that host species specificity in M. oryzae, similar to rice cultivar specificity, could be controlled by a gene-for-gene relationship in which one to three avirulence genes in the fungus prevent infection of particular host species (43,45,46). Loss of the avirulence genes would allow infection of novel hosts to occur. Additionally, host species specificity is not strictly maintained. Under controlled conditions, most lineages have at least one host in common (40), and strains within one lineage can still cause rare susceptible lesions on naive hosts (21,47). Moreover, a single plant infected by a single genotype can produce large numbers of spores in a single growing season (48), allowing the pathogen to persist on an alternative host even if selection is strong and promoting the rapid and repeated creation of genetic variation (6). Although some of these features appear to be antagonistic to the possibility of divergence by host specialization within M. oryzae, our finding that the different lineages within M. oryzae tend to be sampled on a single host suggests that ecologic barriers alone may in fact contribute to reduce gene flow substantially between host-specific lineages. Differences in the geographic distribution of hosts, for which the level of sympatry has varied-and still varies-in space and time, might also contribute to reduced gene flow between lineages infecting different hosts, although some level of sympatry at some time is required so that new hosts could become infected, triggering host range expansion or host shifting.
Mating within host (i.e., reproduction between individuals infecting the same host), and to a lesser extent mating system isolation (i.e., lack of outcrossing reproduction), may contribute to further reducing gene flow between M. oryzae lineages. The fact that mating in M. oryzae likely occurs within host tissues, such as dead stems (49), may participate in the maintenance of the different lineages by decreasing the rate of reproduction between isolates adapted to different hosts (6). Loss of sexual fertility also appears to have a role in lineage maintenance. The rice lineage, in particular, is single mating type and female sterile throughout most of its range, which would reduce the chance of outcrossing sex with members of other lineages (50). Our analyses rejected the null hypothesis of clonality in all lineages, but they provided no time frame for the detected recombination events. Population-level studies and experimental measurements of mating type ratios and female fertility are needed to assess the reproductive mode of the different lineages within M. oryzae in the field.
Interlineage gene flow. Several potential barriers contribute to reduce genetic exchanges between M. oryzae lineages (see above), but not completely so, as evidenced by signal of gene flow and admixture detected in our genomic data. We hypothesize that the lack of strict host specialization of the different lineages is a key driver of interlineage gene flow. Many of the grass or cereal species that are hosts to M. oryzae are widely cultivated as staple crops or widely distributed as pasture or weeds, including "universal suscepts" such as barley, Italian ryegrass, tall fescue, and weeping lovegrass (40), increasing the chance for encounters and mating between isolates with overlapping host ranges. These shared hosts may act as a platform facilitating encounters and mating between fertile and compatible isolates from different lineages, thereby enabling interlineage gene flow (51). Plant health vigilance is therefore warranted for disease emergence via recombination in regions where multiple lineages are in contact and shared hosts are present. This is particularly so given that once infection of a novel host has taken place (i.e., host shift or host range expansion), the fungus has the capacity to build inoculum levels very rapidly, facilitating spread of the disease over considerable distances. It is striking, for example, that wheat blast has, within a year, spread from Bangladesh into the West Bengal region of India, where it emerged in 2017 (http://openwheatblast.org/).
Conclusion. Using a population genomics framework, we show that M. oryzae is subdivided into multiple lineages with limited host range and present evidence of genetic exchanges between them. Our findings provide greater understanding of the ecoevolutionary factors underlying the diversification of M. oryzae and highlight the practicality of genomic data for epidemiological surveillance of its different intraspecific lineages. Reappraisal of species boundaries within M. oryzae refuted the existence of a novel cryptic phylogenetic species named P. graminis-tritici, underlining that the use of node support in total-evidence genealogies based on a limited data set in terms of number of loci and of range of variation in origin (geography and host) of isolates can lead to erroneous identification of fungal cryptic species. Our work illustrates the growing divide between taxonomy that "creates the language of biodiversity" (52) based on limited sets of characters and genomic data that reveals more finely the complexity and continuous nature of the lineage divergence process called speciation.
Genome sequencing and assembly. New genome data were produced by an international collaborative effort. Characteristics of genome assemblies are summarized in Table S3 in the supplemental material. For newly sequenced genomes provided by M.F. and B.V., sequences were acquired on a MiSeq machine (Illumina, Inc.). Sequences were assembled using the paired-end mode in Newbler V2.9 (Roche Diagnostics, Indianapolis, IN). A custom perl script was used to merge the resulting scaffolds and contig files in a nonredundant fashion to generate a final assembly. Newly sequenced genomes BR130 and WHTQ, provided by T.M., were sequenced using an Illumina paired-end sequencing approach at Ͼ50ϫ depth. Short reads were assembled de novo using Velvet 1.2.10 (56), resulting in a 41.5-Mb genome for BR130 with an N 50 of 44.8 kb and a 43.7-Mb genome for WHTQ with an N 50 of 36.2 kb. For newly sequenced genomes provided by D.S. and N.J.T., DNA was sequenced on the Illumina HiSeq 2500 sequencer, producing 100 nucleotide paired-end reads, except in the case of VO107, which was sequenced on the Illumina Genome Analyzer II, producing 36-base-paired-end reads. Reads were filtered using fastq-mcf and assembled de novo using Velvet 1.2.10 (56).
Orthologous gene identification in genomic sequences. Protein-coding gene models were predicted using Augustus V3.0.3 (57). Orthologous genes were identified in the 76 genomes of M. oryzae or in the data set including outgroups using ProteinOrtho (58). The v8 version of the 70-15 M. oryzae reference genome (59) was added at this step in order to validate the predicted sets of orthologs. Only orthologs that were single copy in all genomes were included in subsequent analyses. Genes of each single-copy ortholog sets were aligned using MACSE (60). Sequences from the lab strain 70-15 were removed and not included in further analyses due to previously shown hybrid origin (13). Only alignments containing polymorphic sites within M. oryzae strains were kept for further analyses. This resulted in 2,241 alignments for the whole data set and 2,682 alignments for the 76 M. oryzae strains.
Population subdivision and summary statistics of polymorphism and divergence. Population subdivision was analyzed using DAPC and Structure (30,(36)(37)(38), based on multilocus haplotype profiles identified from ortholog alignments using a custom Python script. DAPC was performed using the Adegenet package in R (13). We retained the first 30 principal components and the first 4 discriminant functions. Ten independent Structure runs were carried out for each number of clusters K, with 100,000 MCMC iterations after a burn-in of 50,000 steps.
Divergence Population Genomics of Magnaporthe oryzae ® Polymorphism statistics were computed using EggLib 3.0.0b10 (61) excluding sites with Ͼ30% missing data. Divergence statistics were computed using a custom perl script.
To infer total-evidence trees within the 76 M. oryzae strains (respectively within the 81 Magnaporthe strains), all sequences from the 2,682 (respectively 2,241) orthologous sequences were concatenated. The maximum-likelihood tree was inferred using RAxML (62) with the general time reversible (GTR)-gamma model, and bootstrap supports were estimated after 1,000 replicates.
Secondary data analysis. Species recognition based on multiple gene genealogies as described by Castroagudin et al. (24) was repeated according to the reported methods. The robustness of the P. graminis-tritici species inference was tested by reiterating the study, omitting one marker at a time. Individual genealogies were built using RAxML with the GTR-gamma model and 100 bootstrap replicates.
Inference of species tree using ASTRAL. The ASTRAL method (27)(28)(29) is based on the multispecies coalescent and allows taking into account possible discrepancies among individual gene genealogies to infer the species tree. Individual genealogies inferred using RAxML with the GTR-gamma model and 100 bootstrap replicates were used as input data for ASTRAL analysis. Local supports around branches were evaluated with 100-replicate multilocus bootstrapping using the bootstrap replicates inferred from each individual gene tree as input data and with local quartet supports (q1, obtained using the -t option of ASTRAL) that represent the proportion of quartets recovered from the whole set of individual gene trees that agree with the local topology around the branch in the species tree.
MPG1-based classification. The MPG1 hydrophobin sequence is described as being diagnostic for the P. graminis-tritici/M. oryzae species split (24). MPG1 sequences from one of each species (gene identifiers [GIs] KU952644.1 for P. graminis-tritici and KU952661.1 for M. oryzae) were used as BLAST (63) queries to classify isolates as either P. graminis-tritici or M. oryzae.
Signatures of gene flow and/or incomplete lineage sorting. A phylogenetic network was built using SplitsTree 4.13 (64), based on the concatenation of sequences at single-copy orthologs identified in M. oryzae, excluding sites with missing data, sites with gaps, singletons, and monomorphic sites. The null hypothesis of no recombination was tested using the PHI test implemented in SplitsTree.
ABBA/BABA tests. ABBA/BABA tests were performed using custom Python scripts. The D statistic measuring the normalized difference in counts of ABBA and BABA sites was computed using equation 2 in reference 35. Significance was calculated using the block jackknife approach (100 replicates, 1,000 SNP blocks), to account for nonindependence among sites.
Probabilistic chromosome painting. We used the Chromopainter program, version 0.0.4, for probabilistic chromosome painting. This analysis was based on biallelic SNPs without missing data identified in the set of 2,682 single-copy orthologs, ordered according to their position in the reference genome of the rice-infecting strain 70-15. We initially estimated the recombination scaling constant N e and emission probabilities () by running the expectation-maximization algorithm with 200 iterations for each lineage and chromosome. Estimates of N e and were then computed as averages across lineages, weighted by chromosome length, and rounded to the nearest thousand for N e (N e ϭ 5,000; ϭ 0.0009). The file recom_rate_infile detailing the recombination rate between SNPs was built using the Interval program in LDhat version 2.2 (65) based on the whole data set combining isolates from all lineages, with 10 repeats by chromosome to check for convergence. Estimated N e and values and the perchromosome recombination maps estimated using LDhat were then used to paint the chromosomes of each lineage, considering the remaining lineages as donors, using 200 expectation-maximization iterations. For each lineage and each chromosome, Chromopainter was run three times to check for convergence.
Phylogenetic species recognition. We used an implementation of the GCPSR scalable to genomic data (https://github.com/b-brankovics/GCPSR) (69). The method works in two steps. (i) Concordance and nondiscordance analysis produces a genealogy that has clades that are both concordant and nondiscordant across single-gene genealogies, with support value for each of the clades being the number of single-gene genealogies harboring the given clade at bootstrap support above 95%. (ii) Exhaustive subdivision places all the strains into the least inclusive clades, by removing clades that would specify a species within potential phylogenetic species. We kept only two outgroup sequences per gene (BR29, M. grisea; Pm1, M. pennisetigena) to ensure having the same isolate at the root of all genealogies (Pm1 isolate). Majority-rule consensus trees were produced from 100 outgrouped RAxML bootstrap replicates for all 2,241 genes. The concordance and nondiscordance analysis was carried out assuming 95 as the minimum bootstrap support value and a discordance threshold of 1. Exhaustive subdivision was carried out using a concordance threshold of 1,121.
Whole-genome alignment and tree building. A custom perl script was used to mask sequences that occur in multiple alignments when the genome is subjected to BLAST analysis against itself. The masked genomes were then aligned in a pairwise fashion against all other genomes using BLAST (63). Regions that did not uniquely align in each pair at a threshold of 1eϪ200 were excluded. SNPs were then identified for each pairwise comparison and scaled by the total number of nucleotides aligned after excluding repetitive and duplicate regions. This produced a distance metric of SNPs per megabase of uniquely aligned DNA. The pairwise distances were used to construct phylogenetic trees with the neighbor-joining method as implemented in the R package Analyses of Phylogenetics and Evolution (APE) (66).
Because alignments are in pairwise sets as opposed to a single orthologous set, assessment of confidence values by traditional bootstrapping by resampling with replacement is not possible. Instead, confidence values were assigned by creating 1,000 bootstrap trees with noise added from a normal distribution with a mean of zero and the standard deviation derived from the pairwise distances between or within groups.