Recombination in Streptococcus pneumoniae Lineages Increase with Carriage Duration and Size of the Polysaccharide Capsule

ABSTRACT Streptococcus pneumoniae causes a high burden of invasive pneumococcal disease (IPD) globally, especially in children from resource-poor settings. Like many bacteria, the pneumococcus can import DNA from other strains or even species by transformation and homologous recombination, which has allowed the pneumococcus to evade clinical interventions such as antibiotics and pneumococcal conjugate vaccines (PCVs). Pneumococci are enclosed in a complex polysaccharide capsule that determines the serotype; the capsule varies in size and is associated with properties including carriage prevalence and virulence. We determined and quantified the association between capsule and recombination events using genomic data from a diverse collection of serotypes sampled in Malawi. We determined both the amount of variation introduced by recombination relative to mutation (the relative rate) and how many individual recombination events occur per isolate (the frequency). Using univariate analyses, we found an association between both recombination measures and multiple factors associated with the capsule, including duration and prevalence of carriage. Because many capsular factors are correlated, we used multivariate analysis to correct for collinearity. Capsule size and carriage duration remained positively associated with recombination, although with a reduced P value, and this effect may be mediated through some unassayed additional property associated with larger capsules. This work describes an important impact of serotype on recombination that has been previously overlooked. While the details of how this effect is achieved remain to be determined, it may have important consequences for the serotype-specific response to vaccines and other interventions.

against a small minority of antigenic diversity in this organism. Following vaccination, the prevalence of NVTs in carriage, and to a lesser extent in IPD, has increased as the NVTs took advantage of the removal of their competitors in a process termed serotype replacement (11Ϫ13). An important component of serotype replacement has been the success of "capsule switch" vaccine escape variants (14). These variants are lineages that were previously associated with a vaccine serotype but have persisted in the postvaccine environment by the introduction of genes for an NVT capsule through homologous recombination (15). The known ability of the pneumococcus to undergo homologous recombination has also been associated with the acquisition of drug resistance (16,17) and numerous examples of capsule switch variants (15, 18Ϫ20). As a result of this recombination-mediated shuffling of serotype and genotype, when PCVs were introduced, the population already contained potential vaccine escape variants, some of which have subsequently become common and important causes of disease.
The polysaccharide capsule is known to influence multiple aspects of pneumococcal biology. Serotypes vary in their nasopharyngeal carriage rates (21), and their propensity to cause IPD was usually expressed as the "invasive potential" in order to account for the various exposures to different serotypes (22Ϫ24). Moreover, these properties have been linked to basic biochemical features of the capsule: those capsules with fewer carbons in the repeat unit of the polysaccharide tend to be thicker and exhibit a more negative surface charge (25) and are associated with a longer duration of carriage and a higher carriage prevalence-an outcome that may be linked to enhanced resistance to opsonophagocytic killing (21). Those serotypes more associated with virulence meanwhile, tend to have smaller capsules and shorter durations of carriage (21,22,24).
Despite the importance of both recombination and capsule to pneumococcal biology, whether serotypes vary in their recombination rates in nature and what drives it have not been extensively studied. Genomic studies have shown that different pneumococcal lineages plainly vary in the amount that recombination has contributed to their diversification (20,26) but have not linked this to the different serotypes associated with each lineage. These studies have also suggested that lineages that completely lack capsule may have a higher relative rate of recombination, suggesting that the capsule can impede the uptake of DNA (26). Support for this comes from the observation that mechanisms such as biofilm formation and downregulation of capsule biosynthesis have been reported to facilitate DNA uptake and chromosomal integration in vitro (27). This finding suggested that the polysaccharide capsule may inhibit genetic exchange in encapsulated isolates either physically or structurally (28).
There are hence potentially antagonistic roles for serotype in recombination. A larger capsule is associated with a higher prevalence and longer duration of carriage, therefore offering more opportunities to undergo recombination, but conversely, experimental and observational data suggest that the absence of capsule increases transformation and recombination (26,27). As yet, there has been no attempt to quantify the association between serotype properties, such as capsule size and carriage duration, and the amount of recombination that those serotypes experience. Understanding this is important because as noted, recombination is crucial to the response of pneumococci to medical interventions (19), and if serotypes vary in recombination in a predictable fash-ion, we might be able to predict those that are a higher risk of developing drug resistance or vaccine escape.
In the present study, we used whole genomes of 439 pneumococcal isolates collected in Malawi, a low-income country in sub-Saharan Africa with high pneumococcal carriage rates (29) and which contains a wide range of serotypes, including serotypes with very low prevalence or that are absent in industrialized countries such as serotype 1 and 5 (18). We quantified the extent to which recombination has contributed to the history of serotype-specific lineages and combined this with data on carriage duration, invasive potential, carriage prevalence, and capsule size to quantitatively investigate the association. Our results demonstrate significant associations between the recombination rate and frequency with the carriage duration, invasive potential, and size of the outer cell wall polysaccharide capsule in pneumococcal isolates.

RESULTS
We sequenced the genomes of 439 Streptococcus pneumoniae clinical isolates collected from the Queen Elizabeth Central Hospital (QECH) in Blantyre, Malawi (southeastern Africa), between 2002 and 2010. The set of samples comprised 364 invasive isolates and 75 carriage isolates and representatives of 48 distinct serotypes. The most common serotypes were serotype 1 with 83 isolates (52 isolates from blood, 28 isolates from cerebral spinal fluid [CSF], and 3 isolates from carriers), serogroup 6 with 46 isolates (25 isolates from blood, 11 isolates from CSF, and 10 isolates from carriers), serotype 5 with 25 isolates (17 isolates from blood, 6 isolates from CSF, and 2 isolates from carriers), and other serotypes (285 isolates; 126 isolates from blood, 99 isolates from CSF, and 60 isolates from carriers) (see Table S1 in the supplemental material). Whole-genome sequencing was done using the HiSeq platform (Illumina, CA, USA) as described in Materials and Methods. The raw sequence reads had an average length of 72. 23 38.72 kb) per genome. The phylogeny of all the isolates was constructed using a 0.79-Mb multiple-sequence alignment of 852 concatenated core genes containing 51,389 single nucleotide polymorphisms (SNPs) (32). For the analysis of serotype-specific evolution of lineages identified as described below, SNPs were called against reference genomes (Table S2).
Population structure. Analysis of the population using BAPS (33, 34) identified 14 primary sequence clusters (SCs) (Fig. 1). Of these 14 SCs, SC1 to SC13 were monophyletic clades. The remaining SC, SC14, contained sequences that could not be assigned to a monophyletic SC, and the streptococci in this polyphyletic SC are a diverse group of rare lineages, the grouping of which does not necessarily reflect common evolutionary history, and hence, we undertake no further analysis of this group. Most SCs contained isolates with a single serotype and multiple closely related sequence types (STs) as determined by multilocus sequence typing (MLST) ( Fig. 1; see Table S1 in the supplemental material). In some cases, SCs contained multiple serotypes separated by long branches. A notable example is SC9, which consisted of clearly distinct subclades of serotype 6A and 35B (Fig. 1), which we split Chaguza et al. into two subclades 6A-SC9 and 35B-SC9 for the serotype-specific analyses. The subclade with serotype 35B in SC9 contained additional serotypes with the same ST that were extremely closely related at the whole-genome level (Fig. 1), which suggests that this subclade was a single lineage of serotype 35B that recently acquired other capsule types.
Variation in recombination among serotypes. Pneumococci are known to undergo homologous recombination, and all SCs containing capsule switch variants reflect a history of homologous recombination events. To determine the specific recombination events responsible for these changes, as well as those that could have occurred elsewhere in the genome, we used Gubbins (35),

FIG 1
Population structure of S. pneumoniae strains from carriers and patients with invasive disease. The phylogeny was constructed using a 0.79-Mb multiple-sequence alignment with 51,389 SNPs from 852 universally conserved (core) genes present in single copies. The 14 sequence clusters (SCs) identified by BAPS are labeled on the edge of the phylogeny. Clades corresponding to the 13 different monophyletic SCs (SC1 to SC13) are shown in different colors, while clades for all the strains in the polyphyletic clade (SC14) comprising of the "unclustered" sequences are not colored. The outermost ring around the phylogeny shows whether a serotype is a vaccine type (VT) included in the PCV13 pneumococcal vaccine formulation (black) or nonvaccine type (NVT) not included (white). Phylogeny was rooted using an outgroup method on a branch containing nontypeable (NT) pneumococci, which are genetically distinct from all pneumococcal strains. which identifies regions with an atypically high density of SNPs using a sliding window approach ( Fig. 2a and b; see Fig. S3a to S3o in the supplemental material). We distinguish between two separate recombination parameters. The relative recombination rate ( r/m ) is the ratio of the numbers of SNPs introduced by recombination to the number introduced by mutation. It hence measures the total amount of diversity accumulated by a lineage. In contrast, the recombination frequency ( re ) is defined as the average number of recombination events inferred per isolate, regardless of how much variation they have introduced. Hence, a large amount of recombination with closely related strains could lead to a high re but a low r/m . Furthermore, higher presence of shared ancestral recombination events acquired through clonal descent than strain-specific events introduced independently may lead to a high re . The averages of both these parameters were calculated for each SC. While SC1 and SC2 both contained isolates with a single serotype (serotype 1 and 5, respectively), in the case of SCs containing more than one serotype, the statistics for each were calculated independently (Table 1). Separate estimates were obtained for the two distinct 6A clades in SC3 and SC9, which we refer to as 6A-SC3 and 6A-SC9, respectively, on the basis of the predominant STs in each.
Estimates of r/m ranged from 0.44 to 23.45 with a mean of 8.89 (95% CI, 5.243 to 12.43) (Table 1), and the only estimate less than 1 was found in the 16F lineage SC5. The highest values of r/m were 12.53 for serotypes 23F-SC8, 23.45 for 6A-SC3, 12.85 for 6A-SC9, and 17.02 for 35B-SC9, while serotypes 1-SC2 ( r/m ϭ 2.61), 16F-SC5 ( r/m ϭ 0.44), 4-SC13 ( r/m ϭ 1.67), and 5-SC1 ( r/m ϭ 2.82) showed the lowest values ( Table 1). The distribution of r/m values deviated from a normal distribution (P Ͻ 0.0001) on the basis of three normality tests (see Materials and Methods). In serotypes of the same SC, such as 35B, 6A, and 18B/C in SC4, SC9, and SC13, respectively, the distribution of r/m ratios per branch were not different with P values of 0.7431 and 0.9350, respectively. The average estimated re was 6.316 (95% CI, 3.316 to 9.316) and ranged from 0.11 to 20.08. The highest re estimates were 12.26 for 23F-SC8, 20.08 for 35B-SC9, 8.91 for 6A-SC3, and 11.92 for 6A-SC9, while serotypes 1-SC2 ( re ϭ 0.73), 16F-SC5 ( re ϭ 0.11), 4-SC13 ( re ϭ 0.75), and 5-SC1 ( re ϭ 2) showed the lowest re values (Table 1). Significant differences (P ϭ 0.0005) in the average numbers of recombination events in each serotype were also observed by using the same test (see Fig. S2b in the supplemental material). The Kruskal-Wallis test was used because of significant deviations of the distributions of both r/m per branch in each serotype and the average number of recombination events for each serotype from the normal distribution. Interestingly, both distinct lineages with 6A capsule type consistently showed high r/m (23.45 in SC3 and 12.85 in SC9) and re (8.91 in SC3 and 11.92 in SC9) despite having a dissimilar genetic backbone. In addition, r/m positively correlated with re (R 2 ϭ 0.5168; P ϭ 0.0025) (Fig. 3). The pneumococcal pherotypes or competencestimulating peptides (CSP) encoded by the comC gene were diverse and variably distributed in the SCs but showed no obvious association with the recombination parameters (Table 1).
Serotype-specific carriage duration, prevalence, and invasive potential correlate with recombination. Different pneumococcal lineages vary in the extent of recombination that has influenced their genomes (20,26). Serotypes are known to vary in their du- The locations and distribution of regions, which have acquired exogenous DNA through recombination, are colored depending on the number of strains that contain them. Recombination events in internal branches (red) were present in multiple isolates and were shared through clonal descent rather than independent acquisitions, while those in the terminal branches (blue) were isolate specific and represent independent recent acquisitions. The recombination rate ( r/m ), i.e., mean number of the inferred distinct recombination events per isolate (each shared ancestral recombination event that occurred once and spread in the clone via clonal descent was counted once). The recombination frequency ( re ), i.e., the mean number of SNPs introduced through recombination to those introduced through mutation, is shown. A high presence of recent rather than shared recombination events implies high re . ration of carriage and propensity to cause invasive disease per exposure. A longer duration of carriage could offer more opportunities for cocolonization with different strains and possible recombination. We hence tested the association for each serotype between r/m and re and previously published serotype-specific estimates of carriage duration (36), carriage prevalence data from this study, capsule size (21), and invasive potential (defined as the odds ratio for causing invasive disease to carriage [IC OR ] [22,37]) using univariate linear regression. We observed a positive association between both r/m (R 2 ϭ 0.4664; P ϭ 0.0071) and re (R 2 ϭ 0.3393; P ϭ 0.0367) with carriage duration (Fig. 4a and Table 2; see Fig. S4a in the supplemental material), but only r/m showed a significant association (R 2 ϭ 0.4173; P ϭ 0.0093) with carriage prevalence of the serotypes ( Table 2; see Fig. S6a and S6b in the supplemental material). When modeling the relationship between carriage duration and re, serotype 35B-SC9 was excluded as an outlier, but its inclusion decreased the P value (R 2 ϭ 0.2406; P ϭ 0.0749). Both r/m (R 2 ϭ 0.4549; P ϭ 0.0228) and re (R 2 ϭ 0.7155; P ϭ 0.0010) showed negative association with the invasive potential of the studied serotypes ( Fig. 4b and Table 2; Fig. S4b and S4c). In multivariable regression, the association of carriage duration, prevalence, and invasive potential disappeared (Table 2). Repeated univariate and multivariate analyses using only those serotypes for which all response variables were available found similar results (Table S3). A multivariate regression to account for this removed the significant association with serotype invasive potential, while the association with carriage duration and capsule size remained (Table 2).
Recombination correlates with larger polysaccharide capsules. Capsule size has been associated with the factors described above-the serotype-specific duration of carriage and prevalence. The polysaccharide capsule size data for different pneumococcal serotypes, determined by the fluorescein isothiocyanate-labeled dextran (FITC-dextran) assay, were obtained from Weinberger and colleagues (21). By using univariate regression, we observed positive association between the capsule size and re (R 2 ϭ 0.5892; P ϭ 0.0095), but not r/m (R 2 ϭ 0.1186; P ϭ 0.300) (Fig. 5; Table 2). Excluding serotype 35-SC9, which was deemed to be an outlier observation, the linear association between capsule size and r/m increased (R 2 ϭ 0.3519; P ϭ 0.0544). This was robust to multivariate analysis (P ϭ 0.0296). We also did a multivariable regression to account for collinearity among capsule size, duration of carriage, carriage prevalence, and invasive potential and identify the independent contribution of each, which showed no association between r/m and re (Table 2; see Table S3 in the supplemental material).

DISCUSSION
Recombination is a fundamental process in the evolution and adaptation of many pathogens. It is capable of shuffling existing   Fig. 1 and Table 1. variation into more fit combinations, and in the case of pneumococcus, it has been an important force in its response to medical interventions like vaccines and antibiotics (19). Despite the obvious benefits to the pathogen that recombination offers in this regard, the rates with which it occurs vary greatly among pneumococcal lineages and serotypes (26) for reasons that are obscure. Understanding the factors that contribute to this variation is important as we attempt to limit the potential of pneumococcal evolution to erode the benefits of clinical innovation.
Previous studies have commented on the low presence of recombination in certain serotypes, such as the "highly invasive" serotypes 1 and 7F (20,38,39). A systematic comparison of capsular types is challenging because some serotypes are comparatively rare or absent from samples taken in industrialized countries, and while it might be possible to combine studies from different sites, it would be important to account for any bias arising from variation in recombination in different regions-for instance, previous work with MLST data found more evidence of recombination in samples from African carriers and speculated that this might be the consequence of a higher carriage rate (40).
We have used a set of pneumococcal samples that includes an unusually wide range of serotypes with diverse properties, and we found a clear association between serotype and r/m and re . Because of lack of data in resource-poor settings, we used previously published data on the properties of serotypes, such as capsule size, carriage duration, prevalence, and invasive capacity (all of which vary relatively little among sample sites); we found that all were associated with one recombination parameter or the other recombination parameter or both. In univariate analysis, a negative association with invasive capacity mirrored a positive association of r/m with carriage duration and prevalence-likely reflecting the fact that the majority of highly invasive serotypes are also infrequently carried. However, it should be noted that we did not have specific invasiveness data from pneumococci infecting the Malawi population, and there are few estimates for this quantity from resource-poor settings, so this is an important focus for future work.
We have also found more frequent recombination in serotypes with larger outer cell wall polysaccharide capsules, which shows that despite the potential physical barrier offered by the capsule,  Fig. 1 and Table 1. the presence of a larger capsule itself does not simply reduce recombination. While this could be considered surprising, it is important to note that serotypes with enlarged capsules are typically carried for longer duration and resist neutrophil phagocytosis (21) and complement-mediated immune responses (41,42) than serotypes with smaller capsules. In addition, this may be because some of the serotypes with larger capsules have lower metabolic costs (21) associated with capsule expression and may have a higher ability to form biofilms which may prevent elimination (27). This suggests that longer duration of carriage and lower clearance rates by the host's immune system in serotypes with enlarged capsules leads to sustained exposure to cocolonizers (potential donors of exogenous DNA), which has been previously described to facilitate recombination (40). This in turn results in higher recombination than in serotypes with thinner capsules, which are presented with limited opportunities for recombination due to rapid clearance. The intrinsic recombination rates in vitro in the absence of capsule-induced immune clearance may be different than observed in the natural population where various factors such as strain competition and immune responses can directly affect the recombination rate. Hence, the results of in vitro studies may be misleading when applied to the nasopharynx. In addition, there was no obvious association between the pneumococcal pherotypes such as CSP1, CSP2, and other variants encoded by the comC gene and its allelic variants, which suggests that the observed differences in recombination may not be due to the differences in competence. The evidence presented here certainly strongly suggests that capsule does not prevent recombination in any straightforward fashion. However, while capsule size was associated with a higher frequency of recombination ( re ), the association with r/m was not significant in the univariate regression. Nevertheless, the effect of capsule size and carriage duration in a multivariate analysis accounts for the high correlation between the two recombination parameters, i.e., r/m and re , indicating that these two factors are key contributors to the observed variation. The capsule data used in this study were obtained in vitro, and the expressed capsule sizes during carriage and transmission remain unclear and should be investigated in follow-up studies.
The properties of the capsule that we have studied here are also highly correlated (Pearson correlation coefficients are shown in Fig. S7 in the supplemental material), and accounting for this removed the association ( Table 2). As a result, we have not been able to precisely define the underlying causes. A plausible mechanism is that serotypes with larger capsules are more exposed to other cocolonizing strains because they are carried for a longer period and at a higher prevalence. One way to test this would be to use serotype switches as natural experiments and compare r/m and re between switched lineages where the two serotypes involved vary in their size. However, this is limited by the fact that serotype switches identified in Fig. 1 and other studies typically involve serotypes that are effective colonizers such as serogroups 6, 15, 18, 19, and 23 (15). While this means we cannot use the present data set to do the above analysis, we should note that this observation is consistent with the suggestion that more commonly carried types are more likely to undergo recombination with each other.
In conclusion, we have shown that in a set of samples from Malawi containing many different serotypes, there is a linear relationship between re and r/m and properties of the capsule. In general, there is evidence that recombination increases with carriage duration and polysaccharide capsule size. We have not been able to pick out the causal roots of this relationship due to the complexity of the relationships between other important properties also associated with capsule, but the observation shows conclusively that all serotypes are not alike and that serotypes differ in how recombination has affected their evolution history, which has been associated with many different responses from drug resistance to vaccine escape. The serotypes observed in settings with a high burden of carriage and disease, such as sub-Saharan Africa, are often different from those found in industrialized nations, and there has been concern that lineages with serotypes targeted by vaccine such as serotypes 1, 5, and 7F that are relatively invasive, with a short duration of carriage and small capsules, might survive a serotype switch (21,22,36). The results of this study suggest that such concerns may be misplaced.  Table S1 in the supplemental material). Isolates were cultured in Todd-Hewitt broth, and genomic DNA was extracted using the Promega Wizard DNA genomic DNA purification kit (Promega, USA). DNA sequencing was done using the Illumina genome analyzer II (Illumina, CA, USA) platform at the Wellcome Trust Sanger Institute. A robust bespoke pipeline (43) Fig. 1 and Table 1.

Isolate
used to generate sequence assemblies from the paired-end Illumina sequence reads for the study isolates. Serotypes were identified from the whole-genome sequence data using an in silico approach implemented with Perl and BioPerl (46) as described by Croucher et al. (47). Population structure and phylogenetic analysis. Phylogenetic trees were constructed using a 0.79-Mb sequence alignment of 51,389 single nucleotide polymorphisms (SNPs) in 852 concatenated core genes present in single copies (no paralogs). The conserved (core) and nonconserved (accessory) genes were identified using the Roary bacterial genome analysis pipeline (32). The population structure of the isolates was inferred using the hierBAPS module implemented in BAPS v6.0 (33,34). Maximum likelihood phylogenies of the concatenated core gene alignments for all the study isolates and serotype-specific isolates in the inferred sequence clusters (SCs) from BAPS were constructed using RAxML v7.0.4 (48). We used a generalized time reversible (GTR) (49) model with gamma (⌼) heterogeneity across nucleotide sites and 100 bootstrap replicates. The tree was rooted with an outgroup clade of nontypeable (NT) isolates, which form a distinct clade from other pneumococci (50). Phylogenetic trees and associated metadata, namely, sequence types (STs), serotypes, and inclusion of the sample's serotype in the PCV13 vaccine formulation, were overlaid on the phylogenies and visualized in iToL (51) and BioPython scripts (52).
Recombination detection and phylogeny construction. Phylogenies for specific serotypes were generated for downstream phylogenetic analysis through mapping paired-end short sequence reads against different published reference whole-genome sequences (see Table S2 in the supplemental material) for different pneumococcal serotypes using SMALT v0.7.4. The output binary alignment map (BAM) files were realigned using the Genome Analysis Tool Kit (GATK) v3.3.0 (53). The locations of recombination events in each chromosome in the alignments were detected using Gubbins v1.1.1 (35). The serotype-specific phylogenies with recombination removed were constructed using RAxML v7.0.4 (48) as above and rooted in the middle of the branch, separating the two most divergent isolates. Visualization of the phylogenies was done as described in "Population structure and phylogenetic analysis" above.
Carriage duration, serotype invasive potential, and capsule size. We used previously published data from Kilifi in Kenya, data on carriage duration of pneumococcal serotypes in the human nasopharynx from Abdullahi and colleagues (36) and data on serotype invasive potential (odds ratio for IPD to carriage [IC OR ]) from Brueggemann and colleagues (22). Additional data for serotype 5 invasive potential were obtained from Smith and colleagues (37). The polysaccharide capsule sizes for the serotypes were obtained from Weinberger and colleagues (21) where the degree of encapsulation was determined by measuring the zone of exclusion using the fluorescein isothiocyanate-labeled dextran (FITC-dextran) assay. The specific methods used to obtain the data sets are described in the respective publications. We excluded serotypes with estimated IC OR values of 0 (22) to prevent overfitting as a consequence of the smaller number of isolates used from invasive disease sample collection. Genomic data on the relative recombination rate and frequency of pneumococcal lineages and prevalence of serotypes in carriage were obtained from Malawi (this study).
Statistical analysis. We used univariate, multivariable, and multivariate multiple regression to examine the association between known properties of serotypes and the relative recombination rate ( r/m ) and frequency ( re ). The univariate regression model were separately formulated as Y i ϭ ␤ 0 ϩ ␤ 1 x i ϩ i for each dependent variable Y i ( r/m and re ) against each independent variable x i where i designates the polysaccharide capsule size, carriage duration, serotype invasive potential, and carriage prevalence. The intercept and regression coefficient for the independent variables are represented by ␤ 0 and ␤ 1 , respectively, while i represents the residuals. We formulated the multivariable model similarly but included multiple independent variables to account for collinearity between them as Y i ϭ ␤ 0 ϩ ␤ 1 x i ϩ ␤ 2 x i ϩ ␤ 3 x i ϩ ␤ 4 x i ϩ i , where ␤ 0 represents the intercept and the terms ␤ 1 , ␤ 2, ␤ 3, and ␤ 4 are the regression coefficients for each independent variable (x i ) where i is polysaccharide capsule size, carriage duration, serotype invasive potential and carriage prevalence. We also modeled the relationship between both dependent variables ( r/m and re ) and the independent variables in the multivariable regression this time with a multivariate multipl e regression model. The multivariate model was formulated as Y nxm ϭ X nx(pϩ1) ␤ (pϩ1)xm ϩ nxm , where n designates each serotype and lineage combination as follows 1-SC2, 5-SC1, 6A-SC3, . . ., and 4-SC13 (Table 1). The m in the matrix Y nxm takes values 1 and 2 that designate dependent variables r/m and re , respectively. The model matrix is the term X nx(pϩ1) and consists of pϩ1 regressors where p represents polysaccharide capsule size, carriage duration, serotype invasive potential, and carriage prevalence of serotypes. The first column in the model matrix X nx(pϩ1) consists of 1s, which represents the regression constants. The matrix of regression coefficients for all the independent variables in X nx(pϩ1) is represented by parameter ␤. The residual term in the model is represented by the matrix nxm . For comparison with the multivariable and multivariate analyses, we repeated the univariate analyses with the same sample size.
We calculated the pairwise Pearson correlation coefficients for independent and response variables and plotted them using the ggplot2 package. The regression analysis, normality tests, namely, D'Agostino and Pearson, Shapiro-Wilk, and Kolmogorov-Smirnov tests, Kruskal-Wallis and analysis of variance (ANOVA) tests and summary statistics, such as means and ranges, were calculated using R v3.1.2 (R Core Team, 2014) and GraphPad Prism v6.0 (GraphPad Software, Inc., CA, USA).