Chromosome End Repair and Genome Stability in Plasmodium falciparum

ABSTRACT The human malaria parasite Plasmodium falciparum replicates within circulating red blood cells, where it is subjected to conditions that frequently cause DNA damage. The repair of DNA double-stranded breaks (DSBs) is thought to rely almost exclusively on homologous recombination (HR), due to a lack of efficient nonhomologous end joining. However, given that the parasite is haploid during this stage of its life cycle, the mechanisms involved in maintaining genome stability are poorly understood. Of particular interest are the subtelomeric regions of the chromosomes, which contain the majority of the multicopy variant antigen-encoding genes responsible for virulence and disease severity. Here, we show that parasites utilize a competitive balance between de novo telomere addition, also called “telomere healing,” and HR to stabilize chromosome ends. Products of both repair pathways were observed in response to DSBs that occurred spontaneously during routine in vitro culture or resulted from experimentally induced DSBs, demonstrating that both pathways are active in repairing DSBs within subtelomeric regions and that the pathway utilized was determined by the DNA sequences immediately surrounding the break. In combination, these two repair pathways enable parasites to efficiently maintain chromosome stability while also contributing to the generation of genetic diversity.

In addition to HR and NHEJ, eukaryotic organisms can also stabilize DSBs occurring at the chromosome ends through the action of telomerase, the enzyme that maintains the telomeric repeats at this specialized chromosomal location. This repair pathway, called de novo telomere addition, or "telomere healing," involves the recruitment of the telomerase complex directly to the site of the break (20, 21). The DNA strand is resected until a region of high TG content is encountered, which is thought to "seed" telomerase and enable it to incorporate telomeric repeats at the site of a DNA break and resection (Fig. 1B). This results in the creation of a functional telomere, thereby stabilizing the chromosome and maintaining genome integrity, albeit with a deletion of the region of the chromosome between the DSB and the original telomere and loss of the intervening genetic information. Telomere healing is thought to compete with HR and NHEJ for repair of DSBs, with the pathway of repair depending on the chromosomal environment in which the break occurs (20). Telomere healing has been described for a number of eukaryotic organisms, and the addition of telomeric repeats to the ends of broken chromosomes has been described for P. falciparum (22-24), indicating that this mechanism of repair is conserved. However, the details of telomere healing in malaria parasites and how this pathway integrates with HR and alternative NHEJ have not been studied extensively. Given the general chromosome structure of P. falciparum (Fig. 1A), telomere healing could repair DSBs that occur anywhere within the extensive subtelomeric regions between the chromosome ends and the internal, highly conserved regions of the genome. Genes within these subtelomeric regions are necessary for host-pathogen interactions but are not required for viability in cultured parasites; thus, deletions of subtelomeric regions are tolerated and can be efficiently recovered and analyzed. The semirepetitive nature of these regions suggests that DSBs could also potentially be repaired by HR, allowing the study of both pathways as they repair breaks within this chromosomal environment. Detailed sequence examination of subtelomeric domains has been difficult in the past due to the repetitive nature of the sequences, making assemblies of these regions from short sequence reads problematic. However, newer technologies, including those that utilize single-molecule real-time (SMRT) sequencing, enable the confident assembly of complete subtelomeric regions, allowing us to examine in detail DSB repair in these regions of the genome. We applied this technology to study the repair of spontaneous DSBs that occur within subtelomeric regions during in vitro culture as well as breaks induced randomly through exposure to ionizing radiation.

FIG 1 Step 1: typical chromosomes can be divided into a core genome containing housekeeping genes, telomere repeats at the extreme end of the chromosome, and intervening subtelomeric regions. A DSB within the subtelomeric region can be repaired by telomere healing. Step 2: the DSB is recognized by protein complexes that include exonuclease activity. In model organisms, both Exo1 and DNA2 have been implicated in the resection of DNA away from the telomere, revealing a single-stranded 3′ end. Step 3: when a single-strand sequence is revealed that can anneal to the template RNA of the telomerase complex, telomerase activity extends from the break, placing telomere repeats directly at this site. Step 4: repeated rounds of telomere addition result in a stable telomere and maintain genome integrity.
Our analysis indicated that both telomere healing and gene conversion through HR can repair DSBs within subtelomeric domains. Telomere healing was the most common type of repair we observed, and the pathway utilized was determined by the sequence surrounding the break point. Considering the repetitive nature of these chromosomal regions, subtelomeric deletions resulting from telomere healing events could be followed by subsequent gene conversion through HR, thus reestablishing the typical subtelomeric structure and maintaining the parasite's overall chromosome organization. We conclude that the two repair pathways work in tandem to maintain genome integrity and preserve the complement of clonally variant genes found within the genomes of P. falciparum isolates.

RESULTS
Extensive telomere healing is observed in cultured parasites. While telomere healing has been described for P. falciparum (22-24), how commonly it occurs within the 28 subtelomeric domains of the parasite's genome has not been closely examined. Telomere healing has been most extensively studied in Saccharomyces cerevisiae, in which telomerase displays a strong preference for TG-rich sequences to initiate synthesis of repeats (25, 26). If P. falciparum telomerase has a similar preference for TG-rich sequences to initiate synthesis of repeats, then one is likely to observe telomeric repeats fused to the coding regions of truncated genes at the new chromosome ends, as coding regions are significantly enriched in Cs and Gs compared to noncoding regions of the genome.
As a first step to determine if telomere healing is common in cultured P. falciparum and to obtain a baseline for our experimental analysis, we analyzed the sequence of all 28 chromosome ends provided in the reference genome sequence of 3D7 (Plasmodb.org). Five chromosome ends display the hallmarks of telomere healing, including the absence of TAREs and telomeric repeat sequences fused directly into or next to protein coding regions. In four of these, on chromosomes 5 and 6 and both ends of chromosome 14, the telomeric repeats are found within the coding region of a truncated var gene, creating a pseudogene. In the fifth example, on chromosome 11, the telomeric repeats are found immediately downstream of an intact var gene (Fig. 2A).
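The hallmark described above, an array of telomeric repeats joined directly to nontelomeric sequence, can be searched for computationally. The following is a minimal sketch rather than the procedure used in this study. It assumes that the sequence is written 5′ to 3′ on the G-rich strand with the telomere at its 3′ end, that the repeats are the two major variants GGGTTCA and GGGTTTA, and that three consecutive repeats are sufficient to call an array. The function name and the threshold are hypothetical.

```python
import re


def find_telomere_start(seq, min_repeats=3):
    """Return the 0-based index where a telomeric repeat array begins, or None.

    Assumption: `seq` is the G-rich strand, 5'->3', with the telomere at the
    3' end. Only the two major repeat variants are recognized; degenerate
    repeats found in real telomeres would be missed by this simple pattern.
    """
    # One repeat unit is GGGTT followed by C or T, then A.
    pattern = re.compile(r"(?:GGGTT[CT]A){%d,}" % min_repeats)
    match = pattern.search(seq.upper())
    return match.start() if match else None


if __name__ == "__main__":
    # Hypothetical coding sequence fused to five telomeric repeats.
    flank = "ATGGAAAATTTACCATCA"
    healed_end = flank + "GGGTTCA" * 2 + "GGGTTTA" * 3
    print(find_telomere_start(healed_end))
```

The index returned marks the junction between the retained chromosomal sequence and the first added repeat, which is the position that would then be compared against gene annotations.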
Next, we investigated if telomere healing events occurred in parasites cultivated in the laboratory; we generated several subclones from a stock of 3D7 parasites that had been grown in culture for several years in our laboratory and chose one for whole-genome sequencing. To determine the sequence of the chromosome ends and to avoid difficulties in assembling the repetitive structure of subtelomeric domains from the short sequence reads typically derived from most methods of high-throughput sequencing, we instead utilized SMRT sequencing, which can yield single contiguous reads of 10 to 20 kb (27, 28). Using this method, we verified the five telomere healing events recognized in the reference genome sequence and also identified two additional novel DNA sequences consistent with telomere healing (Fig. 2B). [...] The second occurred on chromosome 3 and similarly deleted all of the subtelomeric domain (~85 kb), inserting telomeric repeats just downstream of a gene encoding a putative alpha/beta-hydrolase (PF3D7_0301300). These two events provided us with a "before and after" picture of telomere healing and further indicated that this is a common mechanism of DSB repair within the subtelomeric regions of P. falciparum.

Inducement of DSBs and repair in cultured parasites by X-ray irradiation. To more directly observe DSB repair within subtelomeric domains, we chose to induce random DSBs by exposing parasite cultures to X-rays. X-rays are known to cause DSBs without sequence bias, and such breaks must be repaired for parasite viability. Thus, by selecting for viable parasites after near-lethal exposure to X-rays, we hoped to detect and analyze additional examples of DSBs within subtelomeric regions. The subclone of 3D7 previously used for SMRT sequencing was exposed to increasing amounts of X-ray radiation to determine the level of exposure that would lead to significant widespread DNA damage yet allow for parasites to repair, recover, and propagate.
Exposure to 100 Gy resulted in significant lethality; however, viable parasites grew from irradiated cultures within 10 to 12 days after exposure. Parasites were exposed to 100 Gy three times consecutively, allowing the parasites to recover normal growth after each irradiation. Subclones were then isolated by limiting dilution, and one clone was subjected to whole-genome sequencing using the SMRT methodology. Analysis of the genome sequence identified two additional examples of telomere healing, one on chromosome 1 that led to a deletion of ~90 kb and resulted in the insertion of telomeric repeats into a rifin gene (PF3D7_0101900) and a second on chromosome 2 that led to a deletion of ~100 kb and resulted in the insertion of telomeric repeats just downstream of the hypothetical gene PF3D7_0221000 (Fig. 3A; Fig. S2, S5, and S6). This clone of 3D7 now carries significant truncations within 9 of its 28 subtelomeric regions.
In addition to the two new examples of telomere healing identified for this clone, we also detected three examples of recombination events that likely resulted from HR. One of the subtelomeric regions of chromosome 13 is a hybrid sequence in which the original subtelomeric region has been deleted and replaced by ~25,500 bp, including the telomeric repeats, from one of the subtelomeric regions of chromosome 9. The breakpoint of the recombination event occurred within the coding region of a var gene, creating a new var gene that is a chimera of PF3D7_0900100 and PF3D7_1300100 (Fig. 3B). A more complex product of recombination was identified near one end of chromosome 12. The first ~8,100 bp, including the telomeric repeats, was derived from one of the subtelomeric domains of chromosome 9 and, given that both ends of chromosome 9 remain unchanged, this appears to be the result of a gene conversion event. This fragment is fused to 13,886 bp of sequence identical to a region within one of the subtelomeric domains of chromosome 1. This sequence is within the portion of chromosome 1 that is now deleted in this clone, indicating that it was transposed into the subtelomeric region of chromosome 12 prior to its deletion from chromosome 1 (Fig. 3B). This could have resulted from either gene conversion or reciprocal recombination. These data indicate that DSBs that occur within subtelomeric domains can be repaired by either HR or telomere healing and that the repair process can generate new var genes.
Sequence preference for insertion of telomeric repeats by telomerase. A model for telomere healing in higher eukaryotic cells has been derived from extensive experiments conducted primarily in yeast. When a DSB forms within a subtelomeric region, the DNA is initially resected by one of two exonucleolytic pathways (Exo1 or Dna2/Sgs1), revealing a region of 3′ single-stranded DNA (29). The resection continues until a sequence is encountered that can anneal to the template region of telomerase RNA. This allows telomerase to begin synthesizing the telomeric repeats directly at the end of the chromosome, continuing for multiple rounds of DNA synthesis and resulting in a functional telomere that can stabilize the chromosome end (Fig. 4A). As predicted by this model, telomere healing is initiated at sites that display sequences complementary to the template region of telomerase RNA. In yeast, these sites almost always include GT, TG, or CG dinucleotides at the site where telomerase initiates synthesis of telomere repeats (25, 26). In addition, the flanking region also plays a role in determining where the new telomere is synthesized, presumably by influencing how efficiently telomerase is recruited to the chromosome end (30). In S. cerevisiae, stretches of TG repeats between 22 and 250 bp effectively recruit telomerase for healing, but stretches either longer or shorter are repaired much less efficiently (31-34). Proximal enhancer sequences that can function as binding sites for proteins that associate with the telomerase complex can also increase the efficiency of telomere healing (30). Sites of telomere healing in P. falciparum were previously identified within genes that are known to be commonly disrupted in cultured parasites, presumably because such deletion events provide a growth advantage in vitro (22, 24); the ability of P. falciparum telomerase obtained from nuclear extracts to extend from specific sequences in vitro was also investigated (23).
The identification of 9 independent telomere healing events in our sequence data, including events that we induced with radiation exposure, allowed us to investigate in more detail the properties at the site of a DSB that contribute to telomere healing in P. falciparum and whether they differ from sites that are instead repaired by HR.
The sequence of the telomerase RNA template region for several Plasmodium species was predicted bioinformatically by Chakrabarti and colleagues. This sequence is known to be species specific and, unlike the equivalent sequence in yeast, the P. falciparum sequence does not consist exclusively of As and Cs and instead includes two additional bases (UG) within the sequence (5′-ACCCUGAACCC-3′). Interestingly, given that the two major telomeric repeat sequences are 5′-TTCAGGG-3′ and 5′-TTTAGGG-3′, the G within the template sequence appears to specify either C or T. The sites where telomeric repeats were added to the chromosome ends for all nine telomere healing events identified in our sequence datasets are shown in Fig. 4B. In all nine sequences, the site where repair was initiated displayed the ability to anneal to the telomerase template region at precisely the same position; thus, in all cases the first bases added to the new telomere were GGGTT. These data indicate that telomere healing in P. falciparum likely involves the same mechanism described for yeast, yet the sequence of the telomerase RNA template explains why the site of telomere repeat addition diverges from the GT, TG, or CG preference described for S. cerevisiae. Examination of the sequences upstream of the newly added telomere repeats did not identify any discernible motifs or compositional bias, indicating either that proximal enhancers like those identified in yeast do not exist in P. falciparum or that recruitment of the telomerase complex is not influenced by the primary sequence.
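The relationship between the template RNA and the two major telomeric repeats can be checked with a short calculation. The sketch below is illustrative only: it copies the published template (5′-ACCCUGAACCC-3′) into DNA by base pairing and, as an assumption taken from the observation above, allows the template G to specify either C or T. The function and variable names are hypothetical.

```python
# Template region of P. falciparum telomerase RNA, 5'->3', as given in the text.
TEMPLATE_RNA = "ACCCUGAACCC"

# Watson-Crick partners (RNA template base -> DNA base incorporated).
PAIRING = {"A": "T", "C": "G", "U": "A", "G": "C"}


def possible_products(template_rna):
    """List the DNA sequences (5'->3') that copying the template could yield.

    Assumption: a template G may direct incorporation of C or T, which is the
    interpretation suggested by the two major telomeric repeat variants.
    """
    products = [""]
    # The template is read 3'->5' while the DNA strand grows 5'->3'.
    for base in reversed(template_rna):
        options = "CT" if base == "G" else PAIRING[base]
        products = [p + o for p in products for o in options]
    return products


if __name__ == "__main__":
    for dna in possible_products(TEMPLATE_RNA):
        print(dna)
```

Under these assumptions the template yields GGGTTCAGGGT and GGGTTTAGGGT. Both begin with GGGTT, the first bases observed at every healed end, and each contains one of the two major repeats (TTCAGGG or TTTAGGG).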
Choice of repair pathway: HR versus telomere healing. In most extensively studied organisms, DSB repair in most regions of the genome results from competition between the two primary repair pathways, HR and NHEJ. Within subtelomeric regions of the genome, telomere healing can serve as a third potential repair pathway. The pathway that is ultimately chosen depends on multiple factors and is often species specific (20, 21). In yeast, the likelihood of telomere healing occurring at a specific break appears to depend on how efficiently the telomerase complex is recruited to the site of the break, which in turn depends on the DNA sequence immediately upstream of where the repair event occurs (20). Telomerase can then initiate the addition of telomeric repeats when 3 to 6 bp of DNA anneals to the telomerase RNA template (Fig. 4) (25). Telomerase has been shown to be recruited by either Cdc13 or the Ku70/80 complex (26, 36-38), and a proximal Cdc13 binding sequence was shown to greatly increase the likelihood of a healing event at a particular chromosomal position (30). Malaria parasites lack the above-mentioned DNA repair proteins; thus, what determines whether HR or telomere healing occurs at the site of a subtelomeric break is not known.
Previous work has shown that efficient HR within a nonsubtelomeric region of the genome requires near-complete sequence identity between the sequence surrounding the break and the template used for repair (15). If this property also applies to HR within subtelomeric regions, the use of HR might be limited only to breaks that occur within stretches of sequence that are duplicated with near-complete sequence identity elsewhere in the genome. To investigate this possibility, we identified the sequences immediately surrounding the break points of the three products of HR that we obtained from our irradiated clone and performed BLAST searches to determine if these regions are duplicated elsewhere in the genome. Indeed, in all three cases there were regions of perfect sequence identity surrounding the site of the recombination event. These stretches of 100% identity extended for 296, 127, and 27 bp within the sequences of the recombining chromosomes (Fig. 5). In contrast, BLAST searches using stretches of sequence at the nine sites of telomere healing indicated that these regions are unique within the genome, with no identifiable stretches of sequence identity at the site of telomere addition. These data are consistent with a model in which HR is the dominant pathway of DSB repair throughout the genome, including within the subtelomeric regions. However, the strict requirement for extensive sequence identity prevents this pathway from repairing breaks that occur within unique sequences, thus allowing telomere healing to occur when sequences similar to the telomerase template RNA are exposed during resection of the DNA strand.
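The length of a stretch of perfect identity around a recombination site can be measured directly once the two parental sequences and the corresponding positions within them are known. The sketch below is not the BLAST-based procedure used here; it is a simple illustration that extends outward from a pair of aligned positions until the first mismatch. The function name and the toy sequences are hypothetical.

```python
def identity_stretch(seq_a, pos_a, seq_b, pos_b):
    """Length of the uninterrupted run of identical bases around a crossover.

    `pos_a` and `pos_b` are 0-based positions in the two parental sequences
    that correspond to each other at the recombination site. The run is
    extended in both directions, without gaps, until the first mismatch or
    the end of either sequence. Returns 0 if the two positions differ.
    """
    if seq_a[pos_a] != seq_b[pos_b]:
        return 0

    # Extend toward the 5' side.
    left = 0
    while (pos_a - left - 1 >= 0 and pos_b - left - 1 >= 0
           and seq_a[pos_a - left - 1] == seq_b[pos_b - left - 1]):
        left += 1

    # Extend toward the 3' side.
    right = 0
    while (pos_a + right + 1 < len(seq_a) and pos_b + right + 1 < len(seq_b)
           and seq_a[pos_a + right + 1] == seq_b[pos_b + right + 1]):
        right += 1

    return left + 1 + right


if __name__ == "__main__":
    shared = "GATTACAGATTACA"  # toy block of identity, 14 bases
    chrom_a = "AAAA" + shared + "CCCC"
    chrom_b = "TTT" + shared + "GGG"
    print(identity_stretch(chrom_a, 9, chrom_b, 8))
```

Applied to the three HR products, a calculation of this kind would be expected to return the 296-, 127-, and 27-bp values reported above, whereas at a telomere healing site there is no second parental sequence to compare against.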

DISCUSSION
Telomere healing appears to be a conserved mechanism for stabilizing chromosomes in the event of a DSB that occurs within a subtelomeric domain. The basic machinery involved in telomere healing is likely shared between malaria parasites and model eukaryotes. In yeast, either the Exo1 or the DNA2/Sgs1 pathway mediates DNA resection from the site of the break. Orthologues to both of these enzymes appear to be encoded within the P. falciparum genome (PF3D7_0725000 and PF3D7_1010200, respectively). In addition, telomerase reverse transcriptase (PfTERT) has also been identified and displays the expected motifs, though with many insertions of stretches of basic amino acid sequences, as is often seen in Plasmodium proteins (39). However, our work revealed some unique aspects to telomere healing in P. falciparum. The strong preference for stretches of TG repeats at the site of telomere addition that is observed in yeast was not seen in P. falciparum, nor was any proximal enhancer sequence detected. The different sequence preference likely results from differences in the template region of telomerase RNA, which in yeast consists exclusively of A and C residues, while in P. falciparum this sequence also includes a UG dinucleotide. The apparent lack of any preference in the sequence immediately flanking the site of repair might indicate more significant evolutionary divergence in how telomerase is recruited. Organism-specific proteins at telomeres have been characterized in model organisms and in African trypanosomes, and they indicate there can be significant evolutionary divergence (40). Proteins known to play important roles in telomerase function and telomere stabilization in other organisms, such as TRF, Cdc13, Rad52, and POT1, could not be identified in the P. falciparum genome using standard bioinformatics approaches, and thus the recruitment, retention, and function of telomerase in P. falciparum are likely to have unique elements, such as the recently characterized protein PfTRZ. This protein was found to be a functional homologue to the transcription factor TFIIIA, yet it is associated with parasite telomeres and has a role in telomere maintenance (41).
The subtelomeric domains of the chromosomes of P. falciparum are of significant research interest due to the large multicopy gene families that reside within these regions. The unique structure of the chromatin found here has been shown to play a role in regulating clonally variant expression, thereby facilitating the process of antigenic variation and immune system avoidance (42). The positions of these large, semiredundant gene families within the subtelomeric domains of most or all of the chromosomes mean that these genomic regions share significant blocks of sequence identity. In addition, the clustering of these regions at the nuclear periphery makes them prime substrates for HR in the event of a DSB (11). This unique genomic organization, combined with the absence of canonical NHEJ, provides a simple mechanism that drives the generation of chimeric genes and thus the vast diversity of these gene families that is observed in the field. Indeed, a chimeric var gene was readily generated over the course of the experiments described here.
While repair of DSBs within subtelomeric domains by HR is the likely source of diversity within the multicopy gene families, HR appears not to be efficient when a DSB occurs within a sequence that diverges more than ~2% in identity from any possible template for repair (15). Indeed, the three examples of HR identified in this study all displayed stretches of complete sequence identity between the two chromosomal regions involved in the recombination events (Fig. 5). This finding is consistent with other studies that observed similar, albeit somewhat shorter, regions of sequence identity at sites of recombination within var genes (43-45). Given the extensive sequence diversity within var, rifin, stevor, and Pfmc-2TM genes, the chance that a randomly occurring DSB will occur precisely at a position with sufficient sequence identity to another position in the genome to serve as a template for HR is low. In the absence of efficient NHEJ, such DSBs would generally be lethal. However, telomere healing provides an alternative pathway for stabilizing DSBs that occur within subtelomeric domains, thus enabling parasites to survive DNA damage within these regions and maintain genome integrity. Chromosomes that have undergone telomere healing could later undergo HR when a subsequent DSB occurs within a sequence that shares identity with a region of a full-length subtelomeric domain. The resulting gene conversion event would reestablish typical chromosomal structure, including a full complement of clonally variant gene copies as well as TAREs (Fig. 6). When such events occur within the coding regions of the variant antigen-encoding genes, new genes are created, as demonstrated by the HR event that occurred within the subtelomeric region of chromosome 13 described here (Fig. 3B) and as has been observed in other studies (43-45). Of note, these recombination events retain reading frame and general gene structure.
Evidence for telomere loss and potential healing events in field isolates indicates that this mechanism of chromosome stabilization occurs in naturally circulating parasites (46-48). This provides a model for mitotic diversification of these important gene families using both telomere healing to stabilize chromosome ends and HR when breaks occur within areas of sequence identity, thereby creating new chimeric genes and restoring complete subtelomeric regions. Telomere healing therefore provides a complementary method to HR for preserving the structure of chromosome ends. Together, these two pathways of DSB repair function to maintain genome integrity and chromosome stability in the absence of robust NHEJ and also drive the generation of diversity within the clonally variant multicopy gene families of P. falciparum.
Parasite irradiation. Irradiation was administered using a Rad Source 2000 irradiator set at 160 kV/25 mA. To generate random DSBs, 3D7 parasites seeded at 0.5% were exposed to 100-Gy X-ray irradiation three times consecutively, being allowed to recover to normal growth between each irradiation exposure. The degree of subtelomeric damage was preliminarily assessed after each round of irradiation by assaying for var gene deletion by quantitative PCR using genomic DNA (gDNA) as the template and the var-specific PCR primer set described previously by Salanti and colleagues (50). The subclone chosen for whole-genome sequencing displayed loss of three subtelomeric var clusters in this assay.
Genomic DNA isolation. One hundred milliliters of cultured parasites at 5 to 8% parasitemia was harvested for isolation of gDNA. DNA was isolated and purified using phenol-chloroform extraction followed by ethanol precipitation, as previously described (51).
SMRTbell library preparation. Sequencing libraries were produced using the PacBio 20-kb library preparation protocol for high-molecular-weight gDNA obtained from clonal parasite lines, as previously described (28). We used the SMRTbell template prep kit 1.0 (Pacific Biosciences) following the standard 20-kb template preparation using the BluePippin size selection system protocol (Pacific Biosciences). Briefly, parasite DNA was sheared twice for 1 min at 5,300 rpm in an Eppendorf 5424 centrifuge using a g-Tube (Covaris), followed by damage repair, end repair, and ligation of SMRTbell adapters. Unligated DNA was digested with exonucleases, and the libraries were size selected using a BluePippin pulsed-field gel electrophoresis instrument (Sage Science) to isolate fragments greater than 15 kb. Library concentration was measured with the Qubit fluorometer dsDNA BR assay kit (Life Technologies, Inc.), and fragment length distributions were generated using the 2200 TapeStation (Agilent). Sequencing primer and P6 polymerase were annealed to the libraries according to the manufacturer's protocols (Pacific Biosciences), and sequencing was performed with P6-C5 chemistry and v3 SMRT cells on an RSII instrument at Weill Cornell Medicine.
FIG 6 (A) The occurrence of a DSB at a site of unique sequence within a subtelomeric region is stabilized by telomere healing. This results in a substantial deletion of the subtelomeric domain, including members of multicopy gene families and TAREs. (B) A subsequent DSB within a region that shares sequence identity with subtelomeric regions from other chromosomes can be repaired by HR, leading to reestablishment of the normal subtelomeric structure, including a full complement of multicopy genes and TAREs. Repair by HR can also result in chimeric genes, thereby contributing to the generation of diversity within the multicopy gene families.

Genome sequencing and analysis. Pacific Biosciences RSII-based single-molecule sequencing was used to prepare long-read datasets suitable for accurate assembly of the subtelomeric regions of the parasite's genome. Six RSII SMRTCells were used for the nonirradiated clone library, while the irradiated clone library was sequenced with eight SMRTCells. Filtered sequence data were assembled using HGAP 2.0 with Quiver polishing. This Celera Assembler-based de novo assembly approach produced genome sizes of 23. [...] coverage for each assembly was 130× and 100× for the nonirradiated and irradiated lines, respectively, and the average consensus concordance for both assemblies was greater than 0.9975; therefore, fewer than 2.5 bases in 1,000 would be assigned incorrectly.
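The conversion from consensus concordance to an expected error rate is simple arithmetic, sketched below for illustration. The function names are hypothetical, and the 30-kb window is used only because that is the length of each chromosome end aligned in the following step. Because 0.9975 is a lower bound on concordance, the values computed are upper bounds on the number of miscalled bases.

```python
def errors_per_kb(concordance):
    """Expected number of miscalled bases per 1,000 bases of consensus."""
    return (1.0 - concordance) * 1000.0


def expected_errors(concordance, length_bp):
    """Expected number of miscalled bases in a region of `length_bp` bases."""
    return (1.0 - concordance) * length_bp


if __name__ == "__main__":
    concordance = 0.9975  # lower bound reported for both assemblies
    print(errors_per_kb(concordance))            # bases per 1,000
    print(expected_errors(concordance, 30_000))  # bases per 30-kb chromosome end
```

At a concordance of 0.9975 this corresponds to at most 2.5 errors per 1,000 bases, or at most about 75 miscalled bases across a 30-kb chromosome end.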
In order to identify recombination break points, telomere additions, or telomere deletions, each assembly was first aligned to a 3D7 genomic sequence from PlasmoDB using Mauve (28). From the Mauve alignment, chromosomes were assigned to each polished contig in both assemblies. Thirty kilobases from the right and left ends of each chromosome from the irradiated and nonirradiated polished assemblies were then aligned to the PlasmoDB reference genome by using BLAST (52). Recombination break points and telomere additions were determined via manual inspection of the tabulated BLAST output. Each break point was also confirmed using the NCBI Blastn online tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
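Break points in this study were identified by manual inspection of the tabulated BLAST output. For illustration only, the sketch below shows how candidate break points could be flagged automatically from such a table. It assumes the standard 12-column tabular format (BLAST+ `-outfmt 6`), a hypothetical minimum alignment length of 500 bp, and hypothetical sequence names; it is not the procedure used here.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_breakpoints(tabular_text, min_len=500):
    """Flag positions along a query where the best-matching chromosome changes.

    Hits shorter than `min_len` (an assumed threshold) are ignored. Remaining
    hits are ordered by query start, and each pair of neighboring hits that
    map to different subject sequences is reported as
    (end of first hit, start of second hit, first subject, second subject).
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(COLUMNS, row))
        if int(rec["length"]) >= min_len:
            hits.append((int(rec["qstart"]), int(rec["qend"]), rec["sseqid"]))
    hits.sort()

    breaks = []
    for (_, end1, subj1), (start2, _, subj2) in zip(hits, hits[1:]):
        if subj1 != subj2:
            breaks.append((end1, start2, subj1, subj2))
    return breaks
```

A junction flagged in this way would still need to be confirmed against the underlying reads, as was done here by realignment and visual inspection.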
To validate structural variants (SVs) seen in comparisons between the PlasmoDB 3D7 reference and our assemblies, reads from the datasets used for assemblies of irradiated and nonirradiated clones were aligned to the PlasmoDB reference using BWA-MEM (version 0.7.13-r1126; arXiv:1303.3997) with default parameters. The resulting alignments were processed with SAMtools (version 2.6.32-279.el6.x86_64) (53) and visualized with IGV (version 2.3.92) for manual inspection of loci with predicted SVs. Analyses of the sequences upstream of the loci at which telomere repeats were added were performed using the MEME software suite and did not identify discernible motifs shared between any of the sequences.