CRISPR Interference of a Clonally Variant GC-Rich Noncoding RNA Family Leads to General Repression of var Genes in Plasmodium falciparum

Plasmodium falciparum is the deadliest malaria parasite species, accounting for the vast majority of disease cases and deaths. The virulence of this parasite is reliant upon the mutually exclusive expression of cytoadherence proteins encoded by the 60-member var gene family. Antigenic variation of this multigene family serves as an immune evasion mechanism, ultimately leading to chronic infection and pathogenesis. Understanding the regulation mechanism of antigenic variation is key to developing new therapeutic and control strategies. Our study uncovers a novel layer in the epigenetic regulation of transcription of this family of virulence genes by means of a multigene-targeting CRISPR interference approach.

central clusters of chromosomes 4, 6, 7, 8, and 12 (8), and their transcription is epigenetically controlled (4). Transcription of a single var gene peaks at about 12 h post invasion (hpi) (9) and is then silenced but poised during the later stages of the 48-h intraerythrocytic cycle (10,11). All other var genes remain transcriptionally silenced throughout the cycle and are tethered in repressive heterochromatic clusters enriched in H3K9me3 and H3K36me3 at the nuclear periphery (10,12,13). In contrast, the active var gene is euchromatic, enriched in H3K4me3 and H3K9ac, and localizes to a distinct perinuclear expression site (10,12,14) (a schematic is shown in Fig. 1A). Since the first description of this expression site (14), the mechanism of activation has remained elusive. Long noncoding RNAs (lncRNA) transcribed from the var intron have been implicated in a cis activation process (15,16); however, the necessity of the intron-derived lncRNA was questioned by a recent study showing that an intron-less var gene could be activated and silenced (17).
The initial characterization of a gene family encoding 15 GC-rich noncoding RNAs (annotated RUF6) located adjacent to central var genes suggested a role in the regulation of var genes (18,19). Fluorescence in situ hybridization (FISH) revealed that these ncRNAs colocalized to the var gene expression site and that episomal overexpression of distinct GC-rich ncRNAs resulted in the deregulation of mutually exclusive var gene expression (18). However, their mechanism of action remains unknown. In this study, we used CRISPR interference (CRISPRi) to downregulate the entire GC-rich gene family and provide evidence for the necessity of the GC-rich ncRNA in mutually exclusive var gene activation. Our results demonstrate a clear link between the transcription of both gene families, along with other clonally variant gene families involved in malaria parasite virulence.

RESULTS
GC-rich ncRNA shows predominant transcription of a single member when adjacent to an active var gene. Recently, RNA FISH was used to demonstrate the physical association of GC-rich ncRNAs with the expression site of central and subtelomeric var genes (18). Given the restricted genomic location of the GC-rich genes adjacent to some, but not all, central var genes (Fig. 1A), we investigated whether the transcription of GC-rich and var multigene families is coordinated. To this end, we generated an array of P. falciparum 3D7 wild-type (WT) clones and performed paired-end RNA sequencing (RNA-seq) analysis at 12 hpi to determine the transcriptional profile of the highly homologous GC-rich ncRNA and var gene families. In three clones, a single member of the 15 GC-rich genes was predominantly transcribed (Fig. 1B; see also Fig. S1A in the supplemental material). In the same clones, we observed mutually exclusive transcription of the central var gene adjacent to the upstream region of the active GC-rich gene (Fig. 1B and Fig. S1A). Notably, when the active central var gene lacked an adjacent GC-rich gene, ncRNA transcripts from several loci were detected, but at levels much lower than those in the former situation (Fig. 1C). Since subtelomeric var genes are prone to switch faster in culture, we were able to isolate only a clone with a dominant subtelomeric var gene that also expressed a second central var gene. In this clone, we observed low-level transcription from several GC-rich ncRNA genes, in addition to the dominant GC-rich ncRNA adjacent to the transcribed central var (Fig. S1B). Additionally, we performed receptor panning with chondroitin sulfate A (CSA) to enrich for parasites expressing a single subtelomeric member called var2csa. In this panned parasite line, we observed predominant transcription of var2csa and low levels of transcripts from several GC-rich genes (Fig. S1B).
Taken together, these data suggest that the GC-rich gene family is expressed in a clonally variant manner related to the chromosomal location of the active var gene. GC-rich genes are predominantly transcribed from a single locus when adjacent to and upstream of an active central var gene.
CRISPRi of the entire GC-rich gene family leads to transcriptional downregulation of the var gene family. To determine the role of GC-rich ncRNA in var gene expression regulation, we aimed to downregulate the transcription of all GC-rich ncRNA genes. An attempted simultaneous knockout of all 15 highly homologous members was unsuccessful, likely due to the widespread distribution of the GC-rich genes or the diversity of their up- and downstream regions required for such an approach. Thus, we developed a CRISPR interference (CRISPRi) (20) strategy for multigene families to block transcription via binding of dead Cas9 (dCas9), a mutated Cas9 protein lacking endonuclease activity. We designed a single guide RNA (sgRNA) targeting a homologous region common to all GC-rich gene members (Fig. 2A). We transfected the 3D7 G7 parasite strain (a WT clone expressing the central var gene PF3D7_0412700) with the pUF-dCas9-GFP-3HA plasmid and either pL8-gRNA-GC-tc or a control plasmid with a scrambled guide RNA (pL8-gRNA-control). Chromatin immunoprecipitation (ChIP) of dCas9 followed by quantitative PCR (qPCR) with universal GC-rich gene primers showed a strong enrichment of dCas9 at GC-rich ncRNA loci in two independent CRISPRi clones but not in two scrambled control clones (Fig. S2A). ChIP followed by massively parallel DNA sequencing (ChIP-seq) analysis of dCas9 showed enrichment at all 15 GC-rich gene loci in both CRISPRi clones but not in the scrambled control (at 12 hpi in Fig. 2B and 24 hpi in Fig. S2B).
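The idea of one sgRNA that targets a region common to all family members can be illustrated with a short Python sketch. It is not the design procedure used in this study: the function name `shared_guides`, the toy sequences, the 10-nt protospacer length, and the restriction to the forward strand with an NGG PAM are all assumptions made for illustration.

```python
def shared_guides(seqs, k=20):
    """Return protospacers of length k, each followed by an NGG PAM,
    that occur on the forward strand of every input sequence."""

    def candidates(seq):
        seq = seq.upper()
        found = set()
        # Need k bases for the protospacer plus 3 bases for the PAM.
        for i in range(len(seq) - k - 2):
            pam = seq[i + k:i + k + 3]
            if pam[1:] == "GG":
                found.add(seq[i:i + k])
        return found

    sets = [candidates(s) for s in seqs]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical toy sequences (NOT real GC-rich gene sequences).
toy_family = [
    "AATGATCCGTAGCTGGCATA",
    "CCTAGATCCGTAGCTGGTT",
    "GGATCCGTAGCTGGACAC",
]
print(shared_guides(toy_family, k=10))
```

A real design would also scan the reverse strand and screen each candidate against the rest of the genome for off-target sites.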
To determine the transcriptional effect of GC-rich gene CRISPRi, we performed reverse transcription-quantitative PCR (RT-qPCR) with universal GC-rich ncRNA primers for two clones each of the CRISPRi and scrambled control lines. The housekeeping gene fructose-bisphosphate aldolase (PF3D7_1444800) was used for normalization. CRISPRi clones showed significantly reduced levels of GC-rich ncRNA compared to the control line clones at 12 and 24 hpi (see Fig. 2C and 4A, respectively).
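Normalization to a housekeeping gene can be made concrete with the widely used 2^(-ΔΔCt) calculation. The text does not state which quantification formula was applied, so the method, the function name, and the Ct values below are assumptions; the sketch also assumes 100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target transcript in a sample relative to a control,
    each normalized to a reference (housekeeping) gene, by 2^(-ddCt)."""
    delta_sample = ct_target - ct_reference              # dCt in the sample
    delta_control = ct_target_ctrl - ct_reference_ctrl   # dCt in the control
    return 2 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target amplifies 3 cycles later in the
# knockdown sample, while the reference gene is unchanged.
fold = relative_expression(
    ct_target=26.0, ct_reference=18.0,
    ct_target_ctrl=23.0, ct_reference_ctrl=18.0,
)
print(fold)
```

With these toy numbers the target is at one-eighth of its control level, since a 3-cycle delay corresponds to a 2^3-fold reduction.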
The global transcriptional effects of GC-rich gene CRISPRi were analyzed by RNA-seq. Strikingly, two independent clones of the CRISPRi lines exhibited a global downregulation of var genes, suggesting a role for the GC-rich ncRNA in the activation of var gene transcription (Fig. 3A). Conversely, scrambled control clones showed transcription of a single dominant var gene (Fig. 3A), similar to wild-type clones (Fig. 1B), suggesting that the expression of dCas9 alone does not affect var gene transcription. Additionally, a rescue experiment was conducted by removing the drug pressure required to maintain the plasmids and using negative drug selection to ensure plasmid removal from the CRISPRi lines (Fig. S3). Rescue control clones recovered mutually exclusive var transcription (Fig. 3A). Differential gene expression analysis of CRISPRi and scrambled control clones with a false discovery rate (FDR) cutoff of 0.01 returned 125 genes, 115 (92%) of which were downregulated in the CRISPRi lines in three independent replicates (Fig. 3B and C, Fig. S4, and Table S1). Among these downregulated genes were 13 GC-rich genes, 23 var genes (including the active var gene in the scrambled control clones, PF3D7_0712000), and several rif genes (Table S1). Differentially expressed genes other than GC-rich genes were checked to confirm the absence of off-target dCas9 binding (Table S2). Transfection of CRISPRi plasmids into CSA-panned parasites also led to downregulation of the GC-rich ncRNA and repression of mutually exclusive var expression (Fig. S5). Altogether, our results suggest that GC-rich ncRNA transcription is essential for the mutually exclusive expression of var genes.
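To show what an FDR cutoff of 0.01 means in practice, the following Python sketch applies a Benjamini-Hochberg adjustment to a list of p values. It assumes the Benjamini-Hochberg procedure, which is commonly used with edgeR but is not named in the text; the p values are hypothetical, and the code is not the edgeR implementation.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p values.
toy_p = [0.0001, 0.002, 0.03, 0.5]
toy_q = benjamini_hochberg(toy_p)
significant = [q < 0.01 for q in toy_q]
print(significant)

# Arithmetic check of the proportion reported in the text.
print(round(115 / 125 * 100))
```

In the toy example only the first two genes pass the 0.01 cutoff, and the final line confirms that 115 of 125 genes corresponds to 92%.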
Given that the GC-rich gene family is located within several central chromosome regions that are silenced by facultative heterochromatin enriched in P. falciparum heterochromatin protein 1 (PfHP1), we hypothesized that HP1 occupancy would determine the variegated transcription profile for the GC-rich genes. Indeed, ChIP-seq using anti-HP1 antibodies revealed that, as with the var genes, all GC-rich genes except the single active GC-rich gene, which is adjacent to the active var gene, were enriched for HP1 (Fig. S6). Notably, in the CRISPRi clones, all GC-rich genes were enriched in HP1 (Fig. S6).
GC-rich ncRNA CRISPRi affects other multigene families encoding variant surface antigens. GC-rich ncRNA is highly transcribed at the same later time point (~24 hpi) as the rif and stevor genes (21,22). Interestingly, several virulence gene families with a transcriptional peak in the blood-stage cycle later than that for var genes showed significant transcriptional downregulation upon GC-rich gene CRISPRi, even at 12 hpi (Fig. 3B and Table S1). To investigate a potential role for GC-rich ncRNA in the transcriptional control of these gene families, we performed RNA-seq and differential expression analysis in the CRISPRi lines at 24 hpi. The total number of genes significantly differentially expressed (FDR cutoff, 0.01) between the CRISPRi clones and the scrambled control clones at 24 hpi was 77, of which the majority (77%) were downregulated (Fig. 4B, Fig. S7, and Table S1) in three independent replicates. Besides GC-rich genes, most significantly downregulated genes belonged to multigene families encoding variant surface antigens with 2 transmembrane (2TM) domains (23): rif, Pfmc-2TM, and stevor (Fig. 4B). These three multigene families exhibited transcriptional downregulation of most members in the CRISPRi lines compared to their expression in the control lines (Fig. 4C). In the case of the Pfmc-2TM gene family, the global transcription level was significantly lower in the CRISPRi lines than in the control lines, whereas the total levels of the stevor and rif gene families were not significantly lower. Altogether, these data strongly suggest that the GC-rich ncRNA is an important trans-activating element shared by at least the Pfmc-2TM and var gene families.

DISCUSSION
The perinuclear compartment that is key to the mutually exclusive expression of a single var gene in P. falciparum remains poorly understood. In a previous study, we identified the GC-rich gene family to be the first trans-acting ncRNA localizing to this expression site (18). Here, we show that this ncRNA is essential for the transcriptional activation of a single var gene, and we provide evidence that this function of the GC-rich element is shared with other clonally variant gene families.
By performing RNA-seq analysis on freshly cloned parasite lines that each transcribed a single var gene, we showed that GC-rich ncRNAs are transcribed in a clonally variant manner (Fig. 1B). We observed two profiles of GC-rich gene transcription, depending on the relative chromosomal location of the GC-rich genes and active var gene. In cases in which there was one GC-rich gene predominantly transcribed (transcribed at a level 5- to 10-fold higher than that for other members), it was always found adjacent to the 5′ region of an active central var gene. The ncRNA transcription profile for a clone with an active central var gene or a subtelomeric var lacking an adjacent member of the GC-rich gene family at its 5′ upstream region showed multiple ncRNA transcripts, but at levels much lower than those in the former case. It is tempting to speculate that high levels of GC-rich ncRNA transcription adjacent to a central var gene may stabilize the expression site of a central var gene over that of subtelomeric var genes. A previous study reported variable switch rates, depending on the chromosomal location of var genes, with central var genes being more stably expressed and less prone to switching than the subtelomeric ones (24). Our data suggest that varying levels of ncRNA at the var gene expression site modulate the switch rate of individual var genes. Furthermore, transcription of GC-rich genes may open the local chromatin structure and enhance the accessibility of the transcription machinery to the adjacent var gene. This hypothesis finds support from the findings of a recent study showing the increased chromatin accessibility of GC-rich genes when they are adjacent to the active var gene and/or rif gene (25).
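The notion of a predominantly transcribed family member can be expressed as a simple fold-change test. The Python sketch below is only an illustration: comparing the top member with the runner-up, the threshold of 5-fold (the lower bound of the range quoted above), the function name, and the expression values are all assumptions rather than the criterion used in this study.

```python
def dominant_member(expression, min_fold=5.0):
    """Return (gene_id, fold) if the most highly expressed member exceeds the
    runner-up by at least min_fold; otherwise return (None, fold).
    `expression` maps gene IDs to expression values and needs >= 2 entries."""
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    (top_id, top_val), (_, second_val) = ranked[0], ranked[1]
    fold = top_val / second_val if second_val > 0 else float("inf")
    return (top_id, fold) if fold >= min_fold else (None, fold)


# Hypothetical expression values for three family members.
single_dominant = {"member_1": 800.0, "member_2": 100.0, "member_3": 50.0}
no_dominant = {"member_1": 120.0, "member_2": 100.0, "member_3": 90.0}

print(dominant_member(single_dominant))
print(dominant_member(no_dominant))
```

The first toy profile mimics a clone with one dominant GC-rich gene; the second mimics the low-level, multi-locus profile described for clones whose active var gene lacks an adjacent GC-rich gene.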
Until recently, it was not possible to inactivate an entire multigene family dispersed over many chromosomes in P. falciparum. We adapted the CRISPRi technique for the simultaneous knockdown of the entire GC-rich multigene family by targeting a conserved region that includes part of the polymerase III (Pol III) B box, present in all 15 members. All members of this GC-rich family have unique DNA motifs (internal A and B boxes) (18,19) found only at polymerase III-transcribed tRNA genes and short interspersed nuclear elements (SINEs) in other organisms (26), suggesting that transcription of this multigene family is mediated by Pol III. Upon downregulation of GC-rich ncRNA transcription, var gene expression was abolished, revealing an unprecedented regulatory interaction between Pol III- and Pol II-transcribed clonally variant genes.
Interestingly, the GC-rich gene family is conserved throughout all Laverania subgenera of Plasmodium, along with var genes and other clonally variant gene families involved in immune evasion, such as rif and stevor (27–29). Since GC-rich genes are transcribed at their highest levels at approximately 24 hpi, the question arises whether GC-rich ncRNA might also play a direct or indirect role in regulating clonally variant virulence gene families expressed at later stages of the asexual blood cycle, such as rif (21,30). A previous study suggested that an activation factor may be common to multiple clonally variant families (31). Our work suggests that GC-rich ncRNA could indeed be such a factor regulating different clonally variant gene families.
Although the precise molecular mechanism of ncRNA action remains to be investigated using techniques such as chromatin isolation by RNA purification (ChIRP), we postulate that the ncRNA associates with var gene control regions and acts as an activator. Indeed, a recent study showed that Pol III-transcribed SINEs act as enhancers of Pol II gene activation in response to the depolarization of neurons (32). However, the lack of sequence homology between GC-rich genes and var loci suggests the need for additional protein factors for such a physical interaction. Alternatively, GC-rich ncRNA could interact with nascent var mRNA, stabilizing it for transcription. It is also possible that GC-rich ncRNA could participate in ncRNA-mediated HP1 eviction from heterochromatic var genes, as previously described in the fission yeast Schizosaccharomyces pombe (33). Whichever hypothesis is correct, an essential next step would be to use this ncRNA as a molecular tool to pull down interacting partners from the var expression site, elucidating the molecular mechanism of var gene activation.
In conclusion, we developed a novel CRISPRi system that allows for the simultaneous downregulation of the entire GC-rich multigene family. In doing so, we establish the GC-rich ncRNA as an epigenetic regulatory element that plays a role in the activation of var gene transcription and the transcription of several other clonally variant gene families. We also provide a first glimpse into the molecular process that controls the switch rates of var genes, which is currently a black box in the field of var gene transcription. The identification of a trans-activating factor of the expression site
analysis was performed with the R package edgeR (41) with an FDR threshold of 0.01. Gene counts were normalized for gene length and sequencing depth as reads per kilobase per million mapped reads (RPKM) using the R package limma (42). Data were visualized using the Integrative Genomics Viewer (43).
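The arithmetic behind RPKM can be written out in a few lines of Python. This is a sketch of the standard definition only, not the limma code path used in the analysis; the function name and the numbers are hypothetical.

```python
def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000-bp gene
# in a library of 10 million mapped reads.
value = rpkm(count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(value)
```

Dividing by gene length in kilobases corrects for longer genes collecting more reads, and dividing by library size in millions makes samples of different sequencing depth comparable.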

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.

ACKNOWLEDGMENTS
We thank Gretchen M. Diffendall for proofreading the manuscript. We declare no competing interests.