Challenges in Quantifying Cytosine Methylation in the HIV Provirus

DNA methylation is an epigenetic mechanism most commonly associated with transcriptional repression. While it is clear that DNA methylation can silence HIV proviral expression in in vitro latency models, its correlation with HIV persistence and expression in vivo is ambiguous, particularly in persons living with HIV (PLWH) receiving antiretroviral therapy (ART).

HIV latency is likely regulated through epigenetic mechanisms (1), yet few studies have evaluated epigenetic marks within and surrounding the HIV provirus (1–8). DNA methylation is one epigenetic mark that silences genes when located in promoters (9). While it is one of the best-understood epigenetic modifications (9), its role in HIV latency remains elusive. Several studies have attempted to characterize the role of DNA methylation in HIV latency (2–6), but their conclusions are inconsistent, and attempts to extend the findings to clinical samples from people living with HIV (PLWH) have yielded little.
Consequently, some researchers might be tempted to move away from evaluation of HIV DNA methylation, believing it plays no role in transcriptional control of HIV. While this conclusion is potentially correct, we argue that existing studies are incomplete and have failed to examine HIV proviral DNA methylation in the appropriate context, and with the appropriate tools. Many of these limitations result from a lack of available methods for specific assessment of the replication-competent HIV reservoir in clinical samples, as well as technical challenges of studying DNA methylation in the context of HIV. Here, we review and discuss the work that has been done on HIV DNA methylation and the challenges that should be overcome.
HIV studies of DNA methylation in the literature. Early reports suggest that DNA CpG methylation silences HIV proviral expression (8). More recent studies of CpG methylation were performed using in vitro HIV latency models in both cell lines and primary CD4+ T cells (2). While CpG methylation can silence proviral expression in latency models (2, 3), its role in vivo remains elusive. Studies of CpG methylation in the HIV promoter from clinical samples demonstrated that this modification is inversely correlated with viremia and proviral reactivation, suggesting that CpG methylation of the HIV promoter is a regulator of latency (3). Further evaluation has resulted in discrepancies, where long-term antiretroviral therapy (ART) is correlated with low cumulative CpG methylation in some studies (4, 6), but not in others (5). Interestingly, long-term nonprogressors and elite controllers exhibit large amounts of methylation in both the long terminal repeat (LTR) and the env/tat/rev CpG island (7). Together these data suggest that latency and ART-induced suppression might have different epigenetic signatures.
While there is enough evidence to suggest that CpG methylation can affect proviral expression and latency, it is difficult to correlate its presence in clinical samples with specific outcomes, such as latency. One major stumbling block is the inability to identify latently infected cells in clinical samples collected during ART, since suppression of viral replication with ART is distinct from latency.
Evaluation of CpG islands. Conventionally, cytosine methylation in mammals occurs in palindromic CpG dinucleotides. Two-thirds of promoters in the human genome contain clusters of CpG residues, termed CpG islands (CGIs) (9). HIV contains two highly conserved CGIs in the proximal provirus: one within the LTR, and one immediately distal to the LTR (10). Most HIV DNA methylation studies have focused upon these two CGIs, with little regard for the rest of the provirus. There is also a highly conserved CGI in the env/tat/rev region (ETR), though the methylation status of this region has been reported in only one study from clinical samples (6). The remainder of the HIV genome is CpG depleted (11), mirroring the human genome. This feature has been proposed to result from deamination of cytosines over time following proviral integration (12).
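As a rough illustration of how a CGI can be recognized computationally, the sketch below computes GC content and the observed/expected CpG ratio for a sequence window. The thresholds used (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6) are the commonly cited Gardiner-Garden and Frommer criteria; they are an assumption here and are not taken from the studies discussed in this article. The function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # CpG count expected if C and G occurred independently of each other.
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, obs_exp


def looks_like_cgi(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, GC-content, and obs/exp thresholds to one window."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp
```

Applied in a sliding window across a proviral reference sequence, a check of this kind would be expected to flag the CpG-dense regions and pass over the CpG-depleted remainder, although the exact boundaries depend on the window size and thresholds chosen.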
Early research on DNA methylation of eukaryotic genes focused on CGIs, based on methylation data for these regions in cancer cells (9). While methylation of CGIs results in transcriptional repression, the majority of promoter CGIs in somatic cells are unmethylated and regulated by other mechanisms, such as histone modifications (9). In fact, Ten-eleven Translocation (TET) enzymes, which are responsible for initiating active demethylation, are congregated around CGIs, suggesting their purpose is to keep these regions resistant to methylation. Further, many histone-modifying enzymes and transcription factors bind specifically to unmethylated CpG clusters (9).
Recent data suggested that CpG methylation plays a more prominent role in promoters without CGIs (9). For CGI-containing promoters, differential cytosine methylation exists more commonly in the regions flanking CGIs, termed CpG shores and CpG shelves (13). In the context of HIV, these regions would encompass gag and part of pol, as well as 5′ integration sites.
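To make the geometry of shores and shelves concrete, the following sketch derives flanking intervals from the coordinates of a CGI. It assumes a widely used annotation convention in which shores span the 2 kb on either side of a CGI and shelves span the next 2 kb beyond each shore; these distances are an assumption and are not specified in this article. The function name and the example coordinates are hypothetical.

```python
def flanking_regions(cgi_start, cgi_end, shore=2000, shelf=2000):
    """Return shore and shelf intervals (0-based, half-open) around a CGI.

    Coordinates are relative to the start of the provirus. Negative values
    fall upstream of the provirus, i.e., in host DNA at the integration site.
    """
    return {
        "5' shelf": (cgi_start - shore - shelf, cgi_start - shore),
        "5' shore": (cgi_start - shore, cgi_start),
        "3' shore": (cgi_end, cgi_end + shore),
        "3' shelf": (cgi_end + shore, cgi_end + shore + shelf),
    }


# Hypothetical CGI occupying proviral positions 500 to 900.
regions = flanking_regions(500, 900)
for name, (start, end) in regions.items():
    print(f"{name}: {start} to {end}")
```

With these hypothetical coordinates, the 5′ shore and shelf extend into negative positions (host sequence upstream of the provirus), while the 3′ shore and shelf extend several kilobases into the proviral coding region.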
Only one study has examined areas outside the LTR in clinical samples using PCR-based assays (6). The areas examined in blood from 23 persons with HIV encompassed the most CpG-dense regions of the provirus, including an 898-bp region containing the LTR and a 5′ portion of gag, and a 1,124-bp region in the 3′ part of the virus containing parts of nef, tat, rev, and env. These regions account for only approximately 21% of the HIV genome, but contain over half (55 of 94, 58.5%) of the CpG dinucleotides, based on the HXB2 reference genome.
Underscoring the difficulty of using PCR-based methods for bisulfite sequencing of HIV in clinical samples, only 33 out of 88 DNA samples (37.5%) were successfully amplified in this study. Additionally, the authors found extremely few methylation events in the evaluated CpGs. Further, they examined the samples for unintegrated DNA and concluded that the vast majority of the HIV DNA in their samples was integrated, suggesting that DNA methylation is unlikely to be important for HIV regulation following integration.
Unfortunately, even though this study provided the most comprehensive assessment of HIV methylation to date, the focus was limited to the areas immediately surrounding the CGI. Therefore, the most commonly differentially methylated regions typically present in CGI-containing genes (i.e., shores and shelves) still have not been assessed, either from clinical samples or from in vitro models.
Non-CpG methylation. New data recently demonstrated the presence of cytosine methylation outside CpG residues in mammalian DNA (14,15). While these non-CpG methylation events primarily exist in embryonic stem cells, they were also described in somatic tissues, particularly in brain (15). Non-CpG methylation is mediated by the de novo DNA methyltransferases, i.e., DNA methyltransferase 3a (DNMT3a) and DNMT3b, which are less specific for targeting CpG dinucleotides (16). Most somatic tissues express abundant DNA methyltransferase 1 (DNMT1), which targets only hemimethylated DNA. As a result, dividing cells exhibit either little or no non-CpG methylation, as DNMT1 is more CpG specific and is the predominant DNMT in dividing cells. This is likely the reason why non-CpG methylation is localized in nondividing cells, such as neurons (16). Additionally, embryonic stem cells exhibit substantially higher expression of DNMT3a and DNMT3b, making them more prone to non-CpG methylation (16).
Mined data from RNA-Seq expression studies (17) of HIV-infected SUP-T1 cells demonstrated that DNMT expression was dramatically altered in HIV infection (Fig. 1), and early studies reported increased de novo methylation in HIV-infected cells (18,19). This could contribute to methylation outside CpG residues in HIV-infected cells experiencing de novo methylation of the newly integrated provirus. Additionally, non-CpG methylation was reported in the context of other exogenous retrovirus infections, such as murine leukemia virus (20,21).
So far, all studies of HIV DNA methylation have focused only on CpG methylation. In fact, some report dismissing clones where unconverted non-CpG cytosines are present (7). Part of the challenge of examining non-CpG methylation in clinical samples is that nested PCR-based methods will exclude most non-CpG methylation (16). However, whole-genome bisulfite sequencing of DNA from a clinical peripheral blood sample demonstrates that not only are there dense regions of cytosine methylation in the HIV provirus (Fig. 2A), but the majority of methylated cytosines in a densely methylated region of the proximal provirus (42 out of 61, or 68%) were methylated at non-CpG residues (Fig. 2B).
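Separating CpG from non-CpG methylation requires classifying each methylated cytosine by its sequence context. The sketch below uses the CpG/CHG/CHH convention common in bisulfite sequencing analysis, where H is A, C, or T. It is a minimal illustration and not the pipeline used to generate Fig. 2; the function names are hypothetical, and the input is assumed to contain only A, C, G, and T on the strand being examined.

```python
from collections import Counter


def cytosine_context(seq, pos):
    """Classify the cytosine at 0-based `pos` as 'CpG', 'CHG', or 'CHH'.

    Returns 'unknown' when too few downstream bases exist to decide.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position is not a cytosine")
    downstream = seq[pos + 1 : pos + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CpG"
    if len(downstream) == 2:
        return "CHG" if downstream[1] == "G" else "CHH"
    return "unknown"


def tally_contexts(seq, methylated_positions):
    """Count methylated cytosines per context for a list of 0-based positions."""
    return dict(Counter(cytosine_context(seq, p) for p in methylated_positions))


# Toy sequence with one methylated cytosine in each context.
print(tally_contexts("ACGTCAGCTTC", [1, 4, 7]))
```

A tally of this kind, run over the unconverted cytosines in each read, is what allows the non-CpG share of methylation in a region to be reported rather than discarded.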
Pitfalls of PCR amplification of HIV from bisulfite-converted DNA. During active replication, HIV mutates rapidly within a person (22). Therefore, designing specific PCR primers that amplify all HIV targets from clinical samples is challenging. Additionally, designing primers for bisulfite-converted DNA requires longer oligonucleotides to achieve appropriate annealing temperature and specificity, exacerbating the issue of biased amplification from highly variable sequences (23). Some variants will likely be missed, reducing the chance that all methylation events are recorded. This becomes particularly difficult in the context of highly methylated non-CpG cytosines (Fig. 2), since all cytosines in the primers need to be degenerate to avoid bias against methylation. Therefore, primers designed without degenerate bases at all cytosine positions will likely miss the majority of the highly methylated regions, resulting in artifactually sparse methylation (Fig. 3).
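The cost of making every cytosine position degenerate can be seen in a short sketch. Bisulfite treatment converts unmethylated C to U (read as T after PCR) and leaves methylated C intact, so a methylation-neutral forward primer needs the IUPAC code Y (C or T) at each cytosine of the target strand, and the matching reverse primer needs R (G or A) opposite each cytosine. The functions below are hypothetical helpers written for illustration, the target sequence is invented, and only one converted strand is considered.

```python
# Complement table covering the four bases plus the IUPAC codes Y (C/T) and R (G/A).
IUPAC_COMPLEMENT = str.maketrans("ACGTYR", "TGCARY")


def forward_bisulfite_primer(target):
    """Forward primer for the converted strand, with Y at every cytosine."""
    return target.upper().replace("C", "Y")


def reverse_bisulfite_primer(target):
    """Reverse primer (reverse complement), with R opposite every cytosine."""
    return forward_bisulfite_primer(target).translate(IUPAC_COMPLEMENT)[::-1]


def degeneracy(primer):
    """Number of distinct oligonucleotides represented by a Y/R-degenerate primer."""
    return 2 ** sum(primer.count(code) for code in "YR")


target = "ACGTCCAGTCTA"  # invented 12-nt binding site with four cytosines
fwd = forward_bisulfite_primer(target)
print(fwd, degeneracy(fwd))
```

Because degeneracy doubles with each cytosine, a longer primer spanning a cytosine-rich site quickly becomes a pool of thousands of distinct oligonucleotides, which illustrates why fully degenerate primers are difficult to use in practice.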
Multiple rounds of PCR can introduce stochastic bias in the presence of multiple variants (24); thus, it is difficult to measure methylation accurately in such cases. Indeed, PCR amplification of the HIV LTR from clinical samples using the same conditions can yield variable results across experiments, even with next-generation sequencing (NGS) technologies (Fig. 4A). Further, experiments to PCR amplify clones with mixed cytosine quantities showed that with both Sanger sequencing and NGS, percent methylation is  (Fig. 4B). This highlights the need for a different method for methylation quantification of the HIV provirus.
Bisulfite conversion and amplification of autosomal genes begin with much larger amounts of template, thus reducing the rounds of amplification required for successful cloning and sequencing. Because HIV is present in fewer than 100 copies per 100,000 CD4+ T cells (25), studying HIV DNA methylation necessitates more rounds of amplification. Additionally, autosomal genes are far less variable than the integrated HIV provirus and occupy a fixed genomic location. These technological hurdles call into question the usefulness of methylation quantification of the HIV provirus with this classical method.
With one exception (6), published HIV methylation studies using clinical samples have examined <20 clones for most individuals (3–5). However, PLWH on long-term ART averaged 15 different HIV DNA sequences per individual (26). Further, the vast majority of the proviral populations in these individuals (i.e., 98%) were composed of defective sequences (26). As a consequence, this low level of sampling will not suffice to interrogate all proviral copies present in clinical samples, and because only a small subset of HIV DNA is replication competent, the majority of the data from clinical samples are from defective proviruses. With only about 2% of proviruses intact, an average of 50 clones per individual would need to be sequenced to recover a single sequence from an intact provirus.
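The sampling argument can be checked with a short calculation. The sketch below assumes that each clone is an independent draw from a proviral population in which 2% of sequences are intact (the complement of the 98% defective fraction cited above); this ignores clonal expansion and amplification bias, so it should be read as an idealized bound rather than a prediction. The function names are hypothetical.

```python
import math


def p_at_least_one_intact(n_clones, intact_fraction=0.02):
    """Probability that n independent clones include at least one intact provirus."""
    return 1.0 - (1.0 - intact_fraction) ** n_clones


def clones_needed(confidence, intact_fraction=0.02):
    """Smallest clone count giving the requested chance of at least one intact provirus."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - intact_fraction))


print(1 / 0.02)                   # expected clones per intact sequence
print(p_at_least_one_intact(20))  # typical depth in published studies
print(p_at_least_one_intact(50))
print(clones_needed(0.95))
```

Under these assumptions, 50 clones is the expected number needed per intact sequence, yet sampling 50 clones yields at least one intact provirus only about 64% of the time, and 20 clones only about 33% of the time. Roughly 149 clones would be needed for a 95% chance.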
It is therefore not surprising that functional correlations with plasma HIV RNA were not evident in most cases, and cell-associated HIV RNA might be a better correlative measure. Alternatively, NGS offers much higher sampling depth than cloning and Sanger sequencing, yet studies using NGS for HIV methylation analysis have not made their way into the literature.

CONCLUSIONS
The field of epigenetics is still new and requires more work before we can fully understand the effect of DNA methylation even upon autosomal genes. Because the integrated HIV provirus is subject to its immediate chromatin environment, determining the influence of cytosine methylation on the provirus in vivo is not a simple endeavor, and intraexperimental variation needs to be normalized. Emerging technologies and NGS should allow us to make headway in this area. Regardless, non-CpG methylation of the provirus has been ignored and should not be discounted during analysis of these data. Studies that rely on PCR amplification of the HIV provirus from bisulfite-converted samples should also be reproducible across multiple experiments and with multiple primer sets, and these replicates should be included with the primary data in future publications.