ABSTRACT
To investigate the transmission of novel infectious agents by blood transfusion, we studied changes in the virome composition of blood transfusion recipients pre- and posttransfusion. Using this approach, we detected and genetically characterized a novel human virus, human hepegivirus 1 (HHpgV-1), that shares features with hepatitis C virus (HCV) and human pegivirus (HPgV; formerly called GB virus C or hepatitis G virus). HCV and HPgV belong to the genera Hepacivirus and Pegivirus of the family Flaviviridae. HHpgV-1 was found in serum samples from two blood transfusion recipients and two hemophilia patients who had received plasma-derived clotting factor concentrates. In the former, the virus was detected only in the posttransfusion samples, indicating blood-borne transmission. Both hemophiliacs were persistently viremic over periods of at least 201 and 1,981 days. The 5′ untranslated region (UTR) of HHpgV-1 contained a type IV internal ribosome entry site (IRES), structurally similar to although highly divergent in sequence from that of HCV and other hepaciviruses. However, phylogenetic analysis of nonstructural genes (NS3 and NS5B) showed that HHpgV-1 forms a branch within the pegivirus clade distinct from HPgV and homologs infecting other mammalian species. In common with some pegivirus variants infecting rodents and bats, the HHpgV-1 genome encodes a short, highly basic protein upstream of E1, potentially possessing a core-like function in packaging RNA during assembly. Identification of this new human virus, HHpgV-1, expands our knowledge of the range of genome configurations of these viruses and may lead to a reevaluation of the original criteria by which the genera Hepacivirus and Pegivirus are defined.
IMPORTANCE More than 30 million blood components are transfused annually in the United States alone. Surveillance for infectious agents in the blood supply is key to ensuring the safety of this critical resource for medicine and public health. Here, we report the identification of a new and highly diverse HCV/GB virus (GBV)-like virus from human serum samples. This new virus, human hepegivirus 1 (HHpgV-1), was found in serum samples from blood transfusion recipients, indicating its potential for transmission via transfusion products. We also found persistent long-term HHpgV-1 viremia in two hemophilia patients. HHpgV-1 is unique because it shares genetic similarity with both highly pathogenic HCV and the apparently nonpathogenic HPgV (GBV-C). Our results add to the list of human viruses and provide data to develop reagents to study virus transmission and disease association and for interrupting virus transmission and new human infections.
INTRODUCTION
Transfusion of blood or blood-derived products can save lives and improve health but requires safety measures for preventing bystander transmission of infectious agents. Exclusion of blood products contaminated with infectious agents includes rigorous testing for pathogenic viruses (1, 2). The unusually high infection prevalence of hepatitis C virus (HCV), HBV, and HIV in hemophilia patients and other transfusion recipients could have been prevented by earlier identification of these viruses and development of accurate diagnostic assays (3–8). Advances in sequencing platforms have revolutionized identification of new viruses (9) and have been widely adopted for identification and characterization of viruses that infect humans and animals (10–15). Genetic characterization of new viruses is a first crucial step toward their molecular and biological characterization. Sequence data of a new virus not only help in estimating its biological properties but also enable development of molecular reagents that can be used for molecular diagnostics and epidemiological and clinical investigations of their transmission and disease associations (9).
HCV infects ~200 million people worldwide and represents a genetically diverse group of human RNA viruses with similar biological properties. HCV typically persists, with only 20 to 30% of cases resolving after acute infection (16, 17). HCV is a global pathogen, as a large proportion of untreated individuals progress to severe liver disease, cirrhosis, and hepatocellular carcinoma. HCV therapies based on antivirals that directly target HCV replication are expensive, and prevention of new virus infections remains the priority (18). The identification of HCV in 1989 (19) was crucial for development of diagnostic assays to arrest its spread through transfusion and other parenteral routes of exposure. Other human viruses genetically related to HCV include human pegivirus (HPgV, formerly named GB virus C or hepatitis G virus) and a growing range of related viruses currently known to infect horses, rodents, bats, and monkeys (reviewed in references 20 and 21). HPgV causes widespread human infection (1 to 4% of healthy blood donors) that can persist for years with no clinical symptoms (22).
Identification of natural infection with nonprimate hepacivirus (NPHV), first in dogs (initially termed canine hepacivirus) and subsequently in horses, was the first indication of a wider host range for hepacivirus infections (23–25). More recently, identification of a diverse range of viruses found in bats, rodents, and Old World primates further expanded the host range of hepaciviruses (11, 12, 26–28). Similarly, several novel pegiviruses have recently been identified, with a single report of such viruses in bats in 2010 (29), followed by several recent reports of novel viruses in horses, rodents, and bats (11, 12, 26, 27, 30). These new discoveries have fundamentally revised our knowledge of viral diversity and host ranges of hepaciviruses and pegiviruses (20). However, so far, the only two viruses of these genera known to infect humans are HCV and HPgV. Here, we describe the first identification of a highly divergent human virus, human hepegivirus 1 (HHpgV-1), that shares attributes of both HCV and HPgV.
RESULTS
Virome analysis of transfusion recipients. To characterize the changes in serum virome compositions of transfusion recipients, we studied pretransfusion and posttransfusion serum samples from subjects enrolled in the Transfusion-Transmitted Viruses Study (TTVS) from July 1974 through June 1980. One pre- and one posttransfusion serum sample from 44 individuals were enriched for viral nucleic acids and studied by unbiased amplification and Illumina sequencing (11). A total of 154 and 119 million reads were generated from the pretransfusion and posttransfusion samples, respectively. The descriptive statistics mean, standard deviation (SD), and median were calculated for viral abundance. The viral abundance at pretransfusion and the rate at posttransfusion were compared by paired Wilcoxon rank sum test. Comparison of posttransfusion samples with pretransfusion samples revealed a significant increase in the numbers of total virus reads (P < 0.009) and anellovirus reads (P < 0.007) after transfusion (Fig. 1). To understand the nature and composition of increased anellovirus sequences, comparative analysis was done for all virus populations (virome) in pre- and posttransfusion samples from six TTVS subjects that showed nonsignificant increases in the numbers of normalized anellovirus reads and six TTVS subjects that showed >10-fold increases in the numbers of anellovirus reads in the posttransfusion samples. Our analysis shows that the majority of the increased anellovirus reads were new genotypes or species that were genetically different from those that existed in individuals before transfusion. However, we also noticed persistence of a proportion of anellovirus sequences that were genetically similar among pre- and posttransfusion samples (Fig. 1). Together, these findings indicate that these subjects were infected by both stable and fluctuating anellovirus populations that can exist as coinfections.
Of the total virus reads, approximately 40% of the reads remained unclassified due to their low similarity to known viruses (E value range, 10⁻² to 10⁻⁶). Further analysis of these unclassified reads revealed the presence of a novel flavivirus in two posttransfusion samples.
Virome analysis of TTVS pre- and posttransfusion samples. (Top) Statistical analysis of total virus sequences (left), anellovirus sequences (middle), and virus-like unclassified sequences (right) in pre- and posttransfusion samples. (Bottom) Metagenomic binning of all virus sequences in pre- and posttransfusion samples from twelve TTVS subjects. Red arrows represent pre- and posttransfusion sample pairs that were similar in amount of total virus sequences, and purple arrows represent pre- and posttransfusion sample pairs where posttransfusion samples had significantly larger amounts of total virus sequences.
Identification of HHpgV-1. Bioinformatic analysis of sequence data of two serum samples, TTVS-772 and TTVS-790, both collected after blood transfusion, indicated the presence of a novel flavivirus, showing only distant sequence homology to HCV and HPgV. PCR assays were used to confirm the presence of these flavivirus-like sequences in a fresh aliquot of the original serum samples. After initial analysis of data, both samples, TTVS-772 and TTVS-790, were subjected to more in-depth sequencing, generating >100 million single-end Illumina reads. All sequence reads that mapped to genomes of known hepaciviruses and pegiviruses and showed protein similarity (E value < 0.0001) with different genomic regions of these viruses were connected using PCRs of intervening genomic fragments and Sanger dideoxy sequencing. Finally, the complete polyprotein coding region of human hepegivirus 1 (HHpgV-1) present in TTVS-790 was sequenced at more than 4× coverage, and the genomic termini were acquired using rapid amplification of cDNA ends (RACE) (23), yielding a 9,538-nucleotide genome of the new virus, HHpgV-1.
Polyprotein-coding region of HHpgV-1. A continuous reading frame spanned most of the HHpgV-1 genome, with a termination codon at position 9501 (Fig. 2). Predicting the start of the virus polyprotein was more problematic because several in-frame ATG (methionine) triplets were present at the 5′ end of the HHpgV-1 sequence. Of these, those at positions 195 and 204 diverged from the optimal Kozak motif (GTTC|ATG|GAGG and AGGGC|ATG|CCCA, respectively). The codon at position 330 had an optimal sequence (GCAAC|AUG|GGGU), the fourth (position 519) was suboptimal (CCGGU|AUG|UGTT), and the fifth (position 587) fell within a region of identifiable protein sequence homology to the E1 sequence of pegiviruses. Determination of the RNA structure of the 5′ untranslated region (UTR) (see below) provided separate evidence that the (third) ATG triplet at position 330 was the genuine initiating codon, and this has been assumed in the analysis of protein cleavage sites below.
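The classification of the candidate start codons above can be illustrated with a minimal sketch. It assumes the commonly used simplification of the Kozak rule, in which a context is called optimal when there is a purine at position −3 and a G at position +4 (taking the A of ATG as +1). This rule and the function name `kozak_strength` are assumptions made for illustration, not the method described in the study. The flanking bases are those quoted in the text.

```python
def kozak_strength(upstream: str, downstream: str) -> str:
    """Classify a start-codon context from its flanking bases.

    upstream   -- bases immediately 5' of the ATG (at least 3 bases)
    downstream -- bases immediately 3' of the ATG (at least 1 base)
    Simplified rule (an assumption): purine at -3 and G at +4 is optimal.
    """
    up = upstream.upper().replace("U", "T")
    down = downstream.upper().replace("U", "T")
    purine_at_minus3 = up[-3] in "AG"   # third base upstream of the A of ATG
    g_at_plus4 = down[0] == "G"         # first base after the ATG
    if purine_at_minus3 and g_at_plus4:
        return "optimal"
    if purine_at_minus3 or g_at_plus4:
        return "suboptimal"
    return "weak"


# Flanking sequences as quoted in the text, keyed by ATG position.
contexts = {
    195: ("GTTC", "GAGG"),
    204: ("AGGGC", "CCCA"),
    330: ("GCAAC", "GGGU"),
    519: ("CCGGU", "UGTT"),
}
results = {pos: kozak_strength(up, down) for pos, (up, down) in contexts.items()}
for pos, label in results.items():
    print(pos, label)
```

Under this simplified rule, only the context at position 330 satisfies both conditions, which agrees with the assignment given in the text.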
Genome organization and polyprotein cleavage sites of HHpgV-1. The positions of cleavage sites of the viral NS2 (white triangles) and NS3 protease (grey triangles) were predicted by alignment and homology to sites in the nonstructural genes previously characterized in other pegiviruses and hepaciviruses. Cleavage sites in structural gene regions (black triangles) were independently predicted using the SignalP 4.1 server. Predicted N-linked glycosylation sites in envelope proteins are depicted by vertical arrows; the single predicted O-linked glycosylation site is indicated with a blue arrow. The amino acid sequence of the predicted Y protein is shown above the genome diagram. Charged residues and cysteines are in color.
Evolutionary relatedness of HHpgV-1 with hepaciviruses and pegiviruses. The translated ORF of HHpgV-1 was aligned with pegivirus and hepacivirus sequences from humans and chimpanzees (HPgV, simian pegivirus infecting chimpanzees [SPgVtro], and HCV), Old World primates (SPgV-OWM and SHcV-OWM), viruses formerly described as GB virus A (GBV-A) and GBV-B from New World primates (SPgV-NWM and SHcV-NWM), rodent variants from rats and mice (RPgV and RHcV-1 to RHcV-3) and horses (EPgV-1, EPgV-2, and NPHV), and several different groups of pegiviruses and hepaciviruses from bats (BPgV-1 to BPgV-5 and BHcV-1 to BHcV-4).
Translated amino acid sequences could be readily aligned over most of the genome of pegiviruses and the nonstructural gene region of hepaciviruses. Evolutionary relationships of HHpgV-1 with other groups were determined by phylogenetic analysis of the more highly conserved NS3- and NS5B-encoding regions. Trees from the two regions were congruent, and in both the HHpgV-1 sequence was distinct from all pegivirus and hepacivirus sequences published to date (Fig. 3). In both regions, sequences split into separate clades corresponding to classifications as hepaciviruses and pegiviruses. Within the latter clade, two genetically distinct lineages were apparent (groups 1 and 2); group 1 comprised variants from humans, primates, horses, and bat lineages 1, 4, and 5, while the HHpgV-1 sequence and variants from rodents and bat lineages 2 and 3 fell into group 2.
Maximum-likelihood phylogenetic analysis of the NS3 and NS5B genes of HHpgV-1 (tt790) and available complete genome sequences of other hepaciviruses and pegiviruses infecting different mammalian species. Phylogenetic analysis of each data set used 100 bootstrap resamplings to determine robustness of grouping; values are shown on branches. Abbreviations: HcV, hepacivirus; PgV, pegivirus; OWM, Old World monkey; NWM, New World monkey; R, rodent; B, bat; Bo, bovine; NPHV, nonprimate hepacivirus. Lineage numbers are shown in parentheses.
Prediction of HHpgV-1 polyprotein cleavage sites. The positions of gene boundaries of HHpgV-1 were predicted using two approaches. Translated amino acid sequences in the 5′ structural gene region were submitted to the SignalP 4.0 server to identify signalase cleavage sites used by other pegiviruses to process structural genes (Fig. 2). This revealed three potential cleavage sites comparable in position to those previously identified for equine pegiviruses and in part to those in the annotation of SPgV-NWM. However, in addition to the E1, E2, and X proteins, HHpgV-1 was predicted to include an additional 65-amino-acid, cysteine-rich protein upstream of E1. Additional coding regions have been previously identified in rodent PgV (12) and labeled “VR” in the group 2 bat pegiviruses (lineages 2 and 3) (26). Y or VR proteins were highly variable in size and showed no identifiable sequence similarity to each other. The sequence from HHpgV-1 did, however, share the property of being highly basic with several positively charged residues (pI 8.6) (Fig. 2), comparable to previously analyzed bat sequences (pI 8.2 to 9.1) and RPgV (pI 10.1). The Y and VR proteins differed, however, in their predicted locations, with rodent and bat Y/VR proteins possessing a predicted signalase cleavage site at the N terminus that may relocate the protein through the endoplasmic reticulum (ER). In contrast, the Y protein of HHpgV-1 possessed no such initial translocating sequence, which potentially indicates a cytoplasmic location for the protein. This processing resembles that of the core proteins of hepaciviruses, which function in RNA packaging and virion formation. Although considerably shorter than the processed HCV core protein, the Y protein of the HHpgV-1 sequence may have a similar function.
The E2 protein contained 11 potential N-linked glycosylation sites, a greater number than recorded for other pegiviruses; additional sites were detected in E1 (n = 3) and X (n = 1). E2 additionally contained a potential O-linked site at position 422 (Fig. 2). This pattern of heavy glycosylation, comparable in extent to that of HCV and other hepaciviruses, was distinct from the infrequent sites recorded for human and other group 1 pegiviruses. For prediction of cleavage sites between nonstructural proteins, there was sufficient similarity between the HHpgV-1 and other pegivirus sequences with annotated sites for homologous positions to be identified (Fig. 2). Proteins predicted in the NS region of HHpgV-1 were comparable in position and size to those of other pegiviruses and hepaciviruses (Fig. 2).
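As an illustration of how potential N-linked glycosylation sites such as those counted above can be located, the following minimal sketch scans a protein sequence for the canonical sequon N-X-S/T, where X is any residue except proline. The text does not state which prediction tool was used, so this motif scan is an assumption rather than the study's method, and the peptide below is a made-up toy sequence, not an HHpgV-1 protein.

```python
import re

# Canonical N-linked sequon: Asn, any residue except Pro, then Ser or Thr.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Hypothetical toy peptide used only to exercise the function.
toy_peptide = "MNGSANPTQNCTNAS"
sites = n_glycosylation_sites(toy_peptide)
print(sites)  # positions of candidate glycosylated asparagines
```

A motif match marks only a potential site; whether a sequon is actually glycosylated depends on protein context that a pattern scan cannot capture.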
Sequence divergence between HHpgV-1 and other pegiviruses aligned over the genome was high, ranging from 61 to 65% with group 1 pegivirus sequences and 58 to 60% with group 2 in the structural gene region (Table 1). This corresponded to amino acid distance ranges of 68 to 79% and 69 to 71% in this genome region. The nonstructural gene regions were better conserved, with nucleotide distance ranges of 57 to 59% and 55% between HHpgV-1 and group 1 and 2 sequences, respectively, and 68 to 69% and 62 to 64% amino acid distances for those two groups (Table 2). As implied by these results, sequence divergence was distributed throughout the pegivirus genome, with ≥40% minimum amino acid sequence divergence values of windowed fragments between HHpgV-1 and each lineage of other pegiviruses (Fig. 4). Highest divergence was observed in the structural gene regions and NS4B/NS5A, while minimum values were found in association with functionally conserved polymerase and helicase motifs in NS5B and NS3.
Nucleotide sequence divergence of structural and nonstructural genes of HHpgV-1 and other pegiviruses
Translated amino acid sequence divergence of structural and nonstructural genes of HHpgV-1 and other pegiviruses
Amino acid sequence divergence between HHpgV-1 and other pegiviruses using 240-nucleotide fragments with 24-nucleotide increments across the virus alignment (midpoint plotted on the x axis). Within-species distances for human and simian pegiviruses were included for comparison. The divergence scan was numbered using the AF121950 HPgV reference sequence. OWM, Old World monkey; NWP, New World primate; NWM, New World monkey.
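A windowed divergence scan of the kind summarized in this caption can be sketched as follows. The example assumes an uncorrected p-distance (percentage of differing residues) computed on two pre-aligned amino acid sequences, with alignment gaps excluded from each window. The distance measure, the gap handling, and the function name are assumptions for illustration. The default window of 80 residues and step of 8 residues correspond to 240 and 24 nucleotides.

```python
def windowed_divergence(seq_a: str, seq_b: str, window: int = 80, step: int = 8):
    """Return (midpoint, percent divergence) for each sliding window.

    seq_a and seq_b must be aligned (equal length, '-' for gaps).
    Positions with a gap in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = [
            (a, b)
            for a, b in zip(seq_a[start:start + window], seq_b[start:start + window])
            if a != "-" and b != "-"
        ]
        if not pairs:
            continue  # window consists entirely of gapped columns
        differences = sum(a != b for a, b in pairs)
        midpoint = start + window / 2
        points.append((midpoint, 100.0 * differences / len(pairs)))
    return points


# Toy aligned fragments (hypothetical), scanned with a small window.
scan = windowed_divergence("ACDEFGHIKL", "AC-EYGHIKL", window=5, step=5)
print(scan)
```

Plotting the midpoint against the divergence value for each comparison lineage would give a scan of the same general form as the one described, under the stated assumptions.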
Prediction of 5′ UTR RNA structure. The separate grouping of pegiviruses into two main clades in the coding region was reproduced in sequence relationships of the 5′ UTR. All sequences of group 1 viruses could be aligned and possessed predicted RNA secondary structures closely matching that of GBV-A and the recently predicted structure of equine pegivirus (11). Surprisingly, 5′ UTR sequences of group 2 pegiviruses showed no homology to those of group 1 but nevertheless could be aligned with each other. Although highly divergent in sequence from all other known viral 5′ UTR sequences, certain motifs matched those of members of the Hepacivirus genus, including the highly conserved sequence TACAGCCTGATAGGGT at position 274. The sequence lies within domain IIIe (pseudoknot region) in the hepacivirus internal ribosome entry site (IRES). Accordingly, the sequence of HHpgV-1 was structurally aligned to hepacivirus 5′ UTR sequences, an analysis that revealed the presence of a typical type IV IRES (Fig. 5) and upstream stem-loops corresponding in position, shape and size to domains I and II of HCV. However, the miR-122 seed match sequence (ACACUCCA) was absent in the 5′ UTR sequence of HHpgV-1 and all other known pegiviruses. As anticipated based on their sequence homology, the other group 2 pegiviruses have similarly structured type IV IRES elements despite their substantial sequence divergence from each other (data not shown). In this RNA structure model of the HHpgV-1 5′ UTR, the third ATG at position 330 that possesses the optimal Kozak sequence is located 11 bases downstream from the pseudoknot in domain IV and, by analogy, is the most plausible initiating codon for HHpgV-1.
Predicted RNA secondary structure of the 5′ UTR, based on structural mapping of the HHpgV-1 sequence to hepacivirus sequences based on an initial seed match in domain IIIe. Stem-loops for domains I and II were predicted by Mfold.
GORS. Thermodynamic folding analysis of the HHpgV-1 sequence revealed a mean 7.6% free energy difference between its minimum folding energy and that of sequence order-randomized controls (MFED), an observation consistent with the presence of genome-scale ordered RNA structure (GORS) in the HHpgV-1 sequence (23). This MFED value was lower than those of the group 1 viruses HPgV (mean, 11.7%; standard deviation, 0.7%), SPgV-OWM (10.9% ± 1.3%), SPgV-NWP (11.8% ± 0.4%), EPgV-1 (10.4%), and EPgV-2 (11.9%) but was comparable to the MFED values of group 2 pegiviruses (7.9% to 8.8%) (Table 3).
Mean folding energy values of different pegivirus lineages/groups
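The MFED statistic reported above compares the minimum folding energy (MFE) of a native sequence with the mean MFE of its sequence order-randomized controls. A minimal sketch of that arithmetic, assuming the usual definition MFED (%) = (MFE_native / mean MFE_scrambled − 1) × 100, is given below. The folding energies themselves would come from an RNA folding program, which is not shown, and the numbers used here are hypothetical placeholders rather than values from the study.

```python
from statistics import mean


def mfed_percent(native_mfe: float, scrambled_mfes: list[float]) -> float:
    """Percent folding energy difference between native and scrambled RNA.

    Energies are in kcal/mol and are negative for stable folds, so a more
    stable native sequence gives a positive MFED.
    """
    control_mean = mean(scrambled_mfes)
    if control_mean == 0:
        raise ValueError("mean control folding energy must be nonzero")
    return (native_mfe / control_mean - 1.0) * 100.0


# Hypothetical energies for one genome fragment (not data from the study).
native = -110.0
controls = [-98.0, -100.0, -102.0]
value = mfed_percent(native, controls)
print(round(value, 1))
```

In practice, such values are typically computed for consecutive fragments along a genome and averaged, which is the sense in which a mean MFED is reported for each virus.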
HHpgV-1 infections, prevalence, and coinfections. To determine the prevalence and nature of HHpgV-1 infection in humans, we tested pretransfusion and posttransfusion serum samples from 46 and 116 TTVS subjects, respectively. We also tested 1 to 5 longitudinally collected serum samples from 106 Multicenter Hemophilia Cohort Study (MHCS) subjects. Only two posttransfusion samples from TTVS subjects were positive for HHpgV-1 RNA, TTVS-772 and TTVS-790. The HHpgV-1-positive serum sample TTVS-772 was collected from a subject 17 days posttransfusion. A sample from the same individual collected one day before transfusion (TTVS-0927) and another collected 281 days after transfusion (TTVS-1095) tested negative for the virus. The HHpgV-1-positive serum sample TTVS-790 was collected at 7 days posttransfusion. A sample from the same individual collected one day before transfusion (TTVS-1608) and another collected at 241 days after transfusion (TTVS-791) tested negative for HHpgV-1 RNA. These results indicate transient viremia of HHpgV-1 in posttransfusion samples followed by virus clearance in both infected patients.
Of the 106 MHCS subjects, multiple samples from only two subjects, M3127 and M4287, were positive for HHpgV-1 RNA. The two adjacent samples from subject M3127 positive for HHpgV-1 were collected 201 days apart, while a sample collected 144 days before the first positive sample and two other samples collected at days 560 and 3,461 after the second positive sample were negative for HHpgV-1 infection. For the second HHpgV-1-positive MHCS subject (M4287), four longitudinal samples were available for this study and all were positive for HHpgV-1. The latter three of these four samples were collected at 399, 742, and 1,981 days after the first sample collection day. These results indicate persistence of HHpgV-1 viremia for at least 201 and 1,981 days (5.4 years) in these two MHCS subjects.
All PCR products were sequenced from both directions, to confirm results and to study the genetic diversity among different HHpgV-1 variants. Phylogenetic analysis confirmed the close genetic relatedness among HHpgV-1 variants within their respective MHCS and TTVS subjects (Fig. 6). Sequences were highly conserved, with substitutions being restricted largely to synonymous sites. However, by analogy to other pegiviruses and hepaciviruses, greater variability is likely elsewhere in the genome. Coinfections with HCV and HPgV were determined by viremia screening of HHpgV-1-positive samples for HPgV and HCV. Three of the four HHpgV-1-infected subjects were found to be coinfected with HCV (genotype 1a in two subjects and 1b in one subject). All HHpgV-1 subjects and their respective samples remained negative for HPgV coinfection.
ML phylogenetic analysis of partial NS3 sequences of HHpgV-1 variants found in different human serum samples using a rodent pegivirus sequence as the outgroup. Phylogenetic analysis used 100 bootstrap resamplings to determine robustness of grouping. HHpgV-1 variants found in serum samples from the same patients are labeled on the right (tt, TTVS; m, MHCS).
DISCUSSION
We used an unbiased (sequence-independent) virus discovery approach (9) to characterize known as well as unknown viruses that can pose a risk of transfusion transmission. We studied longitudinal pre- and posttransfusion samples from TTVS subjects for comparative analysis of changes in virome compositions after transfusion. The number of virus reads, particularly of anelloviruses, significantly increased after transfusion (Fig. 1). TTV species were studied in transfusion recipients earlier (31, 32), but the cross-sectional design and low-depth sequencing approach used in previous studies were not suitable for characterizing the compositional changes in TTV populations over time. There can be several reasons for the changes we observed in the virome of transfusion recipients, such as deteriorating health of subjects and new nosocomial infections, including transfusion transmission of viruses. The presence of a novel human virus, HHpgV-1, in two transfusion recipients was our most interesting observation. The availability of longitudinal serum samples for TTVS subjects was helpful in determining the nature of HHpgV-1 infection; the virus infection was present only in the posttransfusion samples and was followed by clearance of viremia in both subjects. Although our results provide evidence for transfusion-mediated transmission of HHpgV-1, further studies are necessary to verify the presence of HHpgV-1 in the transfused blood and to rule out other possible iatrogenic infection sources over this period. Although the presence of a new viral genome in the human serum samples confirms infection, more evidence should be sought, including seroconversion of infected patients; therefore, we are developing a serological assay for HHpgV-1 (23, 33).
Considering that processing clinical samples in a laboratory, as well as contaminated nucleic acid extraction reagents, can be a potential source of divergent viruses (9, 34), we always included appropriate negative controls in all experiments of this study. The control samples included in this study were water and PBS extracted with each batch of serum samples and also nontemplate reagent controls for reverse transcription and screening PCR assays. Moreover, comparison of HHpgV-1 sequences amplified from different samples collected from different individuals and at different time points revealed the existence of some diversity, concentrated at synonymous sites; the finding of patient-specific groupings of sequences (Fig. 6) similarly argues strongly against laboratory or reagent contamination as the source of the HHpgV-1 sequences.
HHpgV-1 was identified by Illumina sequencing in the posttransfusion samples from two of the 46 originally enrolled TTVS subjects. Subsequent specific PCR screening of posttransfusion samples from 70 TTVS subjects failed to identify any further positive samples. HHpgV-1 was also detected only infrequently in hemophiliacs (2 of 106 subjects). The infrequency of detection of HHpgV-1 compared to HCV and HPgV in this risk group may be explained by the fact that (i) HHpgV-1 is a much less prevalent human infection, (ii) this virus establishes persistence less frequently, or (iii) the PCR-based screening method used for viremia detection was unable to amplify sequences of more genetically divergent variants of HHpgV-1. Historically, the latter explanation led to an underestimate of the prevalence of infection of HCV (35) that was only resolved when the full extent of the genotype diversity of HCV was identified. Determining the incidence and prevalence of human HHpgV-1 infection will require additional molecular and serological studies.
The genomic organization of HHpgV-1 revealed several novel genetic features in this new human virus. First, the 5′ UTR was nonhomologous to that of most other pegiviruses, forming a type IV IRES, hitherto described only for members of the genera Hepacivirus and Pestivirus of Flaviviridae. HHpgV-1 shared sequence motifs with other group 2 pegiviruses, despite their sequence divergence from each other, allowing the retrospective identification of this IRES type in all members of this pegivirus subgroup, undocumented in the original publications (12, 26). This finding is complementary to the earlier description of pegivirus-like IRES sequences in three bat-associated hepaciviruses (27). As with picornaviruses (36), exchange of the translation modules between otherwise distantly related viruses evidently occurs relatively often in the evolutionary history of RNA viruses, although the forces driving such modular exchanges and effects of different IRES types on virus replication strategies and interaction with host cells are largely unknown. However, variable dependence of different IRES types on host translation initiation factors, such as eIF3 and eIF-2α (37, 38), clearly influences their ability to withstand shutoff of host cell translation mediated through stress responses or induction of interferon-stimulated genes (39). Another genomic feature of HHpgV-1 is the presence of short coding sequences upstream of E1 (Fig. 2). This coding region encodes a predicted cytoplasmic basic protein that, although considerably shorter than the HCV core protein, may perform similar functions in RNA binding and packaging during virus assembly. HHpgV-1 similarly resembles HCV and several other hepaciviruses in possessing numerous sites for potential N-linked glycosylation in the predicted envelope genes, particularly E2, in which 11 N-linked and one predicted O-linked sites are located.
This pattern of heavy glycosylation, which contrasts with the paucity of sites in HPgV and most other pegiviruses, is believed to shield viral envelopes from antibody-mediated neutralization (40, 41) and to participate in virus entry into hepatocytes (42). It is conceivable that HHpgV-1 virions display similar host interactions.
The finding of persistent viremia in hemophiliacs in sequentially collected samples and the time interval between sampling and their previous exposure to potentially infectious clotting factor concentrates (1989 to 1990) demonstrates the propensity of HHpgV-1 for long-term host persistence, an attribute shared with most, if not all, hepaciviruses and pegiviruses. HHpgV-1 resembles other viruses in these genera in possessing a large-scale structured RNA genome (GORS) that is tightly associated with host persistence throughout different families of positive-stranded mammalian RNA viruses (43, 44). Although this association is poorly characterized functionally (45), HHpgV-1 shares midrange MFED values with other group 2 pegiviruses and with HCV (Table 3). Future bioinformatics characterization of the configuration of RNA structures in such viruses, and potentially experimentally in RPgV, will be informative in furthering our understanding about this generic property of persistent RNA viruses.
The recent documentation of substantially greater sequence diversity of hepaciviruses and pegiviruses and the existence of mosaic features and variability in genome organizations may lead to a reevaluation of the criteria by which the genera Hepacivirus and Pegivirus were originally defined (22). As reviewed above, possession of a type IV IRES does not define hepaciviruses, as this element is present among group 2 pegiviruses (Fig. 5). Similarly, some hepaciviruses have pegivirus-like 5′ UTR sequences. Possession of a core protein may turn out to be a similarly variable attribute, with several pegiviruses possessing coding sequences upstream of E1 and, in the case of HHpgV-1, a potential cytoplasmic location that may confer a biological function analogous to the hepacivirus core protein. Nevertheless, phylogenetic analysis of conserved genes in the nonstructural genome region clearly maintains separate, bootstrap-supported groupings of pegiviruses and hepaciviruses, while the E1/E2/X structural gene block of pegiviruses shows little or no homology with the E1 and E2 protein genes of hepaciviruses. Future characterization of these viruses in a wider range of hosts will provide more information on the value and durability of current criteria for Hepacivirus and Pegivirus genus assignments.
Comprehensive virome analysis helped us in the identification of HHpgV-1, a new human virus. We note that virome analysis also identified approximately 40% of sequence reads that remained unclassified due to their low sequence similarity to known viruses or to any other known sequence. Viruses show remarkable diversity in terms of replication-expression strategy and genomic complexity (46). After acquiring the HHpgV-1 genome, we reverse mapped it to the unclassified virome reads and noticed that several unclassified sequence reads were part of the HHpgV-1 genome (core, E1, E2, NS4A, and NS4B genes). These findings suggest that other fractions of these unclassified reads may represent unrecognized highly diverse viruses or the most rapidly evolving part of their genomes. The genomic sequence data of HHpgV-1 will be helpful in designing molecular reagents to study its prevalence, persistence, transmission, and disease association in humans. Besides providing information regarding a new human virus infection, identification of HHpgV-1 expands our knowledge of the origin, genetic diversity, and evolution of hepaciviruses and pegiviruses.
MATERIALS AND METHODS
Human samples. All human serum samples used in this study were obtained from the NHLBI sample repository and were collected as part of two studies, the Transfusion-Transmitted Viruses Study (TTVS) and the Multicenter Hemophilia Cohort Study (MHCS). The study plan and research were approved by Columbia University Medical Center, no. IRB-AAAN5157. TTVS samples were collected at 4 participating blood centers distributed across the United States from July 1974 through June 1980. The MHCS-I evaluated and prospectively followed patients with hemophilia or a related coagulation disorder from 1982 to 1986. Adults and children who had a congenital coagulation disorder (hemophilia A or B [congenital factor VIII or IX deficiency]), von Willebrand's disease, or another disorder were enrolled from 8 collaborating hemophilia centers in the United States between 1982 and 1985. Four additional centers from the United States and 4 centers from Europe joined the study between 1987 and 1990. In MHCS-I, subjects were evaluated semiannually with a standardized physical examination, abstraction of medical records, and phlebotomy (4–8).
Virome amplification, sequencing, and bioinformatic analysis. Serum samples were filtered through a 0.45-μm filter and treated with nucleases to digest free nucleic acids (NAs) for enrichment of viral NA and then extracted in NucliSens buffer using the automated easyMAG system (bioMérieux, United States). Total RNA extracts were reverse transcribed using a SuperScript III kit (Invitrogen Life Technologies) with random hexamer primers. The cDNA was RNase H treated prior to second-strand DNA synthesis using Klenow fragment (3′–5′ exonuclease negative) (New England Biolabs). The double-stranded cDNA was sheared to a 200-bp average fragment length using a Covaris E210 focused ultrasonicator. Sheared DNA was purified and used for Illumina library construction using a Kapa library preparation kit (KK8234; Kapa Biosystems) and SeqCap EZ Library SR (Nimblegen, Roche). The sequencing libraries were quantified using an Agilent Bioanalyzer 2100. Samples with low concentrations were amplified by increasing PCR cycle numbers from 9 to 14. All sequencing was done on the Illumina HiSeq 2500 platform (Illumina, San Diego, CA, USA), yielding an average of 150 million reads per sequencing lane. Virome analysis included only those sequences that were most closely related to vertebrate viruses and was done using MEGAN version 5.10.5 (47). Sequence data were demultiplexed using Illumina software to generate FastQ files for individual samples. Sequences were filtered using Q30 and mapped against reference genomes from GenBank with Bowtie2 mapper 2.0.6 (http://bowtie-bio.sourceforge.net). SAMtools (v 0.1.19) was used to generate the consensus genomes and coverage statistics. Integrative Genomics Viewer (v 2.3.55; Broad Institute) was used to generate the sequence coverage plots. Host-derived sequences were identified using Bowtie2-based sequence mappings against the reference host genomes downloaded from the NCBI database.
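The Q30 filtering step mentioned above can be illustrated with a minimal Python sketch. The text does not state how the Q30 criterion was applied, so the rule used here (keep a read when its mean Phred score is at least 30, with Phred+33 encoding) is an assumption, and the function names and example reads are hypothetical.

```python
from statistics import mean

def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]

def passes_q30(quality_line, threshold=30):
    """Assumed rule: keep a read when its mean Phred score is >= threshold."""
    scores = phred_scores(quality_line)
    return bool(scores) and mean(scores) >= threshold

def filter_fastq(records, threshold=30):
    """Yield (header, sequence, quality) tuples that pass the mean-quality rule."""
    for header, sequence, quality in records:
        if passes_q30(quality, threshold):
            yield header, sequence, quality

# Illustrative reads only, not data from the study.
reads = [
    ("@read1", "ACGT", "IIII"),  # 'I' encodes Phred 40
    ("@read2", "ACGT", "++++"),  # '+' encodes Phred 10
]
kept = list(filter_fastq(reads))
```

Other readings of "Q30" are equally plausible, for example trimming bases below Q30 or requiring a minimum fraction of bases at Q30 or above; only the predicate `passes_q30` would need to change.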
Sequencing data obtained from the clinical samples were preprocessed using PRINSEQ (v 0.20.2) software, and primer-trimmed, quality score-filtered reads were aligned against the host reference databases to remove the host background. The host-subtracted sequence reads were de novo assembled using MIRA (v 4.0) or SOAPdenovo2 (v 2.04) assemblers, and then contigs and unique single sequences were subjected to a homology search using MegaBlast against the GenBank nucleotide database. Sequences that showed poor or no homology at the nucleotide level were subjected to a search with BLASTx against the viral GenBank protein database. Viral sequences from BLASTx analysis were subjected to another round of BLASTx homology search against the entire GenBank protein database to correct for biased E values due to the smaller size of the virus-only database, and taxonomy was reassigned. Based on the contigs identified for different viral strains, GenBank sequences were downloaded and used for mapping the whole data set to recover partial or complete genomes. The descriptive statistics mean, standard deviation (SD), and median were calculated for viral count and viral rate. The pretransfusion viral rate and the posttransfusion rate were compared by paired Wilcoxon rank sum test.
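The statistical summary described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis code: the paired form of the Wilcoxon test is usually called the signed-rank test, and the sketch computes only its rank sums (no P value). The function names and the example values are hypothetical.

```python
from statistics import mean, median, stdev

def describe(values):
    """Return the mean, sample standard deviation, and median of a list."""
    return {"mean": mean(values), "sd": stdev(values), "median": median(values)}

def signed_rank_sums(pre, post):
    """Return (W+, W-) for paired samples, using average ranks for ties.

    Pairs with zero difference are dropped, as in the classical signed-rank test.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Made-up pre- and posttransfusion rates, for illustration only.
pre_rate = [1, 2, 3, 4]
post_rate = [2, 4, 6, 8]
summary = describe(pre_rate)
w_plus, w_minus = signed_rank_sums(pre_rate, post_rate)
```

The smaller of the two rank sums is the usual test statistic; a statistics package would then convert it to a P value.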
Screening assays of HHpgV-1. The polyprotein and UTR of HHpgV-1 were aligned to all known hepaciviruses and pegiviruses. Nucleotide and amino acid motifs showing relative conservation among different virus lineages were used to make primers for screening of samples for HHpgV-1 and related variants. All PCR mixtures used AmpliTaq Gold 360 master mix (catalog no. 4398881; Applied Biosystems) and 2 µl of cDNA. The HHpgV-1 helicase gene heminested PCR assay used primer pair HHpgV-ak1 (5′-GTTGTATTCGCCACAGCCAC-3′) and HHpgV-ak2 (5′-TCAAAGTTTCCTGTGTAGCCTGT-3′) in the first round of PCR and the pair HHpgV-ak3 (5′-GTATTCGCCACAGCCACTCC-3′) and HHpgV-ak2 in the second round of PCR. For the first round, the PCR cycle included 8 min of denaturation at 95°C, 10 cycles of 95°C for 40 s, 56°C for 1 min, and 72°C for 1 min, 30 cycles of 95°C for 30 s, 55°C for 1 min, and 72°C for 1 min, and a final extension at 72°C for 5 min. In the first 10 cycles, the annealing temperature was ramped down by 0.5°C each cycle to allow mutation tolerance during primer hybridization. For the second round, PCR conditions included 8 min of denaturation at 95°C, 10 cycles of 95°C for 40 s, 60°C for 1 min, and 72°C for 1 min, 30 cycles of 95°C for 30 s, 56°C for 1 min, and 72°C for 1 min, and a final extension at 72°C for 5 min. The hepatitis C virus PCR assay used the primer pair HCV-ak-F1 (5′-GCGCCCATCACGGCITAYGC-3′) and HCV-ak-R1 (5′-GTCTTGGTCCACRTTGGTRTACAT-3′) in the first round of PCR and the pair HCV-ak-F2 (5′-GCCCATCACGGCGTAYGCNCARCA-3′) and HCV-ak-R2 (5′-CTTGGTCCACRTTGGTRTACATYTG-3′) in the second round of PCR. For the first round, the PCR cycle included 8 min of denaturation at 95°C, 10 cycles of 95°C for 40 s, 58°C for 1 min, and 72°C for 45 s, 30 cycles of 95°C for 30 s, 55°C for 1 min, and 72°C for 45 s, and a final extension at 72°C for 5 min. In the first 10 cycles, the annealing temperature was ramped down by 0.5°C each cycle to allow mutation tolerance during primer hybridization.
For the second round, PCR conditions included 8 min of denaturation at 95°C, 10 cycles of 95°C for 40 s, 64°C for 1 min, and 72°C for 45 s, 30 cycles of 95°C for 30 s, 59°C for 30 s, and 72°C for 40 s, and a final extension at 72°C for 5 min. The GBVc-NS3 PCR assay used the primer pair GBVc-ak-F1 (5′-CCTTGGACCCAGGTICCNACIGA-3′) and GBVc-ak-R1 (5′-CCTGGTGGGGTRGCIGTNGC-3′) in the first round of PCR and the pair GBVc-ak-F2 (5′-GGACCCAGGTGCCIACGGAYGC-3′) and GBVc-ak-R2 (5′-CCTGGTGGGGTRGCGGTNGCRTA-3′) in the second round of PCR. For the first round, the PCR cycle included 8 min of denaturation at 95°C, 10 cycles of 95°C for 40 s, 63°C for 1 min, and 72°C for 30 s, 30 cycles of 95°C for 30 s, 60°C for 1 min, and 72°C for 40 s, and a final extension at 72°C for 5 min. In the first 10 cycles, the annealing temperature was ramped down by 0.5°C each cycle to allow mutation tolerance during primer hybridization. For the second round, PCR conditions included 8 min of denaturation at 95°C, 10 cycles of 95°C for 40 s, 71°C for 1 min, and 72°C for 30 s, 30 cycles of 95°C for 30 s, 64°C for 30 s, and 72°C for 40 s, and a final extension at 72°C for 5 min.
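The annealing-temperature ramp used in the first 10 cycles of each assay can be made concrete with a short Python sketch. It assumes that the first cycle anneals at the stated starting temperature and that each later ramp cycle is 0.5°C lower; the text does not say which cycle receives the first decrement. The function name and parameter names are hypothetical, and the defaults follow the first round of the HHpgV-1 helicase assay (56°C ramp, then 30 cycles at 55°C).

```python
def touchdown_schedule(start_anneal, touchdown_cycles=10, step=0.5,
                       fixed_anneal=55.0, fixed_cycles=30):
    """Return the per-cycle annealing temperatures (in degrees C) for one PCR round.

    Assumption: cycle 1 uses start_anneal, and each subsequent ramp cycle
    lowers the annealing temperature by `step`.
    """
    ramp = [start_anneal - step * i for i in range(touchdown_cycles)]
    return ramp + [fixed_anneal] * fixed_cycles

# First round of the HHpgV-1 helicase assay as described in the text.
schedule = touchdown_schedule(56.0)
```

Under this assumption the ramp runs from 56.0°C down to 51.5°C before the 30 fixed cycles at 55°C. The other rounds can be generated by changing `start_anneal` and `fixed_anneal` to the values given for each assay.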
Phylogenetic and RNA secondary structure analysis. Nucleotide sequences (5′ UTRs) and translated protein sequences (coding regions) were aligned using the program MUSCLE as implemented in the SSE package (37). Sequence divergence scans were performed, and summary values for different genome regions were generated by the program Sequence Distance in the SSE package. Bootstrapped maximum-likelihood trees for the NS3 helicase region and NS5B polymerase regions were constructed using the maximum-likelihood algorithm as implemented in MEGA, using the optimal model selected by Model Test. For both regions, this was the Le-Gascuel model with a gamma distribution (5 rates) and invariant sites (LG + γ + I). Phylogenetic analysis of NS3 nucleotide sequences was performed similarly using the Jukes-Cantor substitution model (selected by Model Test for this data set).
RNA structures were predicted by Mfold and by homology searching and structural alignment with bases conserved in other hepaciviruses. Structure prediction for the pseudoknot region in HCV (IIIf) and homologous pairings in other hepaciviruses could not be predicted by Mfold or other conventional RNA secondary-structure prediction algorithms. Structure predictions upstream of stem-loop III were performed by Mfold.
Cleavage sites between HHpgV-1 structural proteins were predicted using the SignalP version 4.1 program (16); these were concordant with positions predicted from the sequence alignment. Cleavage between nonstructural genes was predicted by alignment of the NS2/NS3, NS3/4A, NS4A/4B, NS4B/5A, and NS5A/5B sites previously proposed for simian pegiviruses (18).
HHpgV-1 and other pegivirus sequences were analyzed for the presence of GORS by comparing folding energies of consecutive fragments of nucleotide sequence with random sequence order controls using the program MFED scan in the SSE package (37). Minimum folding energies (MFEs) of rodent virus genomes were calculated by using the default setting in the program Zipfold. MFE results were expressed as MFEDs, i.e., the percentage difference between the MFE of the native sequence and the mean MFE of the 50 sequence order-randomized controls (32).
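The MFED calculation described above can be sketched in Python. The formula used, MFED (%) = (MFE of native sequence / mean MFE of controls − 1) × 100, is our reading of "percentage difference" and its sign convention is an assumption. The function name and the energy values are hypothetical; in practice each MFE would come from a folding program.

```python
from statistics import mean

def mfed_percent(native_mfe, control_mfes):
    """Percentage difference between the native MFE and the mean control MFE.

    Assumed convention: positive values mean the native sequence folds
    more stably (more negative MFE) than its order-randomized controls.
    """
    control_mean = mean(control_mfes)
    if control_mean == 0:
        raise ValueError("mean control MFE is zero; MFED is undefined")
    return (native_mfe / control_mean - 1.0) * 100.0

# Illustrative energies in kcal/mol, not values from the study:
# one native fragment and 50 sequence order-randomized controls.
example_mfed = mfed_percent(-110.0, [-100.0] * 50)
```

With these illustrative numbers the native fragment is about 10% more stable than the control mean.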
Nucleotide sequence accession number. The complete genome of HHpgV-1 has been submitted to GenBank under accession number KT439329.
ACKNOWLEDGMENTS
The manuscript was prepared using MHCS and TTVS research materials obtained from the NHLBI Biological Specimen and Data Repository Information Coordinating Center and does not necessarily reflect the opinions or views of the MHCS, the TTVS, or the NHLBI.
We thank Joel Garcia for sequencing library preparation and Komal Jain and Adrian Caciula for bioinformatics analysis.
The study was supported by NIH grants HL119485 (A. Kapoor), AI107631 (A. Kapoor), and AI109761 (Lipkin). P.D.B. was supported by the Intramural Research Program of the National Institute of Dental and Craniofacial Research, NIH.
FOOTNOTES
- Received 2 September 2015
- Accepted 9 September 2015
- Published 22 September 2015
- Copyright © 2015 Kapoor et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution-Noncommercial-ShareAlike 3.0 Unported license, which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original author and source are credited.