Proposed Role for KaiC-Like ATPases as Major Signal Transduction Hubs in Archaea

ABSTRACT All organisms must adapt to ever-changing environmental conditions and accordingly have evolved diverse signal transduction systems. In bacteria, the most abundant networks are built around the two-component signal transduction systems that include histidine kinases and receiver domains. In contrast, eukaryotic signal transduction is dominated by serine/threonine/tyrosine protein kinases. Both of these systems are also found in archaea, but they are not as common and diversified as their bacterial and eukaryotic counterparts, suggesting the possibility that archaea have evolved other, still uncharacterized signal transduction networks. Here we propose a role for KaiC family ATPases, known to be key components of the circadian clock in cyanobacteria, in archaeal signal transduction. The KaiC family is notably expanded in most archaeal genomes, and although most of these ATPases remain poorly characterized, members of the KaiC family have been shown to control archaellum assembly and have been found to be a stable component of the gas vesicle system in Halobacteria. Computational analyses described here suggest that KaiC-like ATPases and their homologues with inactivated ATPase domains are involved in many other archaeal signal transduction pathways and comprise major hubs of complex regulatory networks. We predict numerous input and output domains that are linked to KaiC-like proteins, including putative homologues of eukaryotic DEATH domains that could function as adapters in archaeal signaling networks. We further address the relationships of the archaeal family of KaiC homologues to the bona fide KaiC of cyanobacteria and implications for the existence of a KaiC-based circadian clock apparatus in archaea.

Archaeal signal transduction systems have not been studied to a comparable extent (7), but studies support the existence of some archaea-encoded, complex signal transduction networks that mimic systems employed in bacteria and eukarya. Two-component systems homologous to bacterial counterparts have been experimentally characterized in Halobacteria, where they regulate complex protein-protein interaction networks that influence chemotaxis, phototaxis, and archaellum (archaeal rotary motor) activity (8). In the methanogenic archaeon Methanosaeta harundinacea, the histidine kinase FilI controls cell morphology and affects methane production (9). Phylogenetic analysis suggests that archaea acquired two-component signal transduction system components from bacteria on multiple occasions (10). This conclusion is compatible with the results of the reconstruction of the last archaeal common ancestor (LACA) on the basis of the arCOG (archaeal clusters of orthologous genes) database that estimated the probability of histidine kinases being present in the LACA at <0.2 (11). Notably, the two-component systems are mostly found in mesophilic archaea that appear to have captured numerous bacterial genes via horizontal gene transfer (HGT) (9, 10, 12-17) (Table 1). Protein phosphorylation, mostly attributed to Ser/Thr/Tyr (here, S/T) protein kinases, apparently plays an important role in archaea, but details about the specific roles of protein phosphorylation in signal transduction are scarce (7). Unlike histidine kinases, three S/T protein kinase families, RIO1, RIO2, and SPS1 (corresponding to [...]) (11). At least some of these kinases appear to be key regulators of the archaeal cell cycle, motility, and membrane remodeling (19).
(Table 1, footnote a: The numbers of respective proteins were taken from previous publications (5, 73) and/or retrieved from recent updates of the COG (18) and arCOG (66) databases.)
However, S/T kinases are not particularly prone to expansion in archaea (Table 1) and probably comprise only a limited part of the archaeal signal transduction networks. Given the relative paucity of identifiable signal transduction systems in Archaea, a search for new, perhaps, archaea-specific signal transduction systems is an important goal. Our previous analyses of type IV pili systems and the archaellum identified several KaiC-like ATPases (members of the COG0467 family) that appear to be involved in the regulation of these systems (20). These observations prompted us to undertake a comprehensive analysis of the KaiC family in archaea. Because of the high similarity to the eukaryotic recombinase component Rad55 (homologue of bacterial RecA and archaeal RadA), until recently, the COG0467 family (18) in archaea has been implicated in DNA recombination pathways (21). One of the archaeal proteins, namely, the SSO2452 protein of Sulfolobus solfataricus, has been experimentally studied in this context and was shown not to be an active recombinase but could bind single-stranded DNA and inhibit D-loop formation by RadA (22). However, outside the Archaea, the best-studied protein in this family is the cyanobacterial circadian clock ATPase KaiC, which does not appear to be involved in DNA recombination (23,24).
The cyanobacterial circadian clock system, an ATP-dependent, posttranslational molecular oscillator, has been thoroughly characterized biochemically, structurally, and functionally (25-31). Typically, the system consists of three protein components, KaiA, KaiB, and KaiC (Fig. 1A). The KaiA protein forms a homodimer that interacts directly with the C-terminal ATPase domain (CII) of KaiC and promotes its phosphorylation. Structurally, KaiA is a two-domain protein with an N-terminal four-helix bundle domain and a C-terminal OmpR-like winged helix-turn-helix (HTH) DNA-binding domain. KaiB has the thioredoxin fold and interacts with the N-terminal (CI) domain of KaiC, promoting dissociation of KaiA and dephosphorylation of KaiC. The cyanobacterium Prochlorococcus marinus encodes a minimal circadian system that lacks KaiA but nevertheless shows some features of an autonomous oscillator that, however, does not persist long under constant-light conditions, so that the system apparently requires a reset each diel cycle (26,32). However, even when all three components are present, this is not always sufficient to reproduce all of the canonical properties of a circadian clock, as is the case in the purple nonsulfur bacterium Rhodopseudomonas palustris, which only poorly maintains rhythmicity under constant conditions (33). Multiple input and output components have been shown to interact with the cyanobacterial circadian clock system, forming a complex, interconnected network that includes transcriptional regulators, receiver (REC) domains, and sensory histidine kinases, as well as light-sensitive redox molecules such as quinones (26). Some of the input and output proteins contain KaiA- or KaiB-like domains and directly interact with KaiC.
Phylogenetic analysis indicates that the COG0467 family forms a separate clade within the RecA ATPase superfamily (34) (Fig. 1B), implying a separate function that does not involve DNA transactions. It has been hypothesized that, given the wide spread and major expansion of this family in archaea, which contrasts with its patchy distribution in bacteria, the KaiC component of the cyanobacterial circadian clock was acquired by HGT from archaea (34). Recently, the structure of the FlaH protein (COG2874 family), which is always encoded within the archaellum operon (Fig. 1C), has been solved and shown to be similar to the C-terminal domain of KaiC (35,36). FlaH has been shown to form a hexamer and interact with the archaellum subunit FlaI, the motor ATPase, and in crenarchaea, with the FlaX ring (35,36). KaiC family ATPase GvpD was found to be involved in the regulation of Halobacterium-specific gas vesicles (37) (Fig. 1C). Several halobacterial KaiC-like proteins have been studied with respect to their potential involvement in light-dependent gene expression (38). Very recently, KaiC proteins from the hyperthermophiles Thermococcus litoralis and Pyrococcus horikoshii were shown to be capable of KaiA-independent autophosphorylation at both 30°C and 75°C (34,39). Finally, structural analysis of a distinct family of archaea-specific uncharacterized proteins (DUF835, PF05763 in the Pfam database [40]) has shown that these proteins are inactivated ATPases that are most closely related to KaiC (41). Thus, currently, at least two additional protein families can be included in the archaeal KaiC group (Fig. 1B). Evolutionary reconstructions suggest that KaiC-like ATPases from arCOG01171, arCOG001174, and arCOG04148 (FlaH) were likely present already in the LACA (11).
Prompted by the above observations and the extraordinary diversity of the KaiC ATPases in archaea, we performed a comprehensive phylogenomic analysis of this protein family. The results strongly suggest that the KaiC family ATPases and their homologues with inactivated ATPase domains are key components of the archaeal signaling network(s).

RESULTS
Genomic census of the KaiC ATPase family in archaea and bacteria. To perform a comprehensive phylogenomic analysis of the KaiC ATPase superfamily, 2,635 sequences from the three KaiC subfamilies (COG0467, COG2874, and pfam05763) and related arCOGs (see Table S1 in the supplemental material) were extracted from the data set of complete archaeal and bacterial genomes. Genomic loci (five genes upstream and downstream from each kaiC-like gene) were retrieved for the genomic neighborhood analysis (Table S2). These loci were annotated by using PSI-BLAST and the CDD (Conserved Domain Database) collection of multiple sequence alignments, and the archaeal proteins were assigned to arCOGs (see Materials and Methods for details). Notably, members of the KaiC superfamily are present even in the archaea with the smallest genomes, such as Nanoarchaeota, and various KaiC families are expanded in many archaeal lineages, especially, in Thermococci and Thermoproteales (Table S1).
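The locus retrieval described above (five genes upstream and downstream from each kaiC-like gene) can be illustrated with a minimal sketch. It assumes that each contig is available as a list of gene identifiers ordered by genomic coordinate and that contigs are treated as linear, so windows are truncated at contig ends. The function name, the data layout, and the toy identifiers are hypothetical and are not taken from the in-house scripts used in this work.

```python
from typing import Dict, List, Sequence, Set


def extract_neighborhoods(
    contigs: Dict[str, Sequence[str]],
    targets: Set[str],
    flank: int = 5,
) -> Dict[str, List[str]]:
    """For each target gene, return the genes within `flank` positions on either side.

    `contigs` maps a contig name to its gene identifiers in genomic order
    (assumed layout). Windows are truncated at contig ends; circular
    replicons are not wrapped around in this sketch.
    """
    loci: Dict[str, List[str]] = {}
    for genes in contigs.values():
        for i, gene in enumerate(genes):
            if gene in targets:
                start = max(0, i - flank)
                # The slice end may exceed the contig length; Python clips it.
                loci[gene] = list(genes[start:i + flank + 1])
    return loci


# Toy contig with made-up gene identifiers g0..g19.
toy_contigs = {"contig1": [f"g{i}" for i in range(20)]}
toy_loci = extract_neighborhoods(toy_contigs, {"g10", "g2"})
```

In this toy case, the locus of `g10` spans 11 genes (`g5` to `g15`), whereas the locus of `g2` is truncated at the contig start and spans `g0` to `g7`.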
From this collection of KaiC-like protein sequences, we selected a nonredundant set of proteins that could be expected to contain at least one full-sized ATPase domain (~200 amino acid residues). This nonredundant set was used to build a dendrogram by using a combination of the FastTree method and the unweighted pair group method using average linkages (UPGMA) (Text S1; see Materials and Methods for details). The resulting tree topology was largely consistent with results of previous phylogenetic analyses (34,39).
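For readers unfamiliar with UPGMA, the following is a minimal, pure-Python sketch of the agglomeration step only. It is not the pipeline used here: it assumes that pairwise distances between clusters are already available (for example, derived from similarity scores by some monotonically decreasing transformation, which is an assumption of this sketch), and it returns the topology as nested tuples without branch lengths. The function name and the toy distances are hypothetical.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a UPGMA topology as nested tuples.

    `labels` is a list of distinct leaf names; `dist` maps
    frozenset({a, b}) to the distance between leaves a and b.
    """
    clusters = {label: 1 for label in labels}  # cluster -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Average linkage: size-weighted mean of the distances to a and b.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Toy distances between four made-up clusters.
toy_dist = {
    frozenset(("A", "B")): 0.1,
    frozenset(("C", "D")): 0.2,
    frozenset(("A", "C")): 0.9,
    frozenset(("A", "D")): 0.9,
    frozenset(("B", "C")): 0.9,
    frozenset(("B", "D")): 0.9,
}
toy_tree = upgma(["A", "B", "C", "D"], toy_dist)
```

With these toy distances, A and B are joined first, then C and D, and the two pairs are joined last.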
Despite the considerable overrepresentation of bacterial compared to archaeal genomes in the database, archaeal (and cyanobacterial) proteins dominate the KaiC family, in agreement with the previous conclusion that this family originated in Archaea (42). A phylogenetic tree was built for a nonredundant subset of KaiC family members (Fig. 2A). The tree contains 28 distinct strongly supported archaeal branches (A1 to A28) and 6 predominantly bacterial branches (B1 to B6). Bacterial sequences are mostly scattered over the tree, suggesting frequent HGT from archaea to bacteria. The large, mostly bacterial clade combining branches B2 and B3 corresponds to cyanobacterial KaiC components of the circadian clock (B3) and KaiC-like sequences (B2) including experimentally studied proteins of Rhodopseudomonas and Legionella (33, 43) (Fig. 2A). The strongly supported (95%) B2 clade contains several archaeal proteins, in addition to bacterial ones, all from different methanogens (branches A5a and A5b), which indicates likely HGT from bacteria to archaea. In Rhodopseudomonas (branch B2), involvement of the KaiC homologues in clock-like gene expression has been demonstrated, whereas in Legionella (branch B2), these proteins are implicated in oxidative and sodium stress resistance and do not appear to be components of an oscillator. This clade is deeply nested among diverse archaeal branches, in accord with the scenario in which the ancestral components of the circadian clock were transferred from archaea to bacteria (Fig. 2A). Proteins containing two ATPase domains and those with a single ATPase domain are interspersed in the tree, suggesting that multiple gene fusions and gene fissions occurred during the evolution of this family in Archaea. Furthermore, active and inactivated (as determined from the disruption of the Walker A and B signature motifs of the P-loop domain) ATPases are also interspersed, indicating multiple independent ATPase inactivations (Fig. 2A; Table 2).
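As a simple illustration of how disruption of a P-loop signature can be flagged, the sketch below scans a protein sequence for the widely used Walker A consensus, G-x(4)-G-K-[ST]. This is only a pattern match under stated assumptions: it is not the procedure used to assign the active and inactivated states reported here, it ignores the more degenerate Walker B motif, and the function name and toy sequences are made up for the example.

```python
import re

# Walker A (P-loop) consensus: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def has_walker_a(sequence: str) -> bool:
    """Return True if the sequence contains a canonical Walker A motif."""
    return WALKER_A.search(sequence.upper()) is not None


# Made-up toy sequences, not real KaiC proteins.
intact = "MAGATGTGKTLLA"     # contains GATGTGKT
disrupted = "MAGATGTGATLLA"  # the conserved lysine is replaced by alanine
```

A sequence lacking a match would be only a candidate for an inactivated ATPase; alignment-based inspection of both motifs would still be needed to support such a call.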
Here, we collectively refer to all groups of the KaiC homologues with inactivated ATPase domains as iKaiC; clearly, despite the abrogation of the ATPase activity, iKaiC could perform other functions, as discussed below. Archaeal branch A9 consists of KaiC-like proteins that are well represented in both Euryarchaeota and the TACK (Thaumarchaeota, Aigarchaeota, Crenarchaeota, Korarchaeota) superphylum, and thus appear to be ancestral (Table 2). Although the support of this branch is not very strong (44), all of these proteins belong to the same cluster, arCOG01171, and have a single ATPase domain, so two independent approaches to sequence clustering give similar results. The same considerations apply to branch A3, which includes KaiC-like proteins with two active ATPase domains. The third branch (A17) that appears to be ancestral consists of FlaH proteins, essential archaellum components (36,45). The remaining tree branches are either lineage specific or include only a few archaeal lineages (Table 2; Table S3). Thus, this analysis supports the previous conclusions that at least three KaiC families could be represented in the LACA (11). The multiple long branches and inactivation of the ATPase domain imply frequent subfunctionalization of the KaiC family proteins, especially in Thermococci and Thermoproteaceae and to a lesser extent in Aciduliprofundum and Archaeoglobi. This evolutionary trend resulted in the appearance of numerous subfamilies of highly diverged iKaiC proteins (Table S1).
Predicted interaction partners of KaiC proteins in Archaea. Analysis of conserved gene neighborhoods (Fig. 2B) and domain fusions (Fig. 3) revealed a complex and diverse set of proteins and domains that can be predicted to interact with KaiC family members.
The three most common contextual themes involving the KaiC family are (i) type IV pilus systems and other membrane-associated complexes such as the signal recognition particle (SRP) GTPase Ffh or a FlgN-like flagellar biosynthesis/secretory pathway chaperone (20), (ii) signal-transducing and sensory proteins that are typically associated with histidine kinases in bacteria, and (iii) membrane transporters (Fig. 2B). Specifically, we can confidently predict the interacting partners for two ancestral KaiC families, in addition to FlaH, for which such partners are already known. Branch A9 KaiC proteins are predicted to interact with uncharacterized proteins from arCOG00921 (COG1318, predicted DNA-binding transcriptional regulators of the GlpR family) (Fig. 2B). The data in Table S1 indicate that arCOG00921 proteins and proteins from the A9 KaiC branch are always present in the same genome and often adjacently, even in the smallest archaeal genome of Nanoarchaeum equitans (NEQ174 and NEQ534, respectively). The coincidence of retention suggests that both components are involved in the same important cellular process(es). A third component also could be linked to this system, namely, a protein of the poorly characterized DUF77 family (pfam01910/COG0011) that is present in most archaea (arCOG04373) and appears to descend from the LACA (11). The structure of a protein from this family has been solved, revealing a ferredoxin fold, and it has been shown to form homotetramers and bind thiamine; in Thermotoga, the expression of the gene for this protein is upregulated under oxidizing conditions (46). Accordingly, it has been proposed that the protein is involved in an oxidative stress response mechanism (11).
Additionally, arCOG007764 (a paralog of arCOG00921) is associated with KaiC-like ATPases of branch A24, whereas arCOG04373 is also associated with KaiC-like ATPases of branch A14, reinforcing the functional linkage of these three protein families (Fig. 2B).
Proteins with two ATPase domains, which most closely resemble the bona fide cyanobacterial KaiC protein, are typically associated with a small protein, either KaiB (branches A5a and A5b) or a member of an uncharacterized protein family (e.g., arCOG07117, arCOG03757, arCOG03758, arCOG11224, and arCOG10037) in ancestral branch A3 (Fig. 2B). The structure of one protein of this family has been solved (PDB code 2p9x), revealing a four-helix bundle fold. Structural comparison by using VAST (47) shows that the best match for this protein is the eukaryotic DEATH domain (named for its involvement in apoptosis and often abbreviated DD), with a root mean square deviation of 0.97 Å from the DD of the human RAIDD (DD-containing protein; the abbreviation is complex and is explained in reference 48) protein (PDB code 2O71) (49, 50) (Fig. S1). The DDs and related α-helical adapter domains are key components of eukaryotic signal transduction pathways, particularly those involved in programmed cell death (apoptosis), where these domains mediate connections between different components through homotypic interactions (i.e., different DD-related adapter domains interact with one another) (49,50). Exceptions to this association are the two-domain ATPases from the halobacteria-specific clade of branch A11 and from mostly methanomicrobial branch A1, for which no small partner protein encoded in the same locus could be identified. Bacterial branch B1 lies within an archaeal subtree that includes branches A1 to A4; several internal branches within this subtree are strongly supported (>90%) (Fig. 2A), suggesting horizontal transfer from archaea to bacteria. The majority of the kaiC genes associated with branch B1 are located next to genes related to two-component systems, suggesting that archaeal KaiC of branch A1 could interact with the analogous components encoded in other loci in the respective archaeal genomes.
DD-like domains are specifically expanded in the class Thermococci and several members of the phylum Thaumarchaeota (Fig. S1B; Table S1). Some of them are fused to a diverged iKaiC domain, REC domain, or ferritin domain, further linking these proteins to KaiC. Moreover, genes encoding DD-like domain proteins are found in several conserved neighborhoods together with other uncharacterized genes, suggesting that additional components could be linked to the KaiC-based signal transduction network (Fig. S1B). Taking these observations into account, we predict that the DD-like domains also serve as modulators of the autophosphorylation activity of KaiC.
Single-domain KaiC-like ATPases are often encoded as doublets of paralogs, of which some are active and others are inactivated, suggesting that they might form heterodimers, recapitulating the organization of the two-domain KaiC-like ATPases (Fig. 2B).
The fusions of KaiC with other domains are also informative, showing either the same trend as that observed for the conserved neighborhoods or suggesting the involvement of KaiC-like domains in more complex signal transduction pathways (Fig. 3). Many of these fusions (e.g., to the TRASH [trafficking, resistance, and sensing of heavy metals] sensory domain, rubredoxins, and ferritins) point to an involvement in oxidative stress. Most often, we observe ferritin domains both fused to KaiC or DUF835 and found in the respective neighborhoods (Fig. 3; Table S2). Ferritins are iron-binding proteins whose role in the oxidative stress response is well established (51).
The association with the SRP GTPase Ffh and the regulatory GTPase Srp102/FtsY suggests that KaiC-like proteins might regulate the targeting of nascent secreted or membrane proteins from the ribosome to the membrane through the SRP (44, 52).
The iKaiC domains of the DUF835 family are often found in multidomain proteins (Fig. 3). Many of these contain sensory and signal-transducing domains that have been thoroughly studied in the context of bacterial two-component signal transduction systems (53,54). This connection suggests that DUF835 proteins are involved in signal transduction pathways. Many proteins of this family are membrane associated, presumably interacting with other membrane proteins, some of which are fused to the DUF835 domain (e.g., the Na+/proline symporter-like domain) (Fig. 3; Tables S1 and S2). Fusions with other regulatory and signal transduction proteins, such as AAA ATPases containing tetratricopeptide repeats and cyclases, in particular (Fig. 3), suggest that KaiC family proteins are involved in highly complex pathways, which include cross-talk with other signal transduction systems.
Finally, the previously described MEDS (methanogen/methylotroph DcmR sensory) domain (arCOG03567, pfam14417) shows a clear affinity for the KaiC family. PSI-BLAST searches initiated with any of the MEDS domain sequences against the arCOG database reveal significant sequence similarity of this domain with the members of KaiC-like arCOG01171 (E value of 4e-05 in the second iteration), although the MEDS domain is unlikely to be an active ATPase because of the lack of catalytic residues in the Walker A and B ATPase motifs. The MEDS domain has been described previously (55) both as a stand-alone domain, often encoded in the genomic neighborhoods of other components of signal transduction systems, and in a fusion with sensory histidine kinases along with other sensory domains (55). Here we also identified fusions of MEDS and DD-like domains (Fig. S1). Taken together, these observations suggest that the MEDS domain could be functionally similar to the DUF835 domain described above.
Many domains and genes linked to iKaiC proteins remain uncharacterized. There is an expansion of two protein families in Halobacteria that are associated with iKaiC of arCOG02452. One of these has been discussed previously in the context of signal transduction systems and is called HalX (arCOG02601/pfam08663) (56). The HalX domain is often fused to the REC domain and is found in the context of genes associated with a two-component signal transduction system (Fig. 2; Tables S1 and S2). The second expanded domain has not been previously described. In halobacterial genomes, it is represented by multiple paralogs that belong to five arCOGs, arCOG08928, arCOG08103, arCOG08989, arCOG09008, and arCOG08980. Among these, only arCOG08928 is often located next to an iKaiC of arCOG02452, and a few arCOG08980 members are fused to iKaiC (Fig. 2 and 3). Both domains might function as input domains for the respective KaiC-like proteins.
Models of KaiC-based signal transduction systems. The multiple lines of evidence discussed above indicate that the KaiC family is likely a major hub of a versatile and complex archaeal signaling network that so far has largely escaped attention. Nevertheless, the available experimental data on a halobacterial circadian clock (24,57) and the recent progress in the study of the functions of FlaH in the archaellum (35,36,45) allow us to propose two models of the roles of KaiC-like proteins in signal transduction (Fig. 4). The first model is essentially identical to the circadian clock mechanism and postulates the formation of either a homohexameric ring of KaiC proteins containing two ATPase domains or heterohexamers of interacting KaiC proteins, each containing a single ATPase domain. Both domains can be active ATPases, or alternatively, one of the domains could be inactivated, such as one of the multiple DUF835 domains, which would pass the signal from an input domain to the active KaiC-like domain (Fig. 4). Each of the hexameric ATPase rings would interact with multiple partners and, as with other signal transduction systems, such partners can be roughly classified into input and output components (Fig. 4). In addition, the KaiC rings could interact with modulators of the ATPase activity, such as KaiB in the circadian clock, which might compete for binding with other output proteins.
The second model postulates interaction of a single-domain KaiC-like ATPase homohexamer directly with an output domain, similar to the potential interaction between FlaH and FlaI in the archaellum (36, 45) (Fig. 4). Many of the predicted components remain uncharacterized, and thus no specific functions can be predicted for them at this time. Furthermore, the KaiC-centered signaling systems could be interconnected with other signal transduction pathways, in particular, with two-component systems, via shared domains of input proteins (Fig. 4), and with Ras-like GTPases, either directly through the interaction with Srp102/FtsY or through Roadblock family proteins as described for both bacteria and eukaryotes (58,59). The mode of signaling apparently can be modified with relative ease. For example, a two-domain KaiC-like ATPase and a DD-like protein are encoded in the type IV pilus loci in Thermoproteales, whereas in Euryarchaeota, these loci contain a gene coding for a single-domain KaiC (arCOG01175) linked to a FlhG-like secretion chaperone. Thus, two distinct models could apply to the same process of regulation of type IV pilus (or archaellum) assembly in different archaea (Fig. 4).
[Figure legend fragment: Below the scheme of predicted protein-protein interaction, selected input, modulator, and output components are listed inside the oval borders, which are colored according to the predicted functions of these components. Each protein family name is shown next to a circle of the same color used for this component in Fig. 2 and 3.]
It can be predicted that many KaiC-like proteins lack autophosphorylation activity but could bind and/or hydrolyze ATP to transduce the signal. Indeed, many of these proteins lack the pair of serine/threonine residues that are conserved among the bona fide KaiC proteins and are autophosphorylated in the circadian clock system (30). However, several archaeal KaiC subfamilies, especially the two-domain ATPases, retain this motif or at least one of the two hydroxy amino acids and could be active autokinases.
Implications for the archaeal circadian clock. Among archaea, diurnal gene expression has been demonstrated only in Halobacteria (24,57,60). It has been shown that KaiC-like proteins undergo cyclic expression, and deletion of most of them affected the expression of the others, suggesting that Halobacteria indeed might have a bona fide KaiC-based circadian mechanism (38). Similarly to cyanobacteria, Halobacteria adjust their metabolism to light conditions via rhodopsin-based proton pumps that generate a proton gradient and sensory rhodopsins that control phototaxis (61). Halobacteria encode two-domain KaiC-like ATPases (both within branch A11 in Fig. 2A), which do not group with KaiC from cyanobacteria. Furthermore, neither KaiB nor KaiA nor any potential analogue of these KaiC interactors could be identified in the genomic neighborhoods of the halobacterial KaiC-like ATPases. Moreover, there was a weak, if any, correlation between the presence of two-domain KaiC ATPases and rhodopsin-like proteins in halobacterial genomes (Table S1). Accordingly, the functions of these proteins in Halobacteria remain unclear. To the best of our knowledge, no evidence of a circadian clock in any other archaea has been reported and no rhodopsins have been identified.
A putative minimal circadian clock system consisting of KaiC from branch A5 and KaiB is present in some methanogens (Fig. 2; Table S3). However, as is the case in Legionella, this system could be involved in regulatory pathways distinct from the circadian clock. Apart from Halobacteria, there seems to be no indication that archaea can sense light and modulate their metabolism accordingly. It thus appears unlikely that most archaea possess circadian clocks similar to those of photosynthetic bacteria. However, if indications of clock mechanisms in archaea (other than Halobacteria) were found, the best candidates would be the systems containing the two-domain KaiC-like ATPases of branch A5, which is associated with KaiB, and those of branch A3, associated with a DEATH-like domain, a potential analogue of KaiB (Fig. 2 and 4).
Concluding remarks. The striking proliferation and diversification of the KaiC-like ATPase family in archaea imply that these proteins comprise the core of diverse, unexplored, and apparently, archaea-specific signal transduction networks. These signal transduction systems are likely involved in the regulation of membrane-associated complexes and individual proteins, such as the archaellum, type IV pili, SRP, and membrane transporters. Additionally, the KaiC-centered signal transduction machinery can be predicted to regulate a response to oxidative stress. However, it appears unlikely that archaea, apart, maybe, from Halobacteria, possess cyanobacterial-type, KaiC-centered circadian clocks. The KaiC-based signaling mechanisms appear to be ancestral in Archaea, with at least three KaiC paralogs projected to the LACA. One ancestral KaiC subfamily that includes a protein containing an HTH domain (arCOG00921) might be involved in as-yet-uncharacterized global response pathways because it is encoded even in the minimal genome of the Nanoarchaeota. The predicted KaiC-based signal transduction system appears to be interconnected with two-component signal transduction systems through iKaiC of the DUF835 family and MEDS domains that are predicted to interact with active KaiC ATPases. In contrast, we could not identify any connections between the KaiC-centered network and genes involved in the S/T kinase pathway. Additionally, inspection of the available data on archaeal phosphoproteomes yielded no indications of extensive phosphorylation of KaiC pathway-related genes (62-64). Thus, the KaiC network appears to be largely disjointed from the S/T kinase-mediated regulatory pathways in Archaea. The phylogenomic analysis reported here can produce only crude models of archaeal signal transduction. Nevertheless, these observations expose multiple experimental directions that can be expected to shed light on key aspects of archaeal cell biology.

MATERIALS AND METHODS
Archaeal and bacterial complete genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/) in March 2016. Altogether, the database includes 4,961 completely sequenced and assembled genomes. These genomes were assigned to COGs and Pfam families by using the PSI-BLAST program with an E-value cutoff of 1e-4 and low-complexity filtering turned off against a collection of multiple sequence alignments (profiles) from the CDD database (65) derived from COGs, Pfam, and CDD itself. The same approach was used to assign archaeal proteins to arCOGs as described previously (66).
All proteins that were assigned to any of the three groups (COG0467/pfam06745, COG2874, pfam05763) or to arCOGs associated with the KaiC family were retrieved. Genomic loci containing five genes upstream and downstream of all kaiC-like genes were extracted for neighborhood analysis. KaiC-like sequences were clustered by using BLASTCLUST (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html) with a length coverage of 90% and a sequence identity threshold of 90% to obtain a nonredundant set of sequences. Among those, readily alignable groups of predominantly active ATPase sequences were selected for phylogenetic analysis (several inactivated ATPases aligned poorly and were not included in this analysis; also, protein fragments in the nonredundant set were discarded). The final set used for tree reconstruction included 1,011 sequences. Tree reconstruction was performed by two approaches, (i) a combination of FastTree and UPGMA for full-length sequences and (ii) the default FastTree method for the N-terminal ATPase domain only. For the first approach, initial sequence clusters were obtained by using UCLUST (67) with a sequence similarity threshold of 0.5; the sequences were aligned within clusters by using MUSCLE (68). Cluster-to-cluster similarity scores were then obtained by using HHsearch (69) (including trivial clusters consisting of a single sequence each). A UPGMA dendrogram was constructed from the pairwise similarity scores. Highly similar clusters (pairwise-score to self-score ratio, >0.1) were aligned with each other by using HHALIGN (69). This procedure was repeated iteratively. At the last step, sequence-based trees were reconstructed from the cluster alignments by using FastTree (70) as described below and rooted by midpoint; these trees were grafted onto the tips of the profile similarity-based UPGMA dendrogram. Sites with gap character fraction values of >0.5 and homogeneity values of <0.1 were removed from the alignment (71).
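The gap-based part of the site filtering described above can be sketched as follows. The sketch assumes that the alignment is held as a list of equal-length strings and that "-" and "." denote gaps; it removes columns whose gap fraction exceeds 0.5. The homogeneity criterion is defined in reference 71 and is deliberately not reproduced here. The function name and the toy alignment are hypothetical.

```python
from typing import List


def drop_gappy_columns(
    alignment: List[str],
    max_gap_fraction: float = 0.5,
    gap_chars: str = "-.",
) -> List[str]:
    """Remove alignment columns whose gap fraction exceeds `max_gap_fraction`."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_rows = len(alignment)
    # Keep a column if at most `max_gap_fraction` of its characters are gaps.
    keep = [
        j
        for j in range(length)
        if sum(row[j] in gap_chars for row in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[j] for j in keep) for row in alignment]


# Toy alignment with made-up residues; the third column is 75% gaps.
toy_alignment = ["AC-G", "A--G", "AT-G", "A-TG"]
filtered = drop_gappy_columns(toy_alignment)
```

In the toy alignment, the second column has a gap fraction of exactly 0.5 and is retained, because only fractions strictly greater than the threshold are removed.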
In both cases, the FastTree program (70) was executed with the WAG evolutionary model and the discrete gamma model with 20 rate categories. The same program was used to compute SH (Shimodaira-Hasegawa)-like node support values.
To identify remote sequence similarity, HHpred with default parameters (69) and CD search (72) with an E value cutoff of 10 and composition-based statistics adjustment turned off were used. In addition, web-based, manually curated PSI-BLAST searches were run with and without the composition-based statistics adjustment and with low-complexity filtering turned off. Inclusion E-value thresholds of 0.1 to 1e-8, depending on sequence length and content, were used, and some searches were run against the archaeal subset of the NCBI nonredundant protein database. The VAST program (47) was used with default parameters for structural comparison.

ACKNOWLEDGMENTS
We thank Thomas Santangelo of Colorado State University and Sonja Albers of the University of Freiburg for helpful discussions and critical reading of the manuscript. We are also grateful to Yuri Wolf of the National Center for Biotechnology Information for technical assistance and for in-house scripts that we used during this project.
Our research is supported by the NIH Intramural Research Program at the National Library of Medicine, U.S. Department of Health and Human Services.