Structure-Function Analysis of the Bifunctional CcsBA Heme Exporter and Cytochrome c Synthetase

The movement or trafficking of heme is critical for cellular functions (e.g., oxygen transport and energy production); however, intracellular heme is tightly regulated due to its inherent cytotoxicity. These factors, combined with the transient nature of transport, have resulted in a lack of direct knowledge on the mechanisms of heme binding and trafficking. Here, we used the cytochrome c biogenesis system II pathway as a model to study heme trafficking. System II is composed of two integral membrane proteins (CcsBA) which function to transport heme across the membrane and stereospecifically position it for covalent attachment to apocytochrome c. We mapped two heme binding domains in CcsBA and suggest a path for heme trafficking. These data, in combination with metagenomic coevolution data, are used to determine a structural model of CcsBA, leading to increased understanding of the mechanisms for heme transport and the cytochrome c synthetase function of CcsBA.


Determination of heme redox potentials
Redox potentials were determined by a modified Massey method (5-7) as described in (8), with the following modifications. Samples were buffer exchanged to remove glutathione. Redox titrations were performed in 20 mM Tris pH 8, 100 mM NaCl, 0.02% DDM; the pH typically rose to 8.3 during the reaction. The absorbance change of the heme Soret band was monitored at 426 nm, and reduction of the reference dye Nile Blue was monitored at 630 nm.
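As a rough illustration of how a midpoint potential can be read out of a dye-coupled titration of this kind, the sketch below applies the Nernst relation to a single point at which heme and reference dye are assumed to be in redox equilibrium. The function name, the electron counts (one for heme, two for the dye), the temperature, and the numbers in the usage note are assumptions chosen for illustration; they are not values or code from this work.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1

def heme_midpoint_mV(dye_midpoint_mV, dye_ox_over_red, heme_ox_over_red,
                     n_dye=2, n_heme=1, temp_K=298.15):
    """Estimate a heme midpoint potential (mV) from one equilibrium point.

    Assumes heme and dye share the same solution potential E, so that
    E = Em_dye  + (RT / n_dye F)  * ln([dye_ox] / [dye_red])
      = Em_heme + (RT / n_heme F) * ln([heme_ox] / [heme_red]).
    n_dye, n_heme and temp_K are illustrative defaults (assumptions).
    """
    rt_over_f_mV = 1000.0 * R * temp_K / F
    # Solution potential reported by the reference dye.
    e_solution = dye_midpoint_mV + (rt_over_f_mV / n_dye) * math.log(dye_ox_over_red)
    # Rearranged Nernst equation for the heme couple.
    return e_solution - (rt_over_f_mV / n_heme) * math.log(heme_ox_over_red)
```

In practice the oxidized/reduced ratios would come from the absorbance changes at the two monitored wavelengths across many titration points, and the midpoint would be taken from a fit over all of them rather than from a single point.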

CcsBA Modeling Sequence Construction and GREMLIN Analysis
The sequence of naturally fused H. hepaticus CcsBA was used for modeling. Jackhmmer analysis yielded a low number of homologous sequences per length (Seq/Len) throughout the large CcsB periplasmic region (aa 98-633); therefore, the 36 contiguous residues with the greatest sequence conservation were chosen to link TM3 and TM4.
As described by Kamisetty et al., Seq/Len values of 5.0 and above correlate directly with optimal GREMLIN performance, generating more accurate coevolution-based distance constraints that help improve the accuracy of modeled structures (9). As a starting point, replacing the large CcsB periplasmic region with the 36-residue periplasmic linker increased the overall Seq/Len value from 0.375 to 1.432 using the GREMLIN "monomer protocol" (9). The final sequence used for this modeling comprises residues 1-97 (TMs 1-3), 286-321 (36-residue periplasmic linker), and 634-935 (TMs 4-10) of the H. hepaticus fused CcsBA sequence, with no gaps in the final sequence. Note that CcsBA exists in nature as a fused ORF (as in H. hepaticus) or, more often, as two separate genes, ccsB and ccsA (10). Because the GREMLIN monomer protocol appeared to exclude some separate CcsB and CcsA sequences from coevolution analysis due to its coverage filter, the query sequence was split into CcsB (residues 1-97, 286-321, 634-658) and CcsA (residues 658-935) for input to the GREMLIN "complex protocol" (9,11). Using the complex protocol further improved the overall Seq/Len value from 1.432 to 4.84. Because of the low number of homologous sequences per length for the 36-residue periplasmic linker, co-evolved residues and their constraints from this linker were not used during modeling.
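The construct assembly described above can be sketched in a few lines. The residue ranges are the ones quoted in the text; the helper names and the simple sequences-divided-by-length ratio are illustrative assumptions (GREMLIN may count sequences after its own filtering), not the scripts used in this work.

```python
# Residue ranges (1-based, inclusive) quoted in the text for the modeling construct.
SEGMENTS = [(1, 97), (286, 321), (634, 935)]

def segment_lengths(segments):
    """Length of each inclusive residue range."""
    return [end - start + 1 for start, end in segments]

def build_construct(full_sequence, segments):
    """Concatenate the listed segments of a full-length sequence without gaps."""
    return "".join(full_sequence[start - 1:end] for start, end in segments)

def seq_per_len(n_sequences, length):
    """Homologous sequences per query residue (hypothetical helper)."""
    return n_sequences / length

# Total length of the gap-free construct: TMs 1-3 + linker + TMs 4-10.
construct_length = sum(segment_lengths(SEGMENTS))
```

With these ranges the three pieces contribute 97, 36, and 302 residues, giving a 435-residue query.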

Fragment Generation
The Robetta webserver was used to generate 3 and 9 amino acid structural fragments and the PSIPRED secondary structure prediction (12).

Trans-Membrane Domain Prediction
The OCTOPUS webserver was used to generate a trans-membrane region prediction specifying which residues would be located within the membrane (13). The OCTOPUS predictions matched both the PSIPRED predictions and the topology determined by prior experiments (1,14,15), with the exception of the hydrophobic patches.

Model Building
For the initial global sampling, the Rosetta ab initio procedure in combination with GREMLIN constraints was used as described in (16) for trans-membrane proteins. An additional bounded constraint was used to restrain the distance between the beta carbons of the two TM-His residues (H83 and H858) to a range of 10 to 12 angstroms, typical for bis-histidine heme proteins. These constraints were used during both the coarse-grained sampling and full-atom refinement stages. 20,000 ab initio models were generated. The top ten scoring models, ranked by the sum of the Rosetta score and the constraint energy, were compared for convergence. These models converged over substructures, namely TMs 5, 6, 7, 8, 9, and the hydrophobic patches (overall TM-score > 0.5) (17,18). Following the general procedure outlined in (16), the top five models were recombined into a pool of 1,000 structures using the Rosetta hybridization protocol (19).
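To make the idea of a bounded distance restraint concrete, the following is a minimal sketch of a flat-bottom penalty: zero inside the 10 to 12 angstrom window and growing quadratically outside it. It is a simplified stand-in and does not reproduce Rosetta's exact bounded function; the function names and the width parameter sd are assumptions, since the text does not state a width.

```python
import math

def beta_carbon_distance(cb_a, cb_b):
    """Euclidean distance (angstroms) between two (x, y, z) coordinates."""
    return math.dist(cb_a, cb_b)

def flat_bottom_penalty(d, lower=10.0, upper=12.0, sd=1.0):
    """Zero inside [lower, upper]; quadratic in the violation outside.

    lower/upper follow the 10-12 angstrom range in the text;
    sd is an assumed width, not a value from the source.
    """
    if d < lower:
        return ((d - lower) / sd) ** 2
    if d > upper:
        return ((d - upper) / sd) ** 2
    return 0.0
```

A pair of beta carbons 11 angstroms apart incurs no penalty, whereas a pair 14 angstroms apart is penalized in proportion to the square of the 2 angstrom violation.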
The top ten models were scanned for placement of local structures in the correct relative topology determined by the OCTOPUS and PSIPRED predictions and the previously mentioned experimentally determined topology. Models which fulfilled this requirement were used as the template structures for the next round of hybridization.
Iterative refinement by hybridization (20) guided by topology was performed until all top ten scoring models converged (TM-score > 0.7) and accurately reflected correct topology. The top scoring model was used for further modeling with heme.
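For readers unfamiliar with the TM-score thresholds used as convergence criteria here, the sketch below evaluates the standard TM-score formula for one fixed superposition, given the distances between aligned residue pairs. A full TM-score calculation also searches for the superposition that maximizes this value, which is not attempted here; the function names and the all-pairs convergence check are illustrative assumptions.

```python
def tm_score(distances, target_length):
    """TM-score for a fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: length of the reference structure; residues that are
    not aligned contribute nothing to the sum. Assumes target_length is
    large enough for d0 to be positive (roughly 20 residues or more).
    """
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

def all_converged(pairwise_scores, threshold=0.7):
    """True if every pairwise TM-score exceeds the threshold."""
    return all(score > threshold for score in pairwise_scores)
```

Identical structures score 1.0, and the score falls toward 0 as aligned residues drift apart relative to the length-dependent scale d0.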

Modeling with Heme
A procedure previously used to model the CcmC periplasmic heme-binding site was adapted to model the homologous periplasmic heme-binding site of CcsBA (3). The following loops were removed to prevent bias from the starting structure: residues 738-762, 828-843, and 892-907. These loops correspond, respectively, to P-His loop 1 (H761), the WWD domain, and P-His loop 2 (H897). HMY, a heme-like molecule with the vinyl and propionate groups replaced by methyl groups, was generated. The Rosetta hybridization protocol (19) was used to model the removed loops around HMY. During coarse-grained sampling, a bounded constraint of 10 to 12 angstroms was used to restrain the distance between the beta carbons of the two P-His residues. During full-atom refinement, harmonic constraints were used. These included distance constraints between the NE2 atoms of the P-His residues and the FE atom of HMY (mean: 2 angstroms, stdev: 0.1), angle constraints for N1-FE-NE2, N2-FE-NE2, N3-FE-NE2, and N4-FE-NE2 (mean: 90 degrees, stdev: 10), and a dihedral constraint for each P-His over CG-CD2-NE2-FE (mean: 180 degrees, stdev: 10). 2,000 models were generated. Heme placement converged in the top ten scoring models (TM-score > 0.7). These models were then screened for correct orientation of the HMY: propionates facing down, 2-vinyl group exposed, and 4-vinyl interacting with W828 and W837. The top scoring model that satisfied this condition was selected for further modeling with the complete heme. Residues 828-843 were removed from the structure to prevent bias from the starting model. Ambiguous sigmoidal constraints (mean: 5 angstroms, slope: 4) between all side-chain carbons of W828, W833, W837, and W839 and the aromatic carbons of heme were used during both coarse-grained and full-atom stages of the Rosetta hybridization protocol. This was done to bias the orientation of the tryptophan residues toward maximal aromatic ring stacking between the residues and heme.
In addition to the sigmoidal, harmonic, and bounded constraints described before, additional bounded constraints were added between K842 and propionate carbons (0-4 angstroms), as well as between W828, W837 and the 4-vinyl carbons (0-4 angstroms).
Further bounded constraints were added between W833 and N4 pyrrole and between W839 and N3 pyrrole (0-6 angstroms). 1,500 models were generated. The top ten models converged (TM-score > 0.7) and the top scoring model was selected for further modeling of heme into the TM heme-binding site.
An additional full heme was loaded into this top scoring model, and the harmonic constraints that were used for P-His modeling (H761 and H897) were added between the TM-His residues (H83 and H858) and this new heme molecule. The TM-His bounded constraint described above was retained to restrain the residues together during coarse-grained modeling and full-atom refinement, while the harmonic constraints were used only during full-atom refinement. 1,500 models were generated using Rosetta hybridization (19). The top ten models converged (TM-score > 0.7) and the top scoring model was selected to be shown in this paper.
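The harmonic His-to-heme restraints described in this section can be illustrated with a quadratic penalty of the form ((x - mean) / stdev)^2, using the means and standard deviations quoted in the text. This is a sketch of the general idea only: the function and dictionary names are hypothetical, the periodic handling of the dihedral is an assumption, and no Rosetta weighting is reproduced.

```python
def harmonic(x, mean, sd):
    """Quadratic penalty for a deviation from the target value."""
    return ((x - mean) / sd) ** 2

def circular_harmonic_deg(x, mean, sd):
    """Harmonic penalty for an angle in degrees, treating -180 and 180 as equal."""
    delta = (x - mean + 180.0) % 360.0 - 180.0
    return (delta / sd) ** 2

# (mean, stdev) pairs as quoted in the text; the labels are descriptive only.
HEME_HIS_RESTRAINTS = {
    "NE2-FE distance (angstrom)": (2.0, 0.1),
    "N(pyrrole)-FE-NE2 angle (deg)": (90.0, 10.0),
    "CG-CD2-NE2-FE dihedral (deg)": (180.0, 10.0),
}

def total_penalty(distance, angles, dihedral):
    """Sum the penalties for one histidine: one distance, four angles, one dihedral."""
    d_mean, d_sd = HEME_HIS_RESTRAINTS["NE2-FE distance (angstrom)"]
    a_mean, a_sd = HEME_HIS_RESTRAINTS["N(pyrrole)-FE-NE2 angle (deg)"]
    t_mean, t_sd = HEME_HIS_RESTRAINTS["CG-CD2-NE2-FE dihedral (deg)"]
    score = harmonic(distance, d_mean, d_sd)
    score += sum(harmonic(a, a_mean, a_sd) for a in angles)
    score += circular_harmonic_deg(dihedral, t_mean, t_sd)
    return score
```

An ideal geometry (2 angstrom NE2-FE distance, four 90 degree angles, 180 degree dihedral) scores zero, and the tight 0.1 angstrom width means the distance term dominates as soon as the coordination bond is stretched.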

Modeling the Large Periplasmic Region
For illustrative purposes only, the large CcsB periplasmic region (aa 98-633) was modeled independently from the remainder of CcsBA. The sequence corresponding to this region (aa 98-633) was input into both the GREMLIN and Robetta webservers to generate inter-residue distance constraints, structural fragments, and the PSIPRED secondary structure predictions. The Rosetta ab initio procedure in combination with GREMLIN constraints was used as described in (16). 20,000 models were generated.
The top ten models did not converge (TM-score < 0.3). The top scoring model was selected to replace the previously mentioned 36-residue periplasmic linker in the CcsBA structure. Because of the low number of homologous sequences per length as determined by Jackhmmer analysis and the large size of the region (> 500 aa), the models are not expected to be accurate at the level of converged structures.