Performance of the random forest model for predicting environmental contamination by uranium or nitrate in 69 wells at the OR-IFRC site using microbial functional genes as predictors

| Contaminant | Predictorᵃ | Out-of-bag (OOB) error rate (%) | Background wellsᵇ: no. correctly predicted/no. defined | Contaminated wellsᶜ: no. correctly predicted/no. defined |
|---|---|---|---|---|
| Uranium | All S cycling and metal-related genes | 28.99 | 47/47 | 2/22 |
| | All dsrA, cytochrome, and hydrogenase genes | 24.64 | 47/47 | 5/22 |
| | All dsrA genes | 24.64 | 47/47 | 5/22 |
| | All cytochrome genes | 26.09 | 46/47 | 5/22 |
| | All hydrogenase genes | 28.99 | 41/47 | 8/22 |
| | Key dsrA, cytochrome, and hydrogenase genes | 27.54 | 45/47 | 5/22 |
| | Key dsrA genes | 24.64 | 45/47 | 7/22 |
| | Key cytochrome genes | 39.13 | 38/47 | 4/22 |
| | Key hydrogenase genes | 42.03 | 33/47 | 7/22 |
| | AUC-RF selection | 11.59 | 47/47 | 14/22 |
| Nitrate | All N cycling genes | 36.23 | 39/44 | 5/25 |
| | All nifH, amoA, narG, nasA, and napA genes | 34.78 | 40/44 | 5/25 |
| | All nifH genes | 33.33 | 41/44 | 5/25 |
| | All amoA genes | 27.54 | 41/44 | 9/25 |
| | All narG genes | 36.23 | 40/44 | 4/25 |
| | All nasA genes | 36.23 | 37/44 | 7/25 |
| | All napA genes | 34.78 | 41/44 | 4/25 |
| | Key nifH, amoA, narG, nasA, and napA genes | 30.43 | 40/44 | 8/25 |
| | Key nifH genes | 27.54 | 41/44 | 9/25 |
| | Key amoA genes | 28.99 | 39/44 | 10/25 |
| | Key narG genes | 37.68 | 37/44 | 6/25 |
| | Key nasA genes | 40.58 | 32/44 | 9/25 |
| | Key napA genes | 40.58 | 32/44 | 9/25 |
| | AUC-RF selection | 15.94 | 42/44 | 16/25 |
  • a Key functional genes detected from each family are listed in Tables S3 and S4 in the supplemental material.

  • b Background wells had a uranium concentration of ≤30 µg/liter (uranium rows) or a nitrate concentration of ≤10 mg/liter (nitrate rows).

  • c Contaminated wells had a uranium concentration of >30 µg/liter (uranium rows) or a nitrate concentration of >10 mg/liter (nitrate rows).
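The OOB error rates in the table can be checked against the per-class counts. In every row, the rate equals the number of misclassified wells (background plus contaminated) divided by all 69 wells. The Python sketch below is a minimal illustration, not the authors' code. It assumes that the "predicted" counts are correctly classified wells, an assumption consistent with every row. The function names `label_well` and `oob_error_rate` and the threshold constants are hypothetical, with the threshold values taken from footnotes b and c.

```python
# Thresholds from table footnotes b and c (hypothetical constant names):
# wells at or below the threshold are "background", wells above it are "contaminated".
URANIUM_THRESHOLD_UG_PER_L = 30.0
NITRATE_THRESHOLD_MG_PER_L = 10.0


def label_well(concentration: float, threshold: float) -> str:
    """Return 'contaminated' if concentration exceeds the threshold, else 'background'."""
    return "contaminated" if concentration > threshold else "background"


def oob_error_rate(bg_correct: int, bg_total: int,
                   ct_correct: int, ct_total: int) -> float:
    """Pooled misclassification rate (%) over background and contaminated wells.

    Assumes the table's 'predicted' counts are correctly classified wells, so
    every other well in each class is counted as an OOB misclassification.
    """
    misclassified = (bg_total - bg_correct) + (ct_total - ct_correct)
    return 100.0 * misclassified / (bg_total + ct_total)


if __name__ == "__main__":
    print(label_well(30.0, URANIUM_THRESHOLD_UG_PER_L))   # background
    print(label_well(10.5, NITRATE_THRESHOLD_MG_PER_L))   # contaminated
    # Uranium, AUC-RF selection: 47/47 background, 14/22 contaminated.
    print(round(oob_error_rate(47, 47, 14, 22), 2))       # 11.59
    # Nitrate, AUC-RF selection: 42/44 background, 16/25 contaminated.
    print(round(oob_error_rate(42, 44, 16, 25), 2))       # 15.94
```

Because the rate is pooled over unbalanced classes, a model that labeled every well as background would already score 22/69 ≈ 31.88% for uranium and 25/69 ≈ 36.23% for nitrate. These values are a useful reference point when reading the OOB error column.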