TABLE 2

Input features for the machine learning classifier

Feature                   Definition
Hits                      No. of insertion sites within the ORF
Reads                     No. of reads within the ORF
Hits in promoter          No. of hits within 100 bp upstream of ORF start codon
ORF length                Total length of ORF coding sequence (intron-free)
Insertion index^a         No. of hits in the ORF divided by ORF length
Noncoding window^a        Noncoding sequence (including introns) within 10 kb up- and downstream of ORF
Neighborhood index (NI)   Insertion index normalized to the noncoding window (hits divided by length)
Hit-free interval (HFI)   Length of longest insertion-free interval divided by ORF length

^a These features were input indirectly to calculate NI and HFI.
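
The definitions in the table can be illustrated with a minimal sketch. It is not the pipeline used in this study. The function name `compute_features`, the `OrfFeatures` container, and the input layout (a mapping from insertion position to read count) are hypothetical. The sketch also assumes an intron-free ORF on the forward strand given as a half-open interval, a precomputed hit count and length for the noncoding window, and a gap measure that counts insertion-free bases between neighboring hits or ORF boundaries.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OrfFeatures:
    hits: int
    reads: int
    hits_in_promoter: int
    orf_length: int
    insertion_index: float
    neighborhood_index: float
    hit_free_interval: float


def compute_features(
    insertions: Dict[int, int],   # assumed layout: position -> read count
    orf_start: int,               # first base of the ORF (inclusive)
    orf_end: int,                 # one past the last base (exclusive)
    noncoding_hits: int,          # hits in the noncoding window (precomputed)
    noncoding_length: int,        # length of the noncoding window in bp
    promoter_bp: int = 100,       # upstream region counted as promoter
) -> OrfFeatures:
    """Compute the table's features for one forward-strand, intron-free ORF."""
    orf_length = orf_end - orf_start
    if orf_length <= 0 or noncoding_length <= 0:
        raise ValueError("ORF and noncoding window must have positive length")

    # Hits and reads: insertion sites, and their reads, inside the ORF.
    in_orf = sorted(p for p in insertions if orf_start <= p < orf_end)
    hits = len(in_orf)
    reads = sum(insertions[p] for p in in_orf)

    # Hits in promoter: sites within promoter_bp upstream of the start codon.
    hits_in_promoter = sum(
        1 for p in insertions if orf_start - promoter_bp <= p < orf_start
    )

    # Insertion index: hits divided by ORF length.
    insertion_index = hits / orf_length

    # Neighborhood index: insertion index relative to the hit density of the
    # noncoding window. Returning NaN for an empty window is an assumption.
    noncoding_density = noncoding_hits / noncoding_length
    if noncoding_density > 0:
        neighborhood_index = insertion_index / noncoding_density
    else:
        neighborhood_index = float("nan")

    # Hit-free interval: longest run of insertion-free bases, divided by ORF
    # length. Sentinels just outside the ORF let the boundaries close a gap.
    bounds = [orf_start - 1] + in_orf + [orf_end]
    longest_gap = max(b - a - 1 for a, b in zip(bounds, bounds[1:]))
    hit_free_interval = longest_gap / orf_length

    return OrfFeatures(
        hits, reads, hits_in_promoter, orf_length,
        insertion_index, neighborhood_index, hit_free_interval,
    )


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = {950: 3, 1100: 5, 1150: 2, 1900: 10, 2100: 1}
    print(compute_features(toy, 1000, 2000, noncoding_hits=60,
                           noncoding_length=15000))
```

With the toy data, the ORF spans 1,000 bp and contains 3 hits carrying 17 reads, with 1 hit in the promoter. This gives an insertion index of 0.003, an NI of about 0.75 (relative to a noncoding density of 60 hits per 15,000 bp), and an HFI of 0.749. In this sketch, an ORF with no hits at all has an HFI of 1.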