Bhasin, Manoj and Reinherz, Ellis L. and Reche, Pedro A. Recognition and classification of histones using support vector machine. Journal of computational biology : a journal of computational molecular cell biology, 13 (1). pp. 102-112. ISSN 1066-5277
Histones are DNA-binding proteins found in the chromatin of all eukaryotic cells. They are highly conserved and can be grouped into five major classes: H1/H5, H2A, H2B, H3, and H4. Two copies each of H2A, H2B, H3, and H4 bind to about 160 base pairs of DNA, forming the core of the nucleosome (the repeating structure of chromatin), and H1/H5 binds to its linker DNA. Overall, histones have a high arginine/lysine content that is optimal for interaction with DNA. This sequence bias can make histones difficult to classify with standard sequence similarity approaches. Therefore, in this paper, we applied a support vector machine (SVM) to recognize and classify histones on the basis of their amino acid and dipeptide composition. When evaluated by five-fold cross-validation, the SVM-based method distinguished histones from nonhistones (nuclear proteins) with an accuracy of about 98%. Similarly, we obtained an overall accuracy of >95% in discriminating the five classes of histones using 1-versus-rest (1-v-r) SVMs. Finally, we applied this SVM-based method to the detection of histones in whole proteomes and found a sensitivity comparable to that achieved by hidden Markov model (HMM) profiles.
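The abstract names amino acid and dipeptide composition as the SVM inputs but does not spell out the encoding. The sketch below shows one common way to compute such features, assuming the 20 standard residues, plain frequency normalization, and overlapping dipeptides; the function names and the toy sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted; other symbols are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 values: the fraction of each standard residue in seq."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 values: the fraction of each overlapping residue pair in seq."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs made of two standard residues.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


def composition_features(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


# Hypothetical toy sequence, not a real histone.
features = composition_features("AKRKA")
```

Vectors like these could be passed to any SVM implementation. In a 1-versus-rest setup, one binary classifier is trained per histone class against all remaining classes, and a sequence is assigned to the class whose classifier scores highest. The kernel and parameters used in the paper are not given in the abstract, so none are assumed here.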
|Subjects:||Sciences > Computer science > Bioinformatics; Medical sciences > Biology > Molecular biology|
|Deposited On:||06 Aug 2009 10:43|
|Last Modified:||30 Sep 2009 12:25|