Universidad Complutense de Madrid
E-Prints Complutense

Biclustering of gene expression data by non-smooth non-negative matrix factorization




Carmona Saez, P. and Pascual Marqui, R. D. and Tirado Fernández, Francisco and Carazo, J. M. and Pascual Montano, Alberto (2006) Biclustering of gene expression data by non-smooth non-negative matrix factorization. BMC Bioinformatics, 7. ISSN 1471-2105

Creative Commons Attribution.


Official URL: http://dx.doi.org/10.1186/1471-2105-7-78


Background: The widespread use of microarray technologies has enabled the generation and accumulation of gene expression datasets that contain expression levels of thousands of genes across tens or hundreds of different experimental conditions. One of the major challenges in analyzing such datasets is to discover local structures composed of sets of genes that show coherent expression patterns across subsets of experimental conditions. These patterns may provide clues about the main biological processes associated with different physiological states.

Results: In this work we present a methodology that clusters genes and conditions that are highly related in sub-portions of the data. Our approach is based on a new data mining technique, Non-smooth Non-Negative Matrix Factorization (nsNMF), which is able to identify localized patterns in large datasets. We assessed the potential of this methodology by analyzing several synthetic datasets as well as two large and heterogeneous sets of gene expression profiles. In all cases the method identified localized features related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The uncovered structures showed a clear biological meaning in terms of relationships among the functional annotations of the genes and the phenotypes or physiological states of the associated conditions.

Conclusion: The proposed approach can be a useful tool for analyzing large and heterogeneous gene expression datasets. The method can identify complex relationships among genes and conditions that are difficult to detect with standard clustering algorithms.
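The factorization behind the method can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the nsNMF model V ≈ W S H, where V is a non-negative genes x conditions matrix and S = (1 - θ)I + (θ/k)11ᵀ is a k x k smoothing matrix. It also assumes plain Lee-Seung multiplicative updates for the generalized Kullback-Leibler divergence, with no column normalization and no convergence test. The `biclusters` helper is hypothetical: it keeps the top-ranked genes and conditions of each factor, and the published selection procedure may differ. The synthetic matrix in the usage block is made up for illustration.

```python
import numpy as np


def nsnmf(V, k, theta=0.5, n_iter=500, eps=1e-9, seed=0):
    """Sketch of nsNMF: approximate non-negative V (genes x conditions) as W @ S @ H."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    # Smoothing matrix S = (1 - theta) I + (theta / k) 1 1^T, with theta in [0, 1]
    S = (1.0 - theta) * np.eye(k) + (theta / k) * np.ones((k, k))
    for _ in range(n_iter):
        # KL multiplicative update of H, treating W @ S as the basis
        WS = W @ S
        R = V / (WS @ H + eps)
        H *= (WS.T @ R) / (WS.sum(axis=0)[:, None] + eps)
        # KL multiplicative update of W, treating S @ H as the encoding
        SH = S @ H
        R = V / (W @ SH + eps)
        W *= (R @ SH.T) / (SH.sum(axis=1)[None, :] + eps)
    return W, S, H


def kl_divergence(V, approx, eps=1e-9):
    """Generalized Kullback-Leibler divergence D(V || approx)."""
    return float(np.sum(V * np.log((V + eps) / (approx + eps)) - V + approx))


def biclusters(W, H, top_genes=10, top_conds=4):
    """Hypothetical helper: per factor, keep the highest-loading genes and conditions."""
    result = []
    for j in range(W.shape[1]):
        genes = np.argsort(W[:, j])[::-1][:top_genes]
        conds = np.argsort(H[j, :])[::-1][:top_conds]
        result.append((sorted(genes.tolist()), sorted(conds.tolist())))
    return result


if __name__ == "__main__":
    # Synthetic data: low background plus two planted biclusters
    rng = np.random.default_rng(1)
    V = 0.1 * rng.random((40, 12))
    V[0:10, 0:4] += 5.0    # genes 0-9 under conditions 0-3
    V[20:30, 6:10] += 5.0  # genes 20-29 under conditions 6-9
    W, S, H = nsnmf(V, k=2, theta=0.5)
    print("KL divergence:", kl_divergence(V, W @ S @ H))
    for genes, conds in biclusters(W, H):
        print("genes", genes, "conditions", conds)
```

With θ = 0 the smoothing matrix is the identity and the sketch reduces to standard KL-divergence NMF. Larger θ smooths the product, so the fit compensates by making W and H sparser. This is what drives the localized, bicluster-like factors.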

Item Type: Article
Additional Information:

© 2006 Carmona-Saez et al; licensee BioMed Central Ltd.
This work has been supported by the Spanish grants GR/SAL/0653/2004, CICYT BFU2004-00217/BMC, GEN2003-20235-c05-05, TIN2005-5619, PR27/05-13964-BSCH and a collaborative grant between the Spanish Research Council and the National Research Council of Canada (CSIC050402040003). The authors also thank the KEY Foundation for Brain-Mind Research in Zurich for partial financial support of this work. P.C.S. is the recipient of a fellowship from Comunidad de Madrid (CAM). A.P.M. acknowledges the support of the Spanish Ramón y Cajal program.

Uncontrolled Keywords: Gastrointestinal stromal tumors; Self-organizing maps; Normal human tissues; Kinase-c-theta; Microarray data; Identification; Discovery; Patterns.
Subjects: Sciences > Computer science
ID Code: 33118
Deposited On: 30 Sep 2015 15:04
Last Modified: 11 Mar 2016 15:35
