Publication:
Fitting DNA sequences through log-linear modelling with linear constraints

Publication Date
2011
Authors
Martín Apaolaza, Níriam
Publisher
Taylor & Francis
Abstract
For some discrete state series, such as DNA sequences, it can often be postulated that their probabilistic behaviour is given by a Markov chain. When deciding whether an uncharacterized piece of DNA is part of the coding region of a gene, two statistical tools are essential under the Markovian assumption: hypothesis tests for the order of a Markov chain and estimators of the transition probabilities. To improve the traditional statistical procedures for both when the stationarity assumption can be considered, a new way of understanding the homogeneity hypothesis is proposed, so that log-linear modelling is applied for conditional independence jointly with homogeneity restrictions on the expected means of transition counts in the sequence. In addition, a variety of test statistics and estimators can be considered by using phi-divergence measures. The well-known likelihood ratio test statistics and maximum-likelihood estimators are obtained as a special case.
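As a rough illustration of the two traditional tools the abstract mentions, the sketch below estimates first-order transition probabilities by unconstrained maximum likelihood and computes a power-divergence statistic (one family of phi-divergence statistics) for testing order 0 against order 1. It is not the constrained log-linear procedure proposed in the paper: the function names, the toy sequence, and the restriction to first order are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def transition_counts(seq):
    """Count each adjacent pair (previous base, next base) in the sequence."""
    return Counter(zip(seq, seq[1:]))


def mle_transition_probs(counts):
    """Unconstrained maximum-likelihood estimates p(b | a) = n_ab / n_a."""
    row_totals = Counter()
    for (a, _b), n in counts.items():
        row_totals[a] += n
    return {(a, b): n / row_totals[a] for (a, b), n in counts.items()}


def power_divergence(counts, lam=0.0):
    """Power-divergence statistic for order 0 (independence) versus order 1.

    lam = 0 gives the likelihood ratio statistic and lam = 1 gives Pearson's
    chi-square. Only lam >= 0 is supported in this sketch.
    """
    total = sum(counts.values())
    row, col = Counter(), Counter()
    for (a, b), n in counts.items():
        row[a] += n
        col[b] += n
    stat = 0.0
    for (a, b), obs in counts.items():  # cells with zero count contribute 0
        expected = row[a] * col[b] / total
        if lam == 0:
            stat += 2.0 * obs * log(obs / expected)
        else:
            stat += 2.0 / (lam * (lam + 1.0)) * obs * ((obs / expected) ** lam - 1.0)
    return stat


seq = "ACGTACGTA"  # hypothetical toy sequence, not real data
counts = transition_counts(seq)
probs = mle_transition_probs(counts)
g2 = power_divergence(counts, lam=0.0)  # likelihood ratio statistic
x2 = power_divergence(counts, lam=1.0)  # Pearson chi-square statistic
```

For a four-letter alphabet, either statistic would usually be compared with a chi-square distribution on (4 - 1)^2 = 9 degrees of freedom under the order-0 hypothesis. A sequence this short is far too small for that approximation to be trusted.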