Complutense University Library

Testing the order of Markov dependence in DNA sequences


Pardo Llorente, Leandro and Menéndez Calleja, María Luisa and Pardo Llorente, María del Carmen and Zografos, Konstantinos (2011) Testing the order of Markov dependence in DNA sequences. Methodology and Computing in Applied Probability, 13 (1). pp. 59-74. ISSN 1387-5841

PDF: restricted to repository staff only until 31 December 2020.


DNA or protein sequences are usually modeled as probabilistic phenomena. The simplest model assumes that the nucleotides at the various sites are independently distributed. In practice, however, the type of nucleotide at one site usually depends on the type at another site, so the DNA sequence is modeled as a Markov chain of random variables taking the values A, G, C and T, corresponding to the four nucleotides. First-order or higher-order Markov models then provide a better fit to a DNA sequence. Motivated by this, the paper presents and studies a family of test statistics for testing the order of Markov dependence in DNA sequences. This new family includes the classical likelihood ratio test as a particular case. A simulation study is carried out to identify test statistics in this family that behave better than the likelihood ratio test.
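
As an illustration of the classical likelihood ratio test mentioned in the abstract, the following is a minimal sketch of the simplest case only: independent sites (order 0) against a first-order Markov chain. It is not the family of test statistics studied in the paper. The function name `lrt_order0_vs_order1` and the constant `ALPHABET` are hypothetical names chosen for this example, and the sketch assumes a plain string over A, C, G and T.

```python
from collections import Counter
from math import log

ALPHABET = "ACGT"  # the four nucleotides (hypothetical constant for this sketch)


def lrt_order0_vs_order1(seq):
    """Classical likelihood ratio statistic for H0: independent sites
    against H1: first-order Markov chain.

    Returns (G, df), where G = 2 * sum_ij n_ij * log(n_ij * n / (n_i. * n_.j))
    and df = (s - 1)^2 for an alphabet of size s.
    """
    seq = seq.upper()
    if any(ch not in ALPHABET for ch in seq):
        raise ValueError("sequence must contain only A, C, G, T")

    # n_ij: number of observed transitions from nucleotide i to nucleotide j
    pairs = Counter(zip(seq, seq[1:]))
    n = sum(pairs.values())  # total number of transitions, len(seq) - 1

    # Row totals n_i. and column totals n_.j of the transition count table
    row, col = Counter(), Counter()
    for (a, b), count in pairs.items():
        row[a] += count
        col[b] += count

    # Cells with zero count contribute nothing, so only observed pairs are summed
    g = 0.0
    for (a, b), count in pairs.items():
        g += 2.0 * count * log(count * n / (row[a] * col[b]))

    df = (len(ALPHABET) - 1) ** 2
    return g, df


if __name__ == "__main__":
    g, df = lrt_order0_vs_order1("ACGT" * 50)
    print(f"G = {g:.2f} on {df} degrees of freedom")
```

Under the null hypothesis the statistic is asymptotically chi-square distributed with 9 degrees of freedom for the four-letter alphabet, so a value above about 16.92 would reject independence at the 5% level. For short sequences the asymptotic approximation may be poor.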

Item Type: Article
Uncontrolled Keywords: DNA sequence; Markov dependence; Likelihood ratio test; Phi-divergence test statistics; Divergence; Chain
Subjects: Sciences > Mathematics > Applied statistics
ID Code: 17330

Deposited On: 05 Dec 2012 09:20
Last Modified: 07 Feb 2014 09:45