Pardo Llorente, Leandro and Menéndez Calleja, María Luisa and Pardo Llorente, María del Carmen and Zografos, Konstantinos (2011) Testing the order of Markov dependence in DNA sequences. Methodology and computing in applied probability, 13 (1). pp. 59-74. ISSN 1387-5841
DNA or protein sequences are usually modeled as probabilistic phenomena. The simplest model assumes that the nucleotides at the various sites are independently distributed. In practice, however, the type of nucleotide at one site usually depends on the type at another site, so the DNA sequence is modeled as a Markov chain of random variables taking the values A, G, C and T, corresponding to the four nucleotides. First-order or higher-order Markov models provide a better fit to a DNA sequence. Motivated by this, the aim of this paper is to present and study a family of test statistics for testing the order of Markov dependence in DNA sequences. This new family includes the classical likelihood ratio test as a particular case. A simulation study is presented to identify test statistics in this family that behave better than the likelihood ratio test.
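To make the classical case concrete, the following is a minimal sketch of a likelihood ratio statistic for testing independence of sites (order 0) against a first-order Markov chain. It is not taken from the paper: the function name `lrt_order0_vs_order1` is hypothetical, the statistic is the standard G statistic computed on the table of observed transition counts, and the degrees of freedom (s - 1)^2 = 9 assume that all four nucleotides occur in the sequence. The other members of the phi-divergence family studied in the paper are not reproduced here.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # the four nucleotides; assumed to all occur in the sequence


def lrt_order0_vs_order1(seq):
    """Return (G, df) for H0: independent sites vs H1: first-order Markov chain.

    G = 2 * sum_ij n_ij * log(n_ij * n / (n_i. * n_.j)), where n_ij counts
    transitions from nucleotide i to nucleotide j in consecutive positions.
    """
    pairs = Counter(zip(seq, seq[1:]))  # transition counts n_ij
    total = sum(pairs.values())         # n = number of consecutive pairs
    row = Counter()                     # n_i. : transitions leaving i
    col = Counter()                     # n_.j : transitions entering j
    for (a, b), n in pairs.items():
        row[a] += n
        col[b] += n

    g = 0.0
    for (a, b), n in pairs.items():     # cells with n_ij = 0 contribute nothing
        expected = row[a] * col[b] / total
        g += 2.0 * n * math.log(n / expected)

    df = (len(ALPHABET) - 1) ** 2       # (s - 1)^2 with s = 4 states
    return g, df


if __name__ == "__main__":
    g, df = lrt_order0_vs_order1("ACGT" * 50)
    # 16.919 is the 0.95 quantile of the chi-square distribution with 9 df
    print(g, df, g > 16.919)
```

Under the independence hypothesis, G is asymptotically chi-square distributed with the stated degrees of freedom, so a value above the 0.95 quantile (16.919 for 9 degrees of freedom) would lead to rejecting independence at the 5% level. For short sequences the asymptotic approximation may be poor.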
|Uncontrolled Keywords:||DNA sequence; Markov dependence; Likelihood ratio test; Phi-divergence test statistics; Divergence; Chain|
|Subjects:||Sciences > Mathematics > Applied statistics|
|Deposited On:||05 Dec 2012 09:20|
|Last Modified:||07 Feb 2014 09:45|