Study of the distribution and behaviour of the "0" values in large omic data arrays

Fidalgo Gómez, Helena

Files

e-print TFM_Helena Fidalgo _MBioEstad.pdf (5.03 MB)

Publication Date

2020

Authors

Fidalgo Gómez, Helena

Advisors (or tutors)

Malats Riera, Nuria

Pérez Pérez, María Teresa

Alonso Guirado, María Dolores

Publisher

Facultad de Estudios Estadísticos

Abstract

There is evidence that many diseases are not determined by genetic alterations alone. A clear example is cancer, which encompasses several complex diseases in which genetic and non-genetic factors interact over the lifespan. The non-genetic factors include environmental exposures and the microbiome, both of which can be characterized with omics data. Omics is a recent area of study that spans several biological disciplines, and its technologies allow the elements that make up biological systems to be studied at the molecular level. Biomedical research has recently focused on the microbiome, where associations between certain microorganisms and diseases have been found. One of the challenges in modelling cancer risk is the analysis of microbiome data: microbial counts are sparse, high-dimensional, and contain a large proportion of zeros. This project presents alternatives for analysing count data characterized by clear overdispersion and an excess of zeros. Moving beyond classical linear models, I applied regression models such as zero-inflated and hurdle models, which allowed me to establish what kind of zeros are present in the dataset. These models and their corresponding distributions were compared using several selection criteria to determine which model best fits the data, depending on the percentage of zeros present. These approaches allowed me to define relationships between different microorganisms and gene expression, tumour stage, immune subtype, gender, BMI, ...
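
The idea of comparing count models by a selection criterion can be illustrated with a minimal sketch. It is not the thesis's analysis: it assumes intercept-only models (no covariates), a made-up sample of counts, AIC as the single criterion, and a simple EM fit for the zero-inflated Poisson. All function and variable names are hypothetical.

```python
import math

def poisson_loglik(counts, lam):
    """Log-likelihood of an intercept-only Poisson model."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)

def zip_loglik(counts, pi, lam):
    """Log-likelihood of a zero-inflated Poisson (ZIP) model.

    pi is the probability of a structural zero; lam is the Poisson mean
    of the count component.
    """
    total = 0.0
    for y in counts:
        if y == 0:
            # A zero is either structural or a sampling zero from the Poisson.
            total += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            total += (math.log(1 - pi) + y * math.log(lam)
                      - lam - math.lgamma(y + 1))
    return total

def fit_zip(counts, iterations=500):
    """Fit the ZIP model by EM. Assumes at least one positive count."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi, lam = 0.5 * n_zero / n, total / n  # crude starting values
    for _ in range(iterations):
        # E-step: probability that an observed zero is structural.
        z = pi / (pi + (1 - pi) * math.exp(-lam))
        expected_structural = n_zero * z
        # M-step: update the mixing weight and the Poisson mean.
        pi = expected_structural / n
        lam = total / (n - expected_structural)
    return pi, lam

def aic(loglik, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * loglik

# Illustrative sample only (not data from the thesis): 70% zeros.
counts = [0] * 70 + [1] * 4 + [2] * 7 + [3] * 8 + [4] * 6 + [5] * 3 + [6] * 2

lam_pois = sum(counts) / len(counts)  # Poisson MLE is the sample mean
pi_zip, lam_zip = fit_zip(counts)

aic_pois = aic(poisson_loglik(counts, lam_pois), n_params=1)
aic_zip = aic(zip_loglik(counts, pi_zip, lam_zip), n_params=2)

print(f"Poisson AIC: {aic_pois:.1f}")
print(f"ZIP AIC:     {aic_zip:.1f} (pi={pi_zip:.3f}, lambda={lam_zip:.3f})")
```

With this sample, a plain Poisson with the same mean would expect far fewer zeros than are observed, so the zero-inflated model obtains the lower AIC. A hurdle model differs in that all zeros come from a single binary process and the positive counts follow a zero-truncated distribution; with covariates, the two formulations can fit differently, which is where selection criteria become informative.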
