Publication:
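Study of the public perception of the COVID-19 vaccine using NLP and machine learning techniques (original title: Estudio de la percepción pública de la vacuna contra la COVID-19 mediante técnicas de PLN y de aprendizaje automático)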

Publication Date
2021-07
Abstract
The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, has had a significant impact on society, both through its serious health and economic effects and through the effects of the public health measures adopted to contain its spread. NLP techniques have made it possible to investigate attitudes toward vaccination, which is particularly timely given the arrival of COVID-19 vaccines. This work has a twofold aim: first, to study the perception of COVID-19 vaccination using Natural Language Processing techniques, and second, to build an interpretable sentiment classifier using Machine Learning techniques. To this end, a dataset of 4,000,000 vaccination-related tweets posted between 15 November 2020 and 16 December 2020 was used. The selected period is essential because the first results of the COVID-19 vaccines, such as those from Pfizer and Moderna, were published during it, sparking public debate. The perception analysis suggests that there is a significant number of negative tweets, which may jeopardize the goal of achieving herd immunity. As for the multiclass sentiment classifier, a ROC-AUC of 92% was obtained with the LinearSVC algorithm.
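The abstract reports a ROC-AUC for a multiclass sentiment classifier but does not say how the per-class values were combined. As a rough illustration only, the sketch below assumes a one-vs-rest, macro-averaged ROC-AUC computed from per-class decision scores (the kind of scores a linear SVM produces). The class names, the toy labels and scores, and the helper names `binary_auc` and `macro_ovr_auc` are hypothetical and are not taken from the thesis.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, score_rows, classes):
    """One-vs-rest ROC-AUC per class, then an unweighted (macro) mean.
    Assumption: this averaging scheme is illustrative, not the thesis's."""
    aucs = []
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[k] for row in score_rows]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Hypothetical toy data: one row of per-class scores per tweet.
classes = ["negative", "neutral", "positive"]
y_true = ["negative", "neutral", "positive", "positive", "negative", "neutral"]
scores = [
    [1.2, -0.3, -0.9],
    [-0.5, 0.8, -0.1],
    [-1.0, -0.2, 1.1],
    [0.4, -0.6, -0.3],  # a positive tweet that the scores rank poorly
    [0.3, 0.1, -0.7],
    [-0.2, 0.5, 0.0],
]

auc = macro_ovr_auc(y_true, scores, classes)
print(f"macro one-vs-rest ROC-AUC on toy data: {auc:.3f}")
```

The pairwise comparison is quadratic in the number of examples, which is fine for a toy illustration; on a corpus of millions of tweets a rank-based computation or a library routine would be the practical choice.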