Gender Distribution across Topics in the Top 5 Economics Journals: A Machine Learning Approach




Conde Ruiz, José Ignacio and Ganuza, Juan-José and García, Manu and Puch, Luis A. (2021) Gender Distribution across Topics in the Top 5 Economics Journals: A Machine Learning Approach. [Documentos de Trabajo del Instituto Complutense de Análisis Económico (ICAE); nº 09, 2109, ISSN: 2341-2356]



We analyze all the articles published in the top five (T5) Economics journals between 2002 and 2019 in order to identify gender differences in research approach. We implement an unsupervised machine learning algorithm, the Structural Topic Model (STM), to incorporate gender as document-level metadata into a probabilistic text model. This algorithm jointly characterizes the set of latent topics that best fits our data (the set of abstracts) and how the documents (abstracts) are allocated to each latent topic. Latent topics are mixtures over words, where each word has a probability of belonging to a topic after controlling for journal name and publication year (the metadata). Thus, the topics may capture research fields, but also other, more subtle characteristics related to the way in which the articles are written. Using only data-driven methods, we find that female authors are unevenly distributed across the estimated latent topics. This finding relies on data generated "automatically" from the contents of the abstracts of the articles in the T5 journals, without any arbitrary allocation of texts to particular categories (such as JEL codes or research areas).
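The abstract describes the STM only in words. The Python sketch below is a toy illustration of the general STM idea, not the authors' code: document-level metadata shift each document's topic proportions through a logistic-normal prior, each word is generated by first drawing a topic and then drawing a word from that topic's word distribution, and average topic shares can then be compared across author groups. Everything concrete here is an assumption made for illustration: the `female_author` indicator, the coefficient matrix `gamma`, the topic-word distributions `beta`, the noise level `sigma`, the tiny vocabulary, and the omission of the journal and year covariates. The sketch only simulates data; fitting an STM to real abstracts requires an inference procedure (for example, the variational EM used by the R `stm` package), and the source does not say which software the authors used.

```python
import math
import random

def softmax(logits):
    """Map unconstrained logits to proportions that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_document(x, gamma, sigma, beta, n_words, rng):
    """Draw one toy document from an STM-style generative process.

    x:      document-level covariates (metadata), e.g. [intercept, female_author]
    gamma:  K-1 coefficient vectors; topic k's mean logit is dot(x, gamma[k])
    sigma:  std. dev. of independent Gaussian noise on each logit
            (STM itself uses a full covariance matrix; this is a simplification)
    beta:   K topic-word distributions (each a list of word probabilities)
    """
    n_topics = len(beta)
    # Logistic-normal topic proportions: K-1 free logits, the last fixed at 0
    eta = [sum(xi * gi for xi, gi in zip(x, gamma[k])) + rng.gauss(0.0, sigma)
           for k in range(n_topics - 1)]
    theta = softmax(eta + [0.0])
    words = []
    for _ in range(n_words):
        z = rng.choices(range(n_topics), weights=theta)[0]        # topic for this word
        w = rng.choices(range(len(beta[z])), weights=beta[z])[0]  # word from that topic
        words.append(w)
    return theta, words

def mean_theta_by_group(thetas, groups):
    """Average topic proportions within each group label."""
    sums, counts = {}, {}
    for theta, g in zip(thetas, groups):
        acc = sums.setdefault(g, [0.0] * len(theta))
        for k, v in enumerate(theta):
            acc[k] += v
        counts[g] = counts.get(g, 0) + 1
    return {g: [v / counts[g] for v in acc] for g, acc in sums.items()}

# Toy setup (all numbers are made up): 3 topics over a 4-word vocabulary.
beta = [
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],
    [0.25, 0.25, 0.25, 0.25],
]
# Covariates are [intercept, female_author]; the gender effects are hypothetical.
gamma = [[0.0, 1.5], [0.0, -0.5]]
rng = random.Random(2021)

thetas, groups = [], []
for d in range(200):
    female = d % 2
    theta, _ = sample_document([1.0, float(female)], gamma, 0.5, beta, 50, rng)
    thetas.append(theta)
    groups.append("female" if female else "male")

means = mean_theta_by_group(thetas, groups)
for g in ("female", "male"):
    print(g, [round(v, 3) for v in means[g]])
```

Because the toy data are simulated with a built-in gender effect on topic 0, the resulting gap in average topic shares only illustrates the kind of comparison involved, not any empirical result. Replacing the independent noise with a full covariance matrix would let topics be correlated, as in the actual STM.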

Item Type: Working Paper or Technical Report
Additional Information:

We thank Antonio Cabrales, Pedro Delicado and Nagore Iriberri for helpful comments, and Elvira Alonso for excellent research assistance. We also thank the Editor and two anonymous referees for their suggestions, as well as session participants at the Computing in Economics and Finance Conference, Tokyo (virtual), 2021.
José Ignacio Conde-Ruiz and Manu García, and Luis Puch, respectively, acknowledge financial support from the Spanish Ministry of Science and Innovation through projects PID2019-105499GB-I00 and PID2019-107161GB-C32.
Juan-José Ganuza gratefully acknowledges financial support from the Spanish Agencia Estatal de Investigación, through the Severo Ochoa Programme for Centres of Excellence in R&D (CEX2019-000915-S), and from the Spanish Ministry of Education and Science through Project ECO2017-89240-P.
†Corresponding Author: Juan-José Ganuza, Universitat Pompeu Fabra, Ramon Trias Fargas 27, 08005, Spain; E-mail:

Uncontrolled Keywords: Machine Learning; Gender Gaps; Structural Topic Model; Gendered Language; Research Fields.
Subjects: Social sciences > Economics
Social sciences > Economics > Econometrics
JEL: I20, J16, Z13
Series Name: Documentos de Trabajo del Instituto Complutense de Análisis Económico (ICAE)
ID Code: 67146
Deposited On: 22 Jul 2021 07:23
Last Modified: 01 Dec 2022 08:59

