Universidad Complutense de Madrid
E-Prints Complutense

Improving the representativeness of a simple random sample: an optimization model and its application to the Continuous Sample of Working Lives


Núñez-Antón, Vicente and Pérez-Salamero González, José Manuel and Regúlez-Castillo, Marta and Vidal Meliá, Carlos (2019) Improving the representativeness of a simple random sample: an optimization model and its application to the Continuous Sample of Working Lives. [ Documentos de Trabajo del Instituto Complutense de Análisis Económico (ICAE); nº 20, ISSN: 2341-2356 ]


Official URL: https://eprints.ucm.es/55423/1/1920.pdf


Related URL (Organisation): https://www.ucm.es/icae


Abstract

This paper develops an optimization model for selecting a large subsample that improves the representativeness of a simple random sample previously drawn from a population larger than the population of interest. The problem is formulated as a convex mixed-integer nonlinear program (convex MINLP) and is therefore NP-hard. Nevertheless, a solution is found by maximizing the “constant of proportionality” (in other words, maximizing the size of the subsample taken from a stratified random sample with proportional allocation) subject to a p-value high enough to achieve a good fit to the population of interest under Pearson’s chi-square goodness-of-fit test. The model gives the user the freedom to choose between a larger subsample with a poorer fit and a smaller subsample with a better fit. The paper also applies the model to a real case: the Continuous Sample of Working Lives (CSWL), a set of anonymized microdata containing information on individuals from Spanish Social Security records. Several waves (2005-2017) are first examined without the model, and the conclusion is that they are not representative of the target population, which in this case is people receiving a pension income. The model is then applied, and the results show that it is possible to obtain a large dataset from the CSWL that represents the pensioner population far better in each of the waves analysed.
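To make the trade-off in the abstract concrete, the following is a minimal Python sketch, not the paper's convex MINLP formulation. It computes Pearson's chi-square goodness-of-fit p-value for stratum counts and then runs a crude grid search for the largest constant of proportionality whose capped, rounded subsample still reaches a chosen minimum p-value. The stratum counts, the 0.9 threshold, and all function names (`chi2_sf`, `gof_pvalue`, `largest_subsample`) are hypothetical and chosen only for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with df degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma function;
    adequate for the moderate statistics in this sketch.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def gof_pvalue(counts, shares):
    """Pearson chi-square statistic and p-value of counts against population shares."""
    n = sum(counts)
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, shares))
    return stat, chi2_sf(stat, len(counts) - 1)


def largest_subsample(available, population, min_pvalue, steps=2000):
    """Grid search (an assumption of this sketch) over the constant of proportionality k.

    For each k, stratum h receives min(available_h, round(k * N_h)) units; the
    first k, scanning downwards, whose p-value reaches min_pvalue is returned.
    """
    total = sum(population)
    shares = [N / total for N in population]
    k_max = max(a / N for a, N in zip(available, population))
    for i in range(steps, 0, -1):
        k = k_max * i / steps
        counts = [min(a, round(k * N)) for a, N in zip(available, population)]
        if min(counts) == 0:
            continue
        _, p = gof_pvalue(counts, shares)
        if p >= min_pvalue:
            return k, counts, p
    return None


# Hypothetical data: population sizes and original sample counts per stratum.
population = [5000, 3000, 2000]
available = [230, 100, 120]
shares = [N / sum(population) for N in population]

full_stat, full_p = gof_pvalue(available, shares)
result = largest_subsample(available, population, min_pvalue=0.9)
print("full sample p-value:", full_p)
print("selected (k, counts, p-value):", result)
```

Raising `min_pvalue` pushes the search towards a smaller subsample with a better fit, and lowering it allows a larger subsample with a poorer fit, which mirrors the choice the abstract describes.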


Item Type: Working Paper or Technical Report
Uncontrolled Keywords: Optimization; Subsampling; Chi-square test; P-value; Continuous Sample of Working Lives.
Subjects: Sciences > Statistics > Mathematical optimization
Social sciences > Economics > Public economy
JEL: C61, C81, C12, H55, J26
Series Name: Documentos de Trabajo del Instituto Complutense de Análisis Económico (ICAE)
Volume:
Number: 20
ID Code: 55423
Deposited On: 24 May 2019 12:44
Last Modified: 27 May 2019 08:02
