Improving the representativeness of a simple random sample: an optimization model and its application to the Continuous Sample of Working Lives

Núñez Antón, Vicente and Pérez Salamero González, Juan Manuel and Regúlez Castillo, Marta and Vidal Meliá, Carlos (2019) Improving the representativeness of a simple random sample: an optimization model and its application to the Continuous Sample of Working Lives. [ Documentos de Trabajo del Instituto Complutense de Análisis Económico (ICAE); nº 20, ISSN: 2341-2356 ]

Official URL: https://eprints.ucm.es/55423/1/1920.pdf




Abstract

This paper develops an optimization model for selecting a large subsample that improves the representativeness of a simple random sample previously obtained from a population larger than the population of interest. The problem is formulated as a convex mixed-integer nonlinear program (convex MINLP) and is therefore NP-hard. Nevertheless, a solution is found by maximizing the “constant of proportionality” (in other words, maximizing the size of the subsample taken from a stratified random sample with proportional allocation) subject to a p-value high enough to achieve a good fit to the population of interest under Pearson’s chi-square goodness-of-fit test. The model lets the user choose between a larger subsample with a poorer fit and a smaller subsample with a better fit. The paper also applies the model to a real case: the Continuous Sample of Working Lives (CSWL), a set of anonymized microdata containing information on individuals from Spanish Social Security records. Several waves (2005-2017) are first examined without using the model, and the conclusion is that they are not representative of the target population, which in this case is people receiving a pension income. The model is then applied, and the results show that it is possible to obtain a large dataset from the CSWL that represents the pensioner population far better for each of the waves analysed.
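The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' convex MINLP: it swaps in a simple descending scan over target sizes, caps each stratum at what the original sample contains, and keeps the first allocation whose Pearson chi-square p-value reaches a chosen threshold. The function names (`chi2_sf`, `pearson_p_value`, `largest_subsample`), the three-stratum counts, and the 0.05 threshold are all assumptions made for illustration, not values from the paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, df >= 1."""
    y = x / 2.0
    # Closed-form starting values: Q(1, y) for even df, Q(1/2, y) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def pearson_p_value(counts, pop_shares):
    """P-value of Pearson's goodness-of-fit test of counts against pop_shares."""
    n = sum(counts)
    stat = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, pop_shares))
    return chi2_sf(stat, len(counts) - 1)


def largest_subsample(sample_counts, pop_counts, min_p_value):
    """Heuristic scan (an assumption, not the paper's solver).

    For each target size m, allocate floor(m * share) units to every stratum,
    capped by the units available in the sample, and return the first
    allocation (scanning from large m to small) that meets min_p_value.
    """
    total = sum(pop_counts)
    shares = [c / total for c in pop_counts]
    # Beyond this target size every stratum is capped by the sample itself.
    upper = max(math.ceil(s / p) for s, p in zip(sample_counts, shares))
    for m in range(upper, 0, -1):
        counts = [min(s, int(m * p)) for s, p in zip(sample_counts, shares)]
        if sum(counts) == 0:
            continue
        p_value = pearson_p_value(counts, shares)
        if p_value >= min_p_value:
            return counts, p_value
    return None


# Made-up example: the sample over-represents stratum 1 and
# under-represents stratum 3 relative to the population of interest.
sample_counts = [300, 500, 200]
pop_counts = [2000, 5000, 3000]
result = largest_subsample(sample_counts, pop_counts, min_p_value=0.05)
print(result)
```

Raising `min_p_value` in this sketch yields a smaller subsample with a closer fit, and lowering it yields a larger subsample with a looser fit, which mirrors the choice the model leaves to the user.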


Item Type: Working Paper or Technical Report
Uncontrolled Keywords: Optimization; Subsampling; Chi-square test; P-value; Continuous Sample of Working Lives.
Subjects: Sciences > Statistics > Mathematical optimization; Social sciences > Economics > Public economy
JEL: C61, C81, C12, H55, J26
Series Name: Documentos de Trabajo del Instituto Complutense de Análisis Económico (ICAE)
Volume:
Number: 20
ID Code: 55423
Deposited On: 24 May 2019 12:44
Last Modified: 04 Mar 2021 11:48
