Núñez-Antón, Vicente and Pérez-Salamero González, José Manuel and Regúlez-Castillo, Marta and Vidal Meliá, Carlos (2019) Improving the representativeness of a simple random sample: an optimization model and its application to the Continuous Sample of Working Lives. [ Documentos de Trabajo del Instituto Complutense de Análisis Económico (ICAE); nº 20, ISSN: 2341-2356 ]

Official URL: https://eprints.ucm.es/55423/1/1920.pdf
Organisation URL: https://www.ucm.es/icae
Abstract
This paper develops an optimization model for selecting a large subsample that improves the representativeness of a simple random sample previously obtained from a population larger than the population of interest. The problem formulation involves convex mixed-integer nonlinear programming (convex MINLP) and is therefore NP-hard. However, the solution is found by maximizing the “constant of proportionality” (in other words, maximizing the size of the subsample taken from a stratified random sample with proportional allocation) and restricting it to a p-value high enough to achieve a good fit to the population of interest using Pearson’s chi-square goodness-of-fit test. The beauty of the model is that it gives the user the freedom to choose between a larger subsample with a poorer fit and a smaller subsample with a better fit. The paper also applies the model to a real case: the Continuous Sample of Working Lives (CSWL), which is a set of anonymized microdata containing information on individuals from Spanish Social Security records. Several waves (2005-2017) are first examined without using the model and the conclusion is that they are not representative of the target population, which in this case is people receiving a pension income. The model is then applied and the results prove that it is possible to obtain a large dataset from the CSWL that (far) better represents the pensioner population for each of the waves analysed.
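The trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's convex MINLP formulation: it assumes a simple grid scan over candidate values of the constant of proportionality k, caps each stratum at the number of records actually available, and keeps the largest k whose Pearson chi-square p-value stays above a user-chosen threshold. The function names, the `steps` parameter, and the stratum counts are all hypothetical and chosen only for illustration.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the closed form for df = 1 or 2 and the recurrence
    Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    while k < df:
        # Term computed in log space to limit overflow for larger x or df.
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return min(q, 1.0)

def gof_p_value(counts, pop_counts):
    """Pearson chi-square goodness-of-fit p-value of counts vs. population shares."""
    n = sum(counts)
    total = sum(pop_counts)
    stat = 0.0
    for c, big_n in zip(counts, pop_counts):
        expected = n * big_n / total
        stat += (c - expected) ** 2 / expected
    return chi2_sf(stat, len(counts) - 1)

def largest_subsample(sample_counts, pop_counts, p_min, steps=1000):
    """Heuristic (an assumption, not the paper's solver): scan k downward on a grid.

    For each k, take min(available, floor(k * N_i)) records from stratum i and
    return the first (largest) k whose p-value is at least p_min.
    """
    k_max = max(s / big_n for s, big_n in zip(sample_counts, pop_counts))
    for t in range(steps, 0, -1):
        k = k_max * t / steps
        sub = [min(s, math.floor(k * big_n))
               for s, big_n in zip(sample_counts, pop_counts)]
        if sum(sub) > 0 and gof_p_value(sub, pop_counts) >= p_min:
            return k, sub
    return None

# Hypothetical data: three strata of the population of interest and the
# number of records from each stratum found in the original random sample.
pop_counts = [5000, 3000, 2000]
sample_counts = [300, 120, 80]

p_full = gof_p_value(sample_counts, pop_counts)
result = largest_subsample(sample_counts, pop_counts, p_min=0.05)
print("p-value of full sample:", p_full)
print("selected k and subsample:", result)
```

Raising `p_min` in this sketch yields a smaller subsample with a better fit, and lowering it yields a larger subsample with a poorer fit, which mirrors the choice the abstract leaves to the user.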
Item Type:  Working Paper or Technical Report 

Uncontrolled Keywords:  Optimization; Subsampling; Chi-square test; P-value; Continuous Sample of Working Lives. 
Subjects:  Sciences > Statistics > Mathematical optimization; Social sciences > Economics > Public economy 
JEL:  C61, C81, C12, H55, J26 
Series Name:  Documentos de Trabajo del Instituto Complutense de Análisis Económico (ICAE) 
Volume:  
Number:  20 
ID Code:  55423 
Deposited On:  24 May 2019 12:44 
Last Modified:  27 May 2019 08:02 
