Publication:
An end-to-end statistical process with mobile network data for official statistics

Abstract
Mobile network data has proven to be a rich source of information in multiple statistical domains such as demography, tourism, and urban planning. However, incorporating this data source into the routine production of official statistics requires considerable effort, since a variety of highly entangled issues (access, methodology, IT tools, quality, skills) must be solved beforehand. One-off studies with concrete data sets are not enough for this purpose; a standard statistical production process must be put in place. We propose a concrete process structured into evolvable modules that detach the strongly technological layer underlying this data source from the statistical analysis needed to produce outputs of interest. This architecture follows the principles of the so-called ESS Reference Methodological Framework for Mobile Network Data. Each module deals with a different aspect of this data source. We apply hidden Markov models for the geolocation of mobile devices, use a Bayesian approach on this model to disambiguate devices belonging to the same individual, compute aggregate numbers of individuals detected by a telecommunication network using probability theory, and hierarchically model the integration of auxiliary information from the telco market and official data to produce final estimates of the number of individuals across different territorial regions in the target population. A first simple illustrative proposal has been applied to synthetic data, providing preliminary software tools and accuracy indicators that monitor the performance of the process. So far, this exercise has been applied to the estimation of present population and origin-destination matrices. We present an illustrative example of the execution of these production modules and compare the results with the simulated ground truth, thus assessing the performance of each production module.
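To make the geolocation step more concrete, the following is a minimal sketch of forward filtering in a hidden Markov model, where the hidden state is the tile a device occupies and the observation is the antenna it connects to. It is not the authors' implementation: the three-tile grid, the antenna labels "A" and "B", all probability values, and the function name forward_filter are hypothetical and chosen only for illustration.

```python
def forward_filter(prior, transition, emission, observations):
    """Return the filtered posteriors P(tile_t | obs_1..obs_t) for each time t.

    prior:        list of initial tile probabilities
    transition:   transition[i][j] = P(tile j at t+1 | tile i at t)
    emission:     emission[j][obs] = P(observing obs | device in tile j)
    observations: sequence of observed antenna labels
    """
    n = len(prior)
    belief = list(prior)
    posteriors = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: propagate the previous belief through the transition model.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(n))
                for j in range(n)
            ]
        # Update: weight each tile by the likelihood of the observed antenna.
        weighted = [belief[j] * emission[j][obs] for j in range(n)]
        total = sum(weighted)
        if total == 0.0:
            raise ValueError("observation has zero probability under the model")
        belief = [w / total for w in weighted]
        posteriors.append(belief)
    return posteriors


# Hypothetical toy setting: three tiles in a row, two antennas.
prior = [1 / 3, 1 / 3, 1 / 3]
transition = [
    [0.70, 0.30, 0.00],  # from tile 0: stay or move to tile 1
    [0.15, 0.70, 0.15],  # from tile 1: stay or move to a neighbour
    [0.00, 0.30, 0.70],  # from tile 2: stay or move to tile 1
]
emission = [
    {"A": 0.9, "B": 0.1},  # tile 0 is close to antenna A
    {"A": 0.5, "B": 0.5},  # tile 1 lies between both antennas
    {"A": 0.1, "B": 0.9},  # tile 2 is close to antenna B
]
observations = ["A", "A", "B", "B"]

posteriors = forward_filter(prior, transition, emission, observations)
for t, p in enumerate(posteriors):
    print(t, [round(x, 3) for x in p])
```

Each row printed is a probability distribution over tiles given the connections observed so far. Under the additional assumption that devices move independently, such per-device location probabilities could then be combined into a distribution for the number of devices per tile.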
Description
Salgado, D., Sanguiao, L., Oancea, B. et al. An end-to-end statistical process with mobile network data for official statistics. EPJ Data Sci. 10, 20 (2021). https://doi.org/10.1140/epjds/s13688-021-00275-w