Publication: An end-to-end statistical process with mobile network data for official statistics
Publication Date
2021
Publisher
Springer
Abstract
Mobile network data have proven to be a rich source of information in multiple statistical domains such as demography, tourism, and urban planning. However, incorporating this data source into the routine production of official statistics requires considerable effort, since a diversity of highly entangled issues (access, methodology, IT tools, quality, skills) must be solved beforehand. To this end, one-off studies with specific data sets are not enough; a standard statistical production process must be put in place. We propose a concrete process structured into evolvable modules that decouple the strongly technological layer underlying this data source from the statistical analysis producing the outputs of interest. This architecture follows the principles of the so-called ESS Reference Methodological Framework for Mobile Network Data. Each module deals with a different aspect of this data source. We apply hidden Markov models to geolocate mobile devices, use a Bayesian approach on this model to disambiguate devices belonging to the same individual, compute aggregate numbers of individuals detected by a telecommunication network using probability theory, and hierarchically model the integration of auxiliary information from the telco market and official data to produce final estimates of the number of individuals across different territorial regions in the target population. A first, simple illustrative version of the process has been applied to synthetic data, providing preliminary software tools and accuracy indicators that monitor the performance of the process. So far, this exercise has been applied to the estimation of present population and origin-destination matrices. We present an illustrative example of the execution of these production modules, comparing results with the simulated ground truth and thus assessing the performance of each production module.
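The geolocation and aggregation steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software: it assumes a hypothetical toy grid of three tiles covered by two cells, a known transition matrix `trans` and emission matrix `emit`, and devices that behave independently. Forward-backward smoothing of a hidden Markov model gives each device's posterior probability of being in each tile; summing over the tiles of a region and treating each device as an independent Bernoulli trial then yields a Poisson binomial distribution for the number of devices in that region. The Bayesian deduplication and hierarchical inference steps are not covered, so the result counts devices rather than individuals.

```python
import numpy as np


def smooth_locations(init, trans, emit, obs):
    """Forward-backward smoothing for one device.

    init[i]    : prior probability that the device starts in tile i
    trans[i, j]: probability of moving from tile i to tile j in one time step
    emit[i, k] : probability that a device in tile i is detected by cell k
    obs[t]     : index of the cell that detected the device at time t
    Returns post[t, i] = P(device in tile i at time t | all observations).
    """
    n_t, n_s = len(obs), len(init)
    alpha = np.zeros((n_t, n_s))
    beta = np.ones((n_t, n_s))
    alpha[0] = init * emit[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n_t):
        # Predict with the transition matrix, then weight by the emission.
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        alpha[t] /= alpha[t].sum()  # rescale to avoid numerical underflow
    for t in range(n_t - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta  # per-time scaling factors cancel after normalizing
    return post / post.sum(axis=1, keepdims=True)


def count_distribution(probs):
    """PMF of the number of devices in a region, assuming devices are
    independent and device d is in the region with probability probs[d]
    (a Poisson binomial distribution)."""
    pmf = np.array([1.0])
    for p in probs:
        # Convolving with [1 - p, p] adds one Bernoulli(p) device.
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf


# Hypothetical toy setting: 3 tiles on a line, 2 cells.
init = np.full(3, 1 / 3)
trans = np.array([[0.8, 0.2, 0.0],
                  [0.1, 0.8, 0.1],
                  [0.0, 0.2, 0.8]])
emit = np.array([[0.9, 0.1],   # tile 0 is covered mostly by cell 0
                 [0.5, 0.5],   # tile 1 is covered by both cells
                 [0.1, 0.9]])  # tile 2 is covered mostly by cell 1
devices = [[0, 0, 1], [1, 1, 1], [0, 1, 0]]  # detected cell per time step
region = [1, 2]  # tiles forming the hypothetical region of interest

# Probability that each device is in the region at the last time step.
p_region = [smooth_locations(init, trans, emit, obs)[-1, region].sum()
            for obs in devices]
pmf = count_distribution(p_region)
print("P(device in region):", np.round(p_region, 3))
print("P(N = k), k = 0..3:", np.round(pmf, 3))
```

Keeping the full distribution of the count, rather than only its expected value `sum(p_region)`, is one way to carry the uncertainty of the geolocation step forward to later modules.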
Description
Salgado, D., Sanguiao, L., Oancea, B. et al. An end-to-end statistical process with mobile network data for official statistics. EPJ Data Sci. 10, 20 (2021). https://doi.org/10.1140/epjds/s13688-021-00275-w