Publication:
A Collection of Samples for Research in Google: Design and Application of a Sample Selection Method: Results and Problems of Research

Publication Date
2012
Abstract
This article examines the use and validity of Google’s search engine for collecting corpora of materials for research. To this end, it develops two interrelated themes. The first section develops a methodology designed to identify universes in Google that meet the criteria and parameters required by an academic study. This methodology makes use of the search engine’s own logic and is applicable to most online document searches. The second section discusses the limitations and skewing of results that arise from Google’s mode of operation and affect the scientific validity of the universes it generates. This part focuses on the completeness and representativeness of the Google universes with regard to the full range of content available on the Internet.
Description
Document linked to the R&D&I project “La producción social de la comunicación y la reproducción social en la era de la globalización” (ref. CSO2010-22104-C03-01). The project was funded by the Ministerio de Ciencia e Innovación (competitive call of the Plan Nacional de I+D+i 2008-2011, Programa de Proyectos de Investigación Fundamental) and carried out by the Universidad Complutense de Madrid research group “Identidades sociales y comunicación” from 2011 to 2014. The works related to the project have been deposited in E-Prints (see “Trabajos relacionados con el proyecto de I+D+i La producción social de la comunicación y la reproducción social en la era de la globalización”, http://eprints.ucm.es/24131/).