Data and big data: digital traces of social phenomena to nourish research
I was interviewed by SFR PLAYER (an online magazine published by SFR, a major telecom provider in France) about the changes that the use of big data has brought to my work as a social science researcher. The video interview (in French) is available here. The same issue features interviews with danah boyd and various specialists of open data, data journalism and Internet data.
Interestingly, the video was shot at the École Normale Supérieure Campus Jourdan in Paris, a place that hosts part of Réseau Quetelet, the French national data service for the social sciences and humanities. The Jourdan unit ADISP handles primarily statistical data – data from surveys conducted by INSEE, the French national statistical agency, and administrative data such as those of the ministries of education and labour – for use in the social sciences.
In fact, research in the social sciences has always used data as a basic ingredient. Data from official and public-sector statistics have long set the standard, and access to these data is ever more in demand today. European initiatives like the Data without Boundaries project, in which I am taking part, aim precisely to bring improvements in this area.
We are now in the midst of a major upgrade with the availability of big data: data from the Internet, the digital traces of our activities. They have the advantage that they can be retrieved, saved, coded and processed much faster, much more easily and in much larger amounts than more classical records such as registers of students in schools or of patients in hospitals. McKinsey has already pointed to potential economic benefits of big data for business, and research has taken notice too.
But big data do not automatically imply better quality of research. The sheer amount of data, or even their quality (completeness and richness, for example), is not enough and may even raise problems. In an influential paper, danah boyd and Kate Crawford warn about potential risks, from privacy protection concerns to the narrowing of the focus of social research and the misunderstandings that may arise from the mostly a-theoretical stance that comes with big data. I think the most interesting uses of big data are those that have a good empirical strategy: to really enhance social science research, you need an intelligent design for collecting and organizing data, ideally one that produces quasi-experimental conditions.
For example, I have in mind a couple of prominent studies conducted in the United States. Damon Centola conducted an experiment on a forum dedicated to health problems, looking at how changes in the structure of links on this forum, more or less apt to promote contact between participants, could favour the spread of healthier behaviours. And he could draw firm conclusions on this basis. Sinan Aral and Dylan Walker adopted a similarly strong experimental approach on a very large scale on Facebook to study peer influence – an object that has always fascinated social scientists, but one that is challenging to identify. The possibility of extracting big data offers a plus here: it allows us to observe behaviours under controlled conditions on a large scale and in an everyday situation, unlike traditional psycho-social experiments in the lab, where people do not necessarily behave as they would, so to speak, in real life.
What matters most are theory and approach: data come next. I have worked on projects, with Antonio Casilli in particular, where we chose to use “small data” rather than big data. The reason was that we were in situations where it was not possible to collect big data, especially when we wanted to react very quickly to the British riots of August 2011 with a research paper.
When the events took place, there were no data available online yet to do a study. These came afterwards, when Twitter made its data available to some colleagues, notably Farida Vis and her team, who analysed them. But not in August. So how could we react to the events just in time? We did so through computer simulation and used the Internet in a different way, that is to say, to collect feedback, opinions and comments from users, activists and people who had witnessed the events and could give us suggestions on how to improve our simulation. In this way, we could respond very quickly and participate in a discussion that required immediate answers, because London was burning and we felt the urge to contribute, to do our bit, as researchers.
So, after all, the conclusion is not that from now on research is to be conducted with the Internet and big data alone. The era of big data reinforces a tradition of data use that began long ago and was already strong, without giving up the methods, tools and analytical solutions that have been used previously. So I see two things evolving in parallel – traditional and modern, small and big – with convergence and synergies that, I hope, will soon materialize.
Filed under: Data, Internet and social media, Research, Social science methodology
Tags: Big data, Quantitative methods, Small data, Social science data, Social simulation, Statistical data, Web-based social networks
I am an economic sociologist with interest in social networks and their impact on markets, organisations, consumer choice and health.
My research also includes work in social science methodology and data.
Creative Commons Licence
Paola Tubaro's Blog by Paola Tubaro is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Based on a work at paolatubaro.wordpress.com.