Making sense of data: Insight from statistics
I’m just back from the World Statistics Congress, a grand event that took place in Dublin over the past few days, bringing together statisticians from all over the world and from all sorts of institutions, from government offices and international organisations to academia and private companies.
The event prompts me to think more about my latest post with Antonio A. Casilli, where we argued that raw data do not speak for themselves (specifically, about the ethnic origins of participants in the recent UK riots): data need to be carefully selected to extract the relevant information, and then need to be interpreted. Without theories, data just don’t make any sense.
Attending such a large statistics conference did not change my mind, quite the contrary. In a comment in the Financial Times (also referring to the conference!) last Wednesday, the economist John Kay warned young scholars about the interpretation and use of data. Don’t just believe figures, he said: what does it mean that “on average, men think about sex every seven seconds”? Ask where these responses come from, who collected and processed the data, and what original questions the numbers were meant to answer. Computer programmes help us manipulate larger amounts of data much faster than in the past, but they do not remove the need for interpretation. New findings remain just as unlikely as they used to be. As he added, “When I discover something surprising in data, the most common explanation is that I made a mistake”.
At the conference, I picked up a copy of the journal “Significance” (March 2011 issue) featuring an interview with Hal Varian, chief economist at Google. He is quoted as saying “The sexy profession of the next decade will be statistician”. Coming from someone who works for one of the largest data handlers in the world, this sentence sounded worrying at first, but it is not. Varian explains that the massive amounts of data available today can be as useless as they are cheap. Tools and techniques are needed to extract the right information from the data; otherwise they can be of little use. And in some cases, more information can be obtained from a smaller sample, cleverly selected and analysed, than from a larger dataset. What he meant by calling statisticians “sexy” (apart from pleasing the journal’s readers, one could argue!) is in fact best captured by the contrast he draws between statisticians and computer scientists: the latter “use vast datasets and unstructured models”, while the former “have complex models and smaller datasets”. He was, in fact, calling for more theory and thinking now that the data are abundant: again, data by themselves are not enough.
In this sense, I would add that other professions may also become sexy: not just statistics but also the social sciences, to the extent that they too provide theories, tools and methods that make sense of all these data. Enthusiasm for what some call the “data deluge” has led many to forget this other aspect. Yet slowly, perhaps too slowly, some are starting to realize that we cannot do without theories after all.
Filed under: Research, Social science methodology
Tags: 2011 UK riots, economic methodology, Quantitative methods, Social science data, social theory, Statistical modeling
I am an economic sociologist with interest in social networks and their impact on markets, organisations, consumer choice and health.
My research also includes work in social science methodology and data.
Creative Commons Licence
Paola Tubaro's Blog by Paola Tubaro is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Based on a work at paolatubaro.wordpress.com.