A new era for social science data?
On Tuesday and Wednesday last week, I was at the 1st European Data Access Forum, held on Eurostat premises in Luxembourg. Representatives of governmental statistical agencies and academia (including funding councils, data services and us researchers) discussed how to enhance research use of the wealth of data that governments produce, which offers a great, still largely underexploited, resource for social science. These data include censuses and large-scale surveys of persons and businesses (for example, the Labour Force Survey), as well as administrative records (such as birth and death registers), which are both richer and less widely exploited than surveys.
Get-together events such as this one are rare, though not unique: I was at a similar one, also in Luxembourg, a few years ago. As at that time, the atmosphere was a bit confrontational, with researchers loudly complaining that statistical institutes are too conservative in their interpretation of data protection rules and should trust them more, and statistical institutes retorting that the law does not allow them to do more.
Yet there are reasons for optimism: a study I am conducting for a large European project, whose first results I presented in Luxembourg, shows that access conditions for researchers are significantly improving all over Europe.
No country categorically refuses to give access to its data. Even in Eastern Europe, where my project partners and I expected to find little trust in government institutions and a strong bias in favour of privacy protection over the interests of research, there are provisions for access and procedures that make it possible in practice. (A small workshop we previously organised in Bucharest provided evidence for this; the presentations can be accessed here.)
We initially thought that differences across countries were insurmountable, and we expected many countries to entirely exclude non-national researchers. This turned out not to be true: most differences concern details and concrete arrangements, while the underlying principles are rather similar across countries. Access for researchers from other European countries is always allowed, although in practice there are often restrictions on the amount or type of accessible data, or more burdensome application processes.
Differences in how to deal with borderline cases are minor too. For example, are students researchers? Most countries answer that PhD students are, while there is more variation for master’s and bachelor’s students (but these students are also less likely to have very sophisticated data needs).
All this is good news, even though other problems remain. One of them is a frustrating lack of online information on available data and how to access them (which is the main reason why we initially suspected that some countries might offer no access at all). This can easily be fixed, simply by improving online communication. National statistical offices should take responsibility for that, while researchers should take on the other, more challenging task of raising awareness among those colleagues who do not use these data at the moment (the majority of social scientists, I would say).
The other pressing challenge is that government data are not the only data that can be useful for social science. In today’s era of ubiquitous computer-mediated communication and “big data”, top-flying researchers increasingly expect to extract more insight from web data, particularly from social networking and other online services such as Facebook, Twitter, Google or eBay, to name just a few. Yet all this is a no man’s land where the limitations set by data protection laws notoriously become fuzzy, and there are myriad ways to get around them. Users’ attitudes are just as unclear as those of providers (I am myself currently studying online privacy perceptions in another project). Researchers’ access to these data is uneven and may further deepen existing inequalities within academia.
To be sure, there have been some attempts at developing global data policies that take these data into account, but I am a bit doubtful that they can be effective, as Internet data production moves much faster than policy-makers (especially at the European or trans-national level). Yet the experience of public sector data providers may still suggest how to move forward. After all, despite all the criticisms, statistical agencies do manage to allow research use of their data while still fully meeting clear, well-defined and often strict data protection requirements. Indeed, they do so to a much greater extent than one might think at first sight, as our recent study suggests. And let me also add that they often provide access for free (or for relatively low fees).
Internet companies may have something to learn from them, and we, the research community, should help bring these two communities together to exchange views on these topics, so that our data needs can best be met. Perhaps that is the way for the social sciences to enter a new era of enhanced access to plenty of new, rich and up-to-date data resources.
Filed under: Data, Research, Social science methodology
Tags: Big data, Data access, Internet data, Research policy, Social science data, Statistical data, Web-based social networks