Old world meets new world: open data and research
The IAOS conference in Kiev, now just over, has been a great opportunity to discuss the “open data” movement and how it is radically transforming policy-making. Data, it is hoped, enable citizens to make more informed choices and hold the government to account: expenses, contracts, decisions and even meetings with lobbyists are now subject to public scrutiny.
Open data also create new opportunities for the agencies mandated by governments to produce information, or National Statistical Institutes (NSIs): to expand coverage (say, to other sectors of the economy that are now missing from official series) or to produce statistical information on a more timely basis.
In truth, some NSIs are nervous because open data may break their monopoly on information production. New actors are now able to mine large amounts of information from public records and the Web, sometimes even combining them with commercial sources, and build datasets whose size is beyond the capacity of typical NSIs’ data management tools (sometimes called “big data”).
NSIs are right to be worried to the extent that more competition does not necessarily mean better data products: multiplicity of sources, timeliness, and the sheer volume of data sets obtained through open data are likely to result in less reliability or more bias, in contrast to NSIs’ traditional commitment to quality. How, then, to raise awareness about these issues among policymakers and the general public? How to avoid misinterpretation and misuse of data?
Another concern is that some data are hardly amenable to open publication, because they contain detailed information about individuals or businesses, and these individuals/businesses may be re-identified by data users. This would violate the confidentiality pledge that NSIs are legally bound to offer. Imagine your anger if your health records, or your tax returns, were made available for all to see! This largely explains NSIs’ caution in embracing the open data movement.
My point at the conference was that this last challenge also contains the key to a solution. There is already some form of access to NSIs’ detailed data, albeit a restricted one, for scientific and statistical research purposes. Because detailed data allow better analysis and policy advice, most countries have legal provisions to make them available under conditions that minimize the risk of re-identification.
Scientific access is just as important as open access: while the latter is driving fast-paced change, there is no less pressure on NSIs to enhance, modernize and expand their arrangements for researchers! While this may seem yet another challenge for NSIs, the experience that at least some of them have already gained in this area is a valuable resource that can create new opportunities.
In fact, collaborative environments between researchers and NSIs, where they exist (France, Norway, and the UK, to name but a few), may help to reap the benefits of open data. NSIs could leverage the strong methodological and analytical expertise of researchers to develop tools to assess data quality in the case of those large, unconventional, less structured datasets that open data make it possible to produce; and because researchers are often also educators, they are also a source of advice on how to acquaint policy-makers and the public with data quality issues.
Collaboration with researchers may also open the way to developing analytical tools to handle today’s new data structures. It is sometimes argued that in the past, social theory was used to compensate for lack of empirical evidence; now that this evidence is not only available, but (overly) abundant, social theory does not lose relevance but takes on the new role of providing tools to interpret data. It becomes an essential resource to make sense of large amounts of data that would be intractable with traditional statistical methods.
For sure, NSIs would gain from this process; and in return, they could offer some of their expertise in a way that would benefit other data actors and society at large. The existing collaborations some of them have with the social science research community have produced a wealth of experience and expertise in confidentiality and personal data protection, usually under strict conditions, even when highly detailed or sensitive data are released. This know-how may perhaps be transferred to the wide range of new data actors, whose extensive data mining from the Internet, social media, loyalty programmes and credit card records is legally a grey area, as the many allegations against some of these companies indicate. There is a strong societal need for guidelines, and NSIs, more than others, can contribute to defining them.
In conclusion, research access is an old-fashioned demand on NSIs relative to today’s open data, but it should not be taken out of the debate. In fact, and this was my main message to delegates at the conference, it should be part of the overall strategy towards open data.
For more information on current initiatives to enhance research access, see our project Data without Boundaries.
I am an economic sociologist with interest in social networks and their impact on markets, organisations, consumer choice and health.
My research also includes work in social science methodology and data.
Creative Commons Licence
Paola Tubaro's Blog by Paola Tubaro is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Based on a work at paolatubaro.wordpress.com.