Data Big and Small

Is Paris a 15-minute city?

A few years ago, when hopes to leverage technology to build a more humane “sharing” economy had not yet completely vanished, it was often believed that the most interesting policy experiments were to be found at the local level of cities, not states. One of those was the urban-planning concept of a 15-minute city, aiming to make all essential amenities such as schools and shops accessible within a 15-min walk or bike ride. Launched in Paris before receiving enthusiastic support worldwide, it was part of the current mayor’s latest re-election campaign.

Fast forward to today, and how far is Paris actually from the 15-min goal? Sarah J. Berkemer and I have endeavoured to answer this question in a just-published article (available in open access!) with three brilliant ENSAE students (Marie-Olive Thaury, Simon Genet and Léopold Maurice). We harness open map data from the large participatory project Open Street Map and geo-localized socio-economic data from official statistics (Insee) to fill this gap.
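To give a sense of what “accessible within a 15-min walk” means in practice, here is a minimal sketch in Python. It is not the method used in the article: it approximates walking time from straight-line distance, and the walking speed, the detour factor, the function names and the sample coordinates are all illustrative assumptions.

```python
import math

WALK_SPEED_KMH = 4.8   # assumed average walking speed
DETOUR_FACTOR = 1.3    # assumed ratio of street distance to straight-line distance


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def walking_minutes(home, lat, lon):
    """Rough walking time from home (lat, lon) to an amenity, in minutes."""
    distance_km = haversine_km(home[0], home[1], lat, lon) * DETOUR_FACTOR
    return distance_km / WALK_SPEED_KMH * 60


def reachable_categories(home, amenities, limit_min=15):
    """Amenity categories with at least one instance within the time limit."""
    return {
        category
        for category, lat, lon in amenities
        if walking_minutes(home, lat, lon) <= limit_min
    }


def missing_categories(home, amenities, required, limit_min=15):
    """Required categories that cannot be reached within the time limit."""
    return set(required) - reachable_categories(home, amenities, limit_min)


# Hypothetical home location and amenities (category, lat, lon)
home = (48.8566, 2.3522)
amenities = [("school", 48.8570, 2.3530), ("shop", 48.9000, 2.4500)]
print(missing_categories(home, amenities, {"school", "shop"}))
```

A location would count as “15-minute” in this simplified sense when no required category is missing. A realistic analysis would route along the actual street network rather than rely on a detour factor.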

While the city of Paris is rather homogeneous, we show that it is nonetheless characterized by remarkable inequalities between a highly accessible city centre (though with some internal differences in terms of types of amenities) and a less equipped periphery, where lower-income neighborhoods are more often found. Heterogeneity increases if we consider Paris together with its immediate surroundings, the “Petite Couronne,” where large numbers of daily commuters and other users of city facilities live.

We find that this ambitious urban planning objective cannot be achieved without addressing existing socio-economic inequalities, and that especially in a big city like Paris, it cannot be confined within the narrow boundaries of the municipality itself, without also including the city’s immediate surroundings.

One reason why I am particularly proud of this work is that it demonstrates how far research-informed teaching can go. Most higher education is about familiarizing students with generally accepted and confirmed knowledge, without going beyond the state of the art. This is certainly important and in all cases a “safe” bet, but does not give students a sense of what it means to push the boundaries further. This project was an opportunity to do so. It gave students the role of researchers – our peers – letting them play the role in full and showing them all the back-office work that lies behind publications (from drafting to responding to reviews and copy-editing), and that is (too) often occluded from students’ view. There’s probably more to experiment around this model.

To read the full article: Thaury M.-O., Genet S., Maurice L., Tubaro P. & Berkemer S.J., 2024, ‘City composition and accessibility statistics in and around Paris’, Frontiers in Big Data, 7, DOI: 10.3389/fdata.2024.1354007

Meet the human workers behind AI

Last week with the Diplab team, we spent two exciting days at the European Parliament in Brussels, engaging in profound discussions with and about platform workers as part of the 4th edition of the Transnational Forum on Alternatives to Uberization.

Our stellar panel, co-organized with A. Casilli, M. Miceli, T. Le Bonniec and others, featured data workers and commercial content moderators Kauna Ibrahim Malgwi, Noraly Guevara and Sakine B., as well as researcher Jonas CL Valente from the Fairwork initiative.

Together, we delved into the intricacies of the human labor that fuels artificial intelligence and ensures safe participation in social media. We discussed workers’ expectations, concerns and common struggles to move forward toward a world in which technology serves all humans equally and responsibly.

1st INDL-Middle East and Africa conference

I am proud to announce that our group International Network on Digital Labor (INDL), together with the Access to Knowledge for Development Center (A2K4D) at The American University in Cairo’s School of Business, is organising the inaugural conference of the Middle East and Africa (MEA) chapter of INDL titled ‘Digital Labor Perspectives from the Middle East and Africa.’ Organized in collaboration with the International Labour Organization (ILO), Digital Platform Labor (DiPLab), Weizenbaum Institute and Université française d’Egypte, this conference will be held on May 28, 2024, in Cairo, Egypt.

Rationale

Digital labor is at the heart of our evolving economies. To address the specific challenges and developments in the Middle East and Africa (MEA), we are launching a dedicated chapter of INDL for the region.

This conference provides a unique platform to present research related to the MEA region, whether ongoing or burgeoning. The conference offers opportunities for scholars and practitioners to engage with topics such as platformization, automation, gig economy dynamics, and technology-mediated labor.

INDL-MEA will feature three tracks: one in Arabic, one in English, and one in French, reflecting the linguistic diversity of the region.

Topics

Submissions must relate to the MEA region, for instance in their perspective, case studies, or focus.

Submission topics may include, but are not limited to:

Case studies examining platforms, gig economy workers, and online digital labor in MEA
Exploring algorithmic management practices in work processes, recruiting, and HR in MEA
Issues of digital platform labor on gender and inclusion in the MEA region
Consequences of the shift to digital labor on workers, businesses, economies, and labor markets in MEA
Effects of remote work and digital labor on employee well-being and productivity in MEA
Policy responses to the rise of digital labor and automation in MEA, including regulatory measures and government intervention
Strategies for organizing digital workers and managing geographically distributed workforces in MEA
Intersectional perspectives on digital labor in MEA
Exploring AI and digital labor through a decolonial lens in MEA
Challenges posed by Generative AI to human labor in MEA

Submissions

We invite submissions of anonymized abstracts for papers, case studies, and policy briefs related to these topics. Abstracts, up to 500 words, can be submitted in Arabic, English, or French through our website INDL-MEA.

Important Dates

Deadline for submissions: January 31, 2024
Acceptance notification: February 15, 2024
Registration opens: TBA
INDL-MEA conference date: May 28, 2024

Together, let’s foster a thought-provoking dialogue and contribute to shaping the future of digital labor in the Middle East and Africa.

For more information, please see the INDL website.

To submit an abstract, click here.

The socio-contextual basis for disinformation

Within the Horizon-Europe project AI4TRUST, we published a first report presenting the state of the art in the socio-contextual basis for disinformation, relying on a broad review of extant literature, of which the following is a synthesis.

What is disinformation?

Recent literature distinguishes three forms:

‘misinformation’ (inaccurate information unwittingly produced or reproduced)

‘disinformation’ (erroneous, fabricated, or misleading information that is intentionally shared and may cause individual or social harm)

‘malinformation’ (accurate information deliberately misused with malicious or harmful intent).

Two consequences derive from this insight. First, the expression ‘fake news’ is unhelpful: problematic contents are not just news, and are not always false. Second, research efforts limited to identifying incorrect information alone, without capturing intent, may miss some of the key social processes surrounding the emergence and spread of problematic contents.

How does mis/dis/malinformation spread?

Recent literature often describes the characteristics of the process of diffusion of mis/dis/malinformation in terms of ‘cascades’, that is, the iterative propagation of content from one actor to others in a tree-like fashion, sometimes with consideration of temporality and geographical reach. There is evidence that network structures may facilitate or hinder propagation, regardless of the characteristics of individuals: therefore, relationships and interactions constitute an essential object of study to understand how problematic contents spread. By contrast, the actual offline impact of online disinformation (for example, the extent to which online campaigns may have inflected electoral outcomes) is disputed. Likewise, evidence on the capacity of mis/dis/malinformation to spread across countries is mixed. A promising perspective to move forward relies on hybrid approaches mixing network and content analysis (‘socio-semantic networks’).
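To make the notion of a ‘cascade’ more concrete, here is a minimal Python sketch that treats a cascade as a tree of shares and computes three simple descriptors: size, depth and maximum breadth. The function name, the choice of descriptors and the toy data are illustrative assumptions, not taken from the report.

```python
from collections import defaultdict, deque


def cascade_stats(root, shares):
    """Describe a cascade given its root and (source, target) share pairs.

    Returns the number of actors reached (size), the longest chain of
    shares from the root (depth) and the largest number of actors at
    any single distance from the root (max_breadth).
    """
    children = defaultdict(list)
    for source, target in shares:
        children[source].append(target)

    # Breadth-first traversal from the root, recording each actor's depth
    depth_of = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth_of:  # count each actor only once
                depth_of[child] = depth_of[node] + 1
                queue.append(child)

    per_level = defaultdict(int)
    for depth in depth_of.values():
        per_level[depth] += 1

    return {
        "size": len(depth_of),
        "depth": max(depth_of.values()),
        "max_breadth": max(per_level.values()),
    }


# Hypothetical cascade: a shares to b and c, b to d, d to e
shares = [("a", "b"), ("a", "c"), ("b", "d"), ("d", "e")]
print(cascade_stats("a", shares))
```

Adding timestamps or locations to each share would allow the same structure to capture the temporal and geographical dimensions mentioned above.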

What incentivizes mis/dis/malinformation?

Mis/dis/malinformation campaigns are not always driven solely by political tensions and may also be the product of economic interest. There may be incentives to produce or share problematic information, insofar as the business model of the internet confers value upon contents that attract attention, regardless of their veracity or quality. A growing shadow market of paid ‘likes’, ‘shares’ and ‘follows’ inflates the rankings and reputation scores of web pages and social media profiles, and it may ultimately mislead search engines. Thus, online metrics derived from users’ ratings should be interpreted with caution. Research should also be mindful that high-profile disinformation campaigns are only the tip of the iceberg, low-stakes cases being far more frequent and difficult to detect.

Who spreads mis/dis/malinformation?

Spreaders of mis/dis/malinformation may be bots or human users, the former being increasingly controlled by social media companies. Not all humans are equally likely to play this role, though, and the literature highlights ‘super-spreaders’, particularly successful at sharing popular albeit implausible contents, and clusters of spreaders – both detectable in data with social network analysis techniques.

How is mis/dis/malinformation adopted?

Adoption of mis/dis/malinformation should not be taken for granted and depends on cognitive and psychological factors at individual and group levels, as well as on network structures. Actors use ‘appropriateness judgments’ to give meaning to information and elaborate it interactively with their networks. Judgments depend on people’s identification with reference groups, recognition of authorities, and alignment with priority norms. Adoption can thus be hypothesised to increase when judgments are similar and signalled as such in communication networks. Future research could target such signals to help users in their contextualization and interpretation of the phenomena described.

Multiple examples of research in social network analysis can help develop a model of the emergence and development of appropriateness judgements. Homophily and social influence theories help conceptualise the role of inter-individual similarities, the dynamics of diffusion in networks sheds light on temporal patterns, and analyses of heterogeneous networks illuminate our understanding of interactions. Overall, social network analysis combined with content analysis can help research identify indicators of coordinated malicious behaviour, either structural or dynamic.

Micro-work and the outsourcing industry in Madagascar

I had the privilege and pleasure to visit Madagascar in the last two weeks. I was invited by the Institut Français, where I took part in a very interesting panel on “How can Madagascar help us rethink artificial intelligence more ethically?”, with Antonio A. Casilli, Jeremy Ranjatoelina and Manovosoa Rakotovao. I also conducted exploratory fieldwork by visiting a sample of technology companies, as well as journalists and associations interested in the topic.

A former French colony, Madagascar participates in the global trend toward outsourcing / offshoring which has shaped the world economy in the past two decades. The country harnesses its cultural and linguistic heritage (about one quarter of the population still speak French, often as a second language) to develop services for clients mostly based in France. In particular, it is a net exporter of computing services – still a small-sized sector, but with growing economic value.

Last year, a team of colleagues conducted extensive research with Madagascan companies that provide micro-work and data annotation services for French producers of artificial intelligence (and of other digital services). Some interesting results of their research are available here. This time, we are trying to take a broader look at the sector and include a wider variety of computing services, also trying to trace higher-value-added activities (like computer programming, website design, and even AI development).

It is too early to present any results, but the big question so far is the sustainability of this model and the extent to which it can push Madagascar higher up in the global technology value chain. Annotation and other lower-level services create much-needed jobs in a sluggish economy with widespread poverty and a lot of informality; however, these jobs attract low recognition and comparatively low pay, and have failed so far to offer bridges toward more stable or rewarding career paths. More qualified computing jobs are better paid and protected, but turnover is high and (national and international) competition is tough.

At policy level, more attention should be brought to the quality of these jobs and their longer-term stability, while client tech companies in France and other Global North countries should take more responsibility over working conditions throughout their international supply chains.

6th Conference of the International Network on Digital Labor (INDL-6)

I’m sooo glad to be in Berlin for the 6th edition of this beloved INDL conference, which is taking place at Weizenbaum Institut!

INDL started as a small-scale, informal, little-funded project, aiming to create linkages between academics and students interested in the transformations of labour brought about by digital technologies. We first met in Paris in Spring 2017, then in Louvain-la-Neuve (Belgium) a few months later, and in both cases, a smallish 20-people room was enough for all. Back then, we called ourselves ENDL (where E stood for “European”).

But in 2019, we partnered with Toronto-based colleagues and upgraded to INDL, moving from European to International level. We started a cycle of conferences which initially remained rather small-scale, and for two years had to take place online owing to the pandemic crisis. Things started to change in 2022, when colleagues from Greece proposed to restart an in-person version of the conference which eventually took place in Athens. It was also the first time that we launched a call for papers, rather than just limiting ourselves to invited speakers, and the conference was a huge success, with almost a hundred participants and sessions running in parallel.

This year’s edition follows the same format, and I’m so happy to see that a large community is forming around this topic. It’s good to see some people who already attended last year or even before, together with many new faces, and numbers continuing to grow (this year, we have three instead of just two parallel sessions!).

Together with the parallel sessions, this year’s event includes three keynotes, an arts-meets-science session, and a regulation-oriented debate on due diligence processes and the technology supply chain. Weizenbaum Institut is a wonderful place and has made available funding, support, and an incredibly committed team of colleagues, students, and volunteers who are making this conference a success.

For the programme, link to the livestreaming of plenaries and main sessions, and further information, please see indl.network.

Where does AI come from, and where is it heading?

Most of my current research aims to unpack artificial intelligence (AI) from the viewpoint of its commercial production, looking in particular at the human resources needed to prepare the data it needs – whence my studies on the data work and annotation market. However, for once, I am focusing on AI as a set of scientific theories and tools, regardless of their market positioning; indeed, I have joined a team of science-of-science specialists to study the disciplinary origins and subsequent spread of AI over time.

In a newly published, open-access article, we unveil the disciplinary composition of AI, and the links between its various sub-fields. We question a common distinction between ‘native’ and ‘applicative’ disciplines, whereby only the former (typically confined to statistics, mathematics, and computer science) produce foundational algorithms and theorems for AI. In fact, we find that the origins of the field are rather multi-disciplinary and benefit, among others, from insights from cognitive science, psychology, and philosophy. These intersecting contributions were most evident in the historical practices commonly known as ‘symbolic systems’. Later, different scientific fields have become, in turn, the central originating domains and applicators of AI knowledge, for example operations research, which was for a long time one of the core actors of AI applications related to expert systems.

While the notion of statistics, mathematics and computer science as native disciplines has become more relevant in recent times, the spread of AI throughout the scientific ecosystem is uneven. In particular, only a small number of AI tools, such as dimensionality reduction techniques, are widely adopted (for example, variants of these techniques have been in use in sociology for decades). But if transfer of AI is largely ascribable to multi-disciplinary interactions, very few of them exist. We observe very limited collaborations between researchers in disciplines that create AI and researchers in disciplines that only (or mainly) apply AI. The small core of multi-disciplinary champions who interact with both sides, and the presence of a few multi-disciplinary journals, sustain the whole system.

Inter- and multi-disciplinary interactions are essential for AI to thrive and to adequately support scientific research in all fields, but disciplinary boundaries are notoriously hard to break. Strategies to better reward inter-disciplinary training, publications, and careers, are thus essential. Of course the potential for AI to significantly advance knowledge is still (largely) to be proven, and there have been disappointing experiences with, for example, the comparatively limited effectiveness of these tools in research on Covid-19. In all cases, the status quo is not ideal, and important steps forward are now needed.

We establish these results by analyzing a large corpus of scientific papers published between 1970 and 2017, extracted from Microsoft Academic Graph through the AI keywords used by the authors, and explored with different relational structures among the scientometric data (keyword co-occurrence network, authors’ collaboration network).
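For readers unfamiliar with keyword co-occurrence networks, here is a minimal Python sketch of the general idea: two keywords are linked whenever they appear in the same paper, and the link weight counts how many papers they share. This illustrates the concept only and is not the pipeline used in the article; the function name and the toy keyword lists are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(papers):
    """Count, for each pair of keywords, the papers in which both appear.

    `papers` is a list of keyword lists, one per paper. Keywords are
    lower-cased and de-duplicated within each paper before pairing.
    """
    edges = Counter()
    for keywords in papers:
        unique = sorted({keyword.lower() for keyword in keywords})
        for first, second in combinations(unique, 2):
            edges[(first, second)] += 1
    return edges


# Hypothetical keyword lists for three papers
papers = [
    ["neural network", "expert system"],
    ["neural network", "dimensionality reduction"],
    ["Neural Network", "expert system", "dimensionality reduction"],
]

for (first, second), weight in sorted(cooccurrence_edges(papers).items()):
    print(first, "--", second, "weight", weight)
```

The same counting logic, applied to author lists instead of keyword lists, would produce a collaboration network.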

Full citation: Floriana Gargiulo, Sylvain Fontaine, Michel Dubois, Paola Tubaro. A meso-scale cartography of the AI ecosystem. Quantitative Science Studies, 2023; doi: https://doi.org/10.1162/qss_a_00267

Voices from Online Labour: Inequalities in digital earning activities across countries

What shapes differences in how people get paid, are deemed productive, or receive respect? Alongside traditional explanations of social inequalities such as class, gender, age, disability, race, migration status, rural vs. urban residence, and others, a recent literature highlights the effects of digital divides. The digitally resourced have more opportunities across all life spheres, from consumption to education, work, and health. Ironically, though, digital technologies also generate new vulnerabilities by generalizing low-paid and contingent work. Digital labour platforms like Uber, Deliveroo and Upwork use data and algorithms to match clients with workers, construed as independent contractors, for one-off ‘gigs’ without any long-term commitment. These workers are largely exposed to the vagaries of the market and have limited or no social protection, although increasing efforts aim to bring labour law to bear on platforms.

Growing concerns that platform workers compare unfavourably to conventional employees have already attracted significant research and policy attention. But more remains to be done to fully understand how the recent rise of labour platforms has undermined the relationship between digitization and inequalities, adding a layer of complexity. Scattered but growing evidence indeed suggests that platforms may be accelerating transmission to digital worlds of ‘legacy’ inequalities, for example vis-à-vis race and gender, while also fostering the proliferation of ‘emerging’ inequalities that diminish users’ agency and augment the power of technology creators and big-tech multinationals. Platforms for remote online-only labour, in particular, change the geographical scale at which these questions arise, projecting workers toward a competitive planetary market that relentlessly selects winners and losers.

To tackle these questions, I’m happy and honoured to announce that I have just been awarded a major grant (almost 570k euros, at marginal cost) by the French National Agency for Research (ANR) for a new 4-year study called VOLI: Voices from Online Labour. As a team effort that builds on a solid record of interdisciplinary collaborations, VOLI innovatively combines hypotheses and methods from sociology and neighbouring disciplines, notably large-scale corpus linguistics (I’ll explain why below), and relies on speech technology and artificial intelligence to tackle the rising economic risks that coalesce around the nexus between online platform labour, digitization, and social inequalities. The project leverages the power and potential of the very digital tools whose societal effects it studies, to develop an original and potentially transferable methodology.

The innovative idea that underpins the project is to tackle the problem through language, benefiting from recent advances in linguistics research and its capacity to recast methods and tools from artificial intelligence in a broad sense – including speech and language technology and machine learning techniques – to capture features and processes that used to escape its traditional methods. Despite the importance of linguistic tasks (such as translation, transcription, writing, and editing) in online labour platforms, linguistic methods have never been applied to the study of these workers before, and thus are best positioned to bring fresh insight. To this end, we have assembled a team composed of speech technology scientists, computational linguists specialized in multilingual and large-scale corpora analysis, and computational, digital, and labour sociologists. Expected results sustain our ambition to devise policy solutions to mitigate the effects of inequalities, and to support the individuals and groups that accumulate multiple sources of disadvantage.

To harness our previous research experience and ensure continuity, we focus on so-called ‘micro-work’, the necessary but inconspicuous contribution of low-paid masses who annotate, tag, label, correct and sort data to fuel the digital economy, especially artificial intelligence. Because it is performed remotely and can be allocated to providers worldwide, micro-work differs from location-based platform ‘gigs’ such as delivery and transport. It also differs from online-only jobs for freelancers, for example in computer programming and design, insofar as its extreme segmentation and standardization allow dispersing tasks to an undefined crowd instead of a selected individual (whence the alternative denomination of ‘crowdwork’). Micro-tasks include, for example, recording one’s voice while reading aloud a sentence, labelling files, translating short bits of text, classifying contents in an image or webpage. They perform essential functions in the development of machine learning and artificial intelligence, from data generation and enrichment to quality controls of automated outputs. We give voice to these workers, often invisibilized by the automation narratives popular in the technology industry, in that we interview them about their lived experience, aspirations, motivations and perhaps regrets; and we rely on their voices as data for the simultaneous development of sociology, linguistics, and artificial intelligence (specifically, speech recognition) itself.

Indeed, while bringing to the next level our sociological knowledge of the linkages between micro-work and digital inequalities, the methods that will be developed within this highly interdisciplinary project advance the study of the factors driving speech variation within the discipline of linguistics, augmenting language corpora with rich sets of metadata from sociological surveys, while also building and testing new and improved tools for automated transcription, with potential commercial applications.

I am the PI of the VOLI project which involves four research centres within France:

CREST (S. Coavoux, E. Ollion, P. Präg)
LISN (I. Vasilescu, L. Lamel, M. Evrard)
CRISCO (Y. Wu)
SES Department of Telecom Paris (A.A. Casilli, J. Torres Cierpe),

plus a company, Vocapia Research (V.B. Le, J. Despres, I. Swiecicki), and three international partners:

Weizenbaum Institut Berlin (M. Miceli),
Universitat Autònoma de Barcelona (J.L. Molina),
Universitat de València (J.A. Santos Ortega).

Brazil in the global AI supply chains: the role of micro-workers

AI is not just a Silicon Valley dream. It relies, among other things, on inputs from human workers who generate and annotate data for machine learning. They record their voice to augment speech datasets, transcribe receipts to provide examples to OCR software, tag objects in photographs to train computer vision algorithms, and so on. They also check algorithmic outputs, for example, by noting whether the outputs of a search engine meet users’ queries. Occasionally, they take the place of failing automation, for example when content moderation software is not subtle enough to distinguish whether some image or video is appropriate. AI producers outsource these so-called “micro-tasks” via international digital labor platforms, which often recruit workers in Global-South countries, where labor costs are lower. Pay is by piecework, without any long-term commitment and without any social-security scheme or labor protection.

In a just-published report co-authored with Matheus Viana Braz and Antonio A. Casilli, as part of the research program DiPlab, we lifted the curtain on micro-workers in Brazil, a country with a huge, growing, and yet largely unexplored reservoir of AI workers.

We found among other things that:

Three out of five Brazilian data workers are women, while in most other previously-surveyed countries, women are a minority (one in three or less in ILO data).
9 reais (1.73 euros) per hour is the average amount earned on platforms.
There are at least 54 micro-working platforms operating in Brazil.
One third of Brazilian micro-workers have no other source of income, and depend on microworking platforms for subsistence.
Two out of five Brazilian data workers are (apart from this activity) unemployed, without professional activity, or in informality. In Brazil, platform microwork arises out of widespread unemployment and informalization of work.
Three out of five data workers have completed undergraduate education, although they mostly do repetitive and unchallenging online data tasks, suggesting some form of skill mismatch.
The worst microtasks involve moderation of violent and pornographic contents on social media, as well as data training in tasks that workers may find uncomfortable or weird, such as taking pictures of dog poop in domestic environments to train data for “vacuuming robots”.
Workers’ main grievances are linked to uncertainty, lack of transparency, job insecurity, fatigue and lack of social interaction on platforms.

To read the report in English, click here.

To read the report in Portuguese, click here.